Firstly, we’re sorry.
We’re sorry this happened. We don’t like problems any more than you do: they cause unnecessary confusion and detract from what is otherwise a fantastic system.
It’s our aim to be part of the wider solution to the increasing demand for healthcare services, and we recognise that Florence has to be a robust system that behaves as you would expect.
I hope the following explanation shows that we take these problems seriously and use what we learn from them to build an even more resilient system.
A summary of the issues.
The problem occurred while we conducted regular maintenance on Florence. Just as your computer at home installs regular operating system updates (you may know them as “Windows Updates”), the servers that operate Florence also receive updates to their Linux-based operating system.
Last Tuesday we updated the software on the server that sends all the text messages. Although the update appeared to succeed, shortly afterwards it caused a significant failure in the code that sends the texts.
Normally when code fails it reports an “error” state or response from which you can easily identify and diagnose a problem.
In this instance the failure was such that the code didn’t even produce an error state; it simply stopped working. This kind of failure is very rare, and because no error is raised, checks that rely on error reports cannot pick it up.
In fact, the other processes kept working as expected: text messages were received, responses were compiled and reminders were scheduled. The problem was isolated to the final script that sends the texts.
Fixing the issue.
Once we had spotted that Florence was not sending texts, our first priority was to get the service working again. This took us approximately two hours.
Next steps.
Our next steps are to learn from what happened and implement better monitoring of Florence so we can pick up any issues earlier. I’d like to say there is an easy and foolproof way to do this, but that’s simply not the case, and I’d be lying if I said there was.
Nothing stands still: software must be upgraded to the latest versions (usually for security reasons), and compatibility issues do and will arise.
But by using this latest experience and improving our monitoring, we’re confident any future issues will be picked up and remedied much faster.
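For the technically minded, here is a rough illustration of the kind of check that can notice a sender that has stopped without reporting any error. It is only a sketch and not a description of Florence’s actual code: the function name, the 15-minute threshold and the list of queue times are all assumptions made up for the example. The idea is to watch how long texts have been waiting rather than wait for an error to be reported.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: how long a text may wait before we raise an alert.
MAX_WAIT = timedelta(minutes=15)


def sender_looks_stalled(pending_since, now, max_wait=MAX_WAIT):
    """Return True if any queued text has waited longer than max_wait.

    pending_since: the times at which each unsent text was queued
    (an assumed input; a real system would read these from its queue).
    """
    if not pending_since:
        return False  # nothing is waiting, so nothing can be overdue
    oldest = min(pending_since)
    return now - oldest > max_wait


# Example: one text has been waiting 40 minutes, another only 2 minutes.
now = datetime(2024, 1, 1, 12, 0)
queued = [now - timedelta(minutes=40), now - timedelta(minutes=2)]
print(sender_looks_stalled(queued, now))  # True
```

A check like this would need to run on a schedule, separately from the sending script itself, so that it keeps working even when the sender does not.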
Once again, we’re sorry this happened.
Gary Bury