
This problem was presumably caused by infrastructure somewhere that could not cope with the volumes (or by some other fault). Those of us who work in infrastructure understand how complex it is to route messages through many components in order to ultimately deliver a service of acceptable quality to consumers. However, many single points of failure exist, and very high-profile problems like this one can occur.
This incident made me realise again how complex it is to design a financial system of acceptable quality that must run on such an unpredictable network. The challenge of ensuring reliability (and recovery) in a real-time payment system installed in a fundamentally unstable environment (one with many single points of failure) is huge. In the many discussions I have had in the industry, very few practitioners understood the problem, and even fewer had designed solutions that can cope in these environments.