For those who don’t use Twitter, or who missed it: Twitter had a critical issue (I say critical because many millions of users could not log in, and there was no workaround) that logged a large portion of its users out of the service this past Sunday (12/28). I won’t go into the details of the bug, but Twitter user @_Ninji has found the most likely root cause here. It seems that the server incorrectly believed the date was 12/29/2015 instead of 12/29/2014.
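To make the off-by-a-year date concrete, here is a minimal Python sketch. It assumes one commonly cited explanation, which this post does not confirm: that the year was taken from the ISO 8601 week-numbering calendar instead of the ordinary calendar. The function names are hypothetical, and this is not Twitter's code.

```python
from datetime import date


def calendar_year(d: date) -> str:
    # %Y is the ordinary calendar year.
    return d.strftime("%Y")


def iso_week_year(d: date) -> int:
    # isocalendar() returns (ISO year, ISO week number, ISO weekday).
    # The ISO year can differ from the calendar year around New Year.
    return d.isocalendar()[0]


# Assumption for illustration: the date from the incident described above.
d = date(2014, 12, 29)
print(calendar_year(d), iso_week_year(d))  # prints: 2014 2015
```

The two values agree for 12/28/2014 and diverge on 12/29/2014, the Monday that begins ISO week 1 of 2015. Under this assumption, nothing would look wrong until that exact day.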
Granted, bugs are a necessary evil in software – as testers, it is our job to provide information about the overall quality of the software and the identified risks of going live. We cannot identify every bug that may be present in the software, partly because we all have different opinions of what truly is a “bug”. Additionally, the complexity of software like Twitter, with its millions of users, prevents us from testing every possible outcome that could arise from using the service.
However, this seems like a big one – it wasn’t a corner case specific to a certain legacy version of the app, OS, handset, etc. It impacted a majority of users of the service. I have no insight into how Twitter develops and tests its software; I have searched high and low on the topic for the past 2 days, and would be very grateful for any insight. That being said, this seems like an issue that could and should have been caught before build promotion with some form of human testing, whether exploratory testing or UAT. My initial reaction is that this is an (unfortunate) outcome of our rapid adoption of processes like Continuous Integration and DevOps, in which human testing is often removed so that newly developed software can be promoted more quickly.
Likely, we will never really know what happened to cause this issue – was an incorrect server date promoted from a development environment to the production servers? Was this actually a hack that’s been covered up as a bug to prevent any fears about users’ security and data?
I doubt Twitter will ever publicly acknowledge the cause beyond its current admission of a “bug in our front end code”, but I don’t think it matters. In my mind, these types of bugs will become more and more commonplace if we continue to diminish the human element of software testing and verification. While we believe we are eliminating costs in delivering new features to our users, we are not truly removing those costs but rather transferring them to our customers. In this case, instead of Twitter bearing the cost of having a small group of humans test the software thoroughly, its 284 million users acted as its quality control department. Is this what we really want to happen?
Kevin Dunne is a product specialist for QASymphony, striving to ensure the continued success of existing and prospective members of the qTest community. Having acted as a tester in his previous jobs, he enjoys interacting with customers on a daily basis to keep current on the latest trends and tools in the testing world. He is always eager to hear what others think about the industry – feel free to drop him a line at email@example.com or connect with him on LinkedIn: www.linkedin.com/pub/kevin-dunne/36/b73/ba7/