Load testing reveals issues in the cloud
Nowadays it is rare for a website or web application to go live without meeting certain performance criteria. This normally comes down to the number of web pages or transactions per second at a given download speed. Testing the website against these criteria is essential to delivering the intended end-user experience.
However, as we move to cloud infrastructures, the benefits of the cloud must also be load tested. These include the dynamic addition of infrastructure through auto-scaling and, in Microsoft’s Azure, predictable database performance levels set through Database Transaction Units (DTUs). Testing them ensures not only that these features work as expected, but also that misconfiguration or under-provisioning does not incur unnecessary costs or lose business.
Auto-scaling and load testing
As a major benefit of cloud environments, auto-scaling needs to work correctly as your load fluctuates. Otherwise, you will end up paying for services you are not using, or running short of capacity when demand peaks. Cloud vendors have made auto-scaling quick and easy to configure, but when dealing with dynamic loads on a system, you need to ensure that the rules will work when you need them most.
In a recent load test, the client began a user journey (UJ) test that would increase load up to 500 concurrent users and maintain that load over a 30-minute period. They were confident that their application would perform well, but we found that the way auto-scaling had been configured meant that the end-user experience would have been extremely poor. After only 2.5 minutes, the download time of the UJ started to deteriorate.
The benchmark test showed that the UJ should complete in approximately 47 seconds, but as the load went over 50 users, download time started to climb very quickly. We observed that the CPU was also moving towards 100% at this point, but the expected auto-scaling had not kicked in.
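A check like the one described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling used in the test. The sample figures, the 10% tolerance and the `first_breach` helper are all assumptions. Only the 47-second benchmark comes from the test itself.

```python
from typing import Iterable, Optional, Tuple

BENCHMARK_SECONDS = 47.0  # benchmark UJ completion time from the test


def first_breach(
    samples: Iterable[Tuple[int, float]],
    benchmark: float = BENCHMARK_SECONDS,
    tolerance: float = 0.10,  # assumed allowance above the benchmark
) -> Optional[int]:
    """Return the first load level whose UJ time exceeds the allowed limit.

    `samples` holds (concurrent_users, uj_seconds) pairs in ramp-up order.
    Returns None if every sample stays within benchmark * (1 + tolerance).
    """
    limit = benchmark * (1 + tolerance)
    for users, seconds in samples:
        if seconds > limit:
            return users
    return None


# Hypothetical measurements, invented for illustration only.
samples = [(10, 46.2), (25, 47.5), (50, 49.0), (75, 63.8), (100, 91.4)]
breach_at = first_breach(samples)
print(breach_at)
```

Flagging the first breach, rather than only the worst sample, makes it easier to line up the degradation with what the infrastructure metrics were doing at the same moment.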
Auto-scaling had been configured to bring in more resources once CPU utilisation reached 70%, but only if it stayed at 70% or greater for 10 minutes. This explained why more servers had not spun up. Sure enough, about 10 minutes after load times started to increase, the effect of the extra resources within the environment started to show, and download speed started to improve.
The client quickly configured a new rule: if CPU is 80% or greater, spin up the extra resources immediately. This mid-test configuration change proved effective. When CPU started to climb again at about 18 minutes into the test, the new rule was sufficient to contain the load on CPU resources, and download speed started to improve.
In the graph we can see that the UJ never recovers to the target time of 47 seconds, but investigation showed that this was due to a secondary problem that load testing had uncovered.
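The difference between the two rules can be illustrated with a small simulation. This is only a sketch under stated assumptions: the CPU trace is invented, samples are assumed to arrive once per minute, and "immediately" is modelled as a one-sample window. Real cloud auto-scaling evaluates metrics on its own schedule and takes time to provision instances, so the numbers show only the relative delay.

```python
from typing import Optional, Sequence


def first_trigger_minute(
    cpu_by_minute: Sequence[float],
    threshold: float,
    sustain_minutes: int,
) -> Optional[int]:
    """Return the minute index at which a scale-out rule would fire.

    The rule fires once CPU has been at or above `threshold` for
    `sustain_minutes` consecutive one-minute samples. Returns None
    if the condition is never met.
    """
    run = 0  # length of the current streak of samples at or above threshold
    for minute, cpu in enumerate(cpu_by_minute):
        run = run + 1 if cpu >= threshold else 0
        if run >= sustain_minutes:
            return minute
    return None


# Hypothetical CPU trace (percent, one sample per minute) for a ramping load.
cpu_trace = [35, 55, 72, 85, 95, 98, 99, 99, 99, 99, 99, 99, 99, 99, 99]

original_rule = first_trigger_minute(cpu_trace, threshold=70, sustain_minutes=10)
revised_rule = first_trigger_minute(cpu_trace, threshold=80, sustain_minutes=1)
print(original_rule, revised_rule)
```

With this trace, the sustained 70% rule does not fire until CPU has been saturated for several minutes, while the 80% rule fires as soon as the threshold is crossed. The trade-off is that a rule with no sustain window can also react to short spikes, so it is worth load testing the rule itself.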
Database Transaction Units (DTUs)
Because we could now see that CPU resources had settled to an acceptable level, we knew that this secondary problem had a different cause.
The client’s databases are hosted in Microsoft’s Azure, where it is possible to procure a specific performance level for your database. This involves Microsoft committing a certain level of resources to your database with the intention of delivering a predictable level of performance. The Azure platform implements this through the concept of Database Transaction Units (DTUs), enabling Microsoft to charge a price that will depend on how performant you want your databases to be. It is therefore important to get these settings right.
Our analysis showed that the number of DTUs assigned to one of the three databases used in this UJ had reached its maximum. Consequently, the database was unable to work any faster.
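This kind of analysis can be approximated with a simple check, assuming DTU utilisation has been exported for each database as a percentage of its assigned limit. The database names, the sample values, the 95% ceiling and the `saturated_databases` helper below are all hypothetical. The sketch only shows the idea of flagging a database that spends most of the test pinned at its limit.

```python
from typing import Dict, List, Sequence


def saturated_databases(
    dtu_percent_by_db: Dict[str, Sequence[float]],
    ceiling: float = 95.0,      # assumed: treat >= 95% as "at the limit"
    min_fraction: float = 0.5,  # assumed: flag if pinned for half the samples
) -> List[str]:
    """Return names of databases pinned at their DTU limit during a test."""
    flagged = []
    for name, samples in dtu_percent_by_db.items():
        if not samples:
            continue  # no data for this database, nothing to judge
        pinned = sum(1 for s in samples if s >= ceiling)
        if pinned / len(samples) >= min_fraction:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical DTU utilisation samples (percent of assigned DTUs).
dtu_samples = {
    "db_a": [40, 55, 62, 58, 60, 57],
    "db_b": [70, 96, 100, 100, 100, 100],
    "db_c": [20, 25, 31, 28, 30, 27],
}
flagged = saturated_databases(dtu_samples)
print(flagged)
```

A database flagged this way is a candidate either for more DTUs or for a review of how the application uses it under load.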
Of course, the simple thing to do at this point would be to add more DTUs. However, over time, this could prove expensive. So instead the client decided to review the way this database was being used in this UJ. With the knowledge gained of how the application works under load, they were able to make some minor enhancements that resolved the issue without incurring further infrastructure costs.
Getting the most out of the cloud
As a mainstream technology the cloud has brought many benefits and advanced infrastructure features that enable consistent performance and dynamic scaling of services. However, regular and effective load testing is essential to maximising the cost benefits while ensuring that your website or application will perform for your end-users as expected.
Published date:  31 August 2017
Written by:  Phil Vandenberg