Creating a High Performance Cloud Hosted Homepage - codecentric AG Blog

When we began thinking about a relaunch of our homepage 3 months ago, we already knew a few key facts about what we wanted:

  • use high quality cloud infrastructure
  • run with high standard performance
  • use a flexible standard software stack
  • provide great ease of use for editors

While we had used OpenCMS for the homepage in the past, we decided to go for WordPress for our new homepage. We have been using WordPress for our blog for years and it provides the ease of use we want. It is dead simple, yet powerful. Even with its default configuration, it offers sufficient functionality, and it can be easily extended with custom or off-the-shelf plugins.

This post is not about our decision to go with WordPress, though, but about how we managed to run a medium-traffic homepage and blog for free, without compromising speed.

Did I say free? – Yes!

Amazon offers new customers a free micro tier for a year: aws.amazon.com/free. "New customers" simply means new accounts. Of course, it is quite restricted: you get one virtual CPU and 613MB of memory, and you have to use Amazon Linux.

After I started the micro instance with its 8GB EBS, I installed the basic services required for WordPress:

sudo yum install httpd mysql mysql-server php php-mysql

After installing and configuring WordPress, everything looked fine, but performance degraded quickly when multiple users were accessing the site.
The first action I took was to install APC, a bytecode cache and optimizer. You should always use APC when deploying a PHP site:

sudo yum install php-pecl-apc

The situation improved quite a bit. But I would need to improve it even more to make the site reliably serve multiple users at the same time.

While working with the free micro instance, I learned two things:

  • “Steal Time”! Never seen this before:
    Cpu(s): 2.0%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 97.7%st

    That means, the system wants to use more CPU time, but does not get it from the hypervisor.

  • EBS is slow! But others noticed that, too.
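
To make the "Steal Time" reading above concrete, here is a minimal sketch that pulls the individual percentages out of a top-style `Cpu(s):` line. The function name and the 10% warning threshold are assumptions chosen for illustration; they are not part of the original setup.

```python
import re


def parse_cpu_line(line):
    """Parse a top-style 'Cpu(s):' summary line into {field: percent}."""
    # Each entry looks like '97.7%st': a number, a percent sign, a field name.
    return {name: float(value) for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


sample = "Cpu(s): 2.0%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 97.7%st"
usage = parse_cpu_line(sample)

steal = usage["st"]
print(f"steal: {steal}%")

# Hypothetical threshold, picked only for this example.
if steal > 10.0:
    print("the hypervisor is withholding CPU time from this instance")
```

A high `st` value combined with `id` at zero is the pattern described above: the instance wants CPU time, but the hypervisor does not grant it.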

The process suffering most from those findings was mysqld, so I considered how to improve that.

I applied the WordPress MySQL tuning recommendations by Matt Mullenweg, but this alone did not have any significant impact.
It was just not possible to get good performance out of mysqld due to CPU and IO limitations. I couldn’t bypass IO completely by caching everything due to memory constraints.

So I looked for a good WordPress plugin that avoids database access by caching rendered content. And indeed, there are a few plugins for that purpose.

I ended up installing Quick Cache (which no longer exists; an alternative would be W3 Total Cache), and was pleased to see server load drop significantly.

This did not help when many people were editing, though, because caching has no effect for them. Amazon limits the CPU of a micro instance to short bursts, so many people editing at once can sometimes result in 30 seconds of steal time. To fix this, I limited the CPU usage of mysqld:

wget http://centos.alt.ru/repository/centos/5/i386/cpulimit-1.2-1.el5.i386.rpm
rpm -i cpulimit-1.2-1.el5.i386.rpm
nohup sudo cpulimit -v --exe mysqld --limit 50 &

Now that I had addressed most CPU and IO issues with clever caching, next up was tuning httpd.conf to make sure the server handles the right number of users at any time. Because we use the prefork MPM, we had to look carefully at the amount of memory each Apache process was using.
To bring down the memory consumption of httpd, I removed most loaded modules, but each Apache process was still eating up between 25 and 35MB of memory. mysqld was taking around 80MB, so I decided to go for about 15 worker processes, which would utilize almost all available memory.

StartServers 15
MinSpareServers 15
MaxSpareServers 15
ServerLimit 15
MaxClients 15
MaxRequestsPerChild 500

MaxRequestsPerChild is important for recycling processes in case they leak memory.
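
The sizing behind the value 15 can be reproduced with a little arithmetic. The sketch below uses the figures quoted above (613MB total, about 80MB for mysqld, up to 35MB per Apache process); the function name is hypothetical, and the calculation deliberately ignores the memory used by the operating system itself, so treat it as a rough upper bound rather than a rule.

```python
def max_apache_workers(total_mb, mysqld_mb, per_worker_mb):
    """Largest number of prefork processes that fits in memory next to mysqld."""
    return (total_mb - mysqld_mb) // per_worker_mb


# Figures from the text; 35MB is the worst case per Apache process.
workers = max_apache_workers(total_mb=613, mysqld_mb=80, per_worker_mb=35)
used_mb = 80 + workers * 35

print(f"workers: {workers}, worst-case memory use: {used_mb}MB of 613MB")
```

With the worst-case process size, 15 workers plus mysqld come to 605MB, which is why the limit leaves almost no headroom.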

15 concurrent connections is quite a low limit, so as a next step I had to make sure server communication was reduced to a minimum.

With the server having a solid setup, the low-hanging fruit is now on the browser side. The default output of a WordPress theme with multiple plugins installed is very verbose. Because you do not want to modify the theme a lot and have little chance to tweak code generated by plugins, we needed a better way to optimize the HTML.

This can be easily done by the fabulous mod_pagespeed httpd module from Google.

sudo rpm -i mod-pagespeed-*.rpm
sudo service httpd restart

The impact was again very significant. The aggressive cache settings cut down most of the server requests, removing strain from the limited number of httpd processes. Also, the reduced image sizes were very noticeable. Especially in a content management system, you have limited control over the images used by editors and authors. mod_pagespeed automatically reencodes them and stores them on disk.
But mod_pagespeed does not create the cache out of thin air. It checks the image every time and just skips the reencoding if the cached version is valid:

46.137.99.225 - - [07/Aug/2011:03:54:37 +0200] "GET /wp-content/uploads/2011/07/wanted-memory.png HTTP/1.1" 200 116616 "-" "Serf/0.7.2 mod_pagespeed/0.9.18.6-886"

This is where mod_expires comes into play:

ExpiresActive On
ExpiresDefault "access plus 10 days"

Which expiry time you choose depends on the type of images. If you can guarantee that an image will never change, go for years. A week is generally a good starting point. As this setting only affects mod_pagespeed on the server, you can be even more aggressive and clear the cache after updates.
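
To illustrate what "access plus 10 days" translates to, the following sketch computes the two response headers that mod_expires controls (`Expires` and the `max-age` directive of `Cache-Control`) for a given access time. The function name is hypothetical, and the access time is simply the timestamp from the log line above converted to UTC; this mimics the header values only, not the module itself.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(access_time, days):
    """Headers equivalent to 'access plus <days> days' for one response."""
    lifetime = timedelta(days=days)
    return {
        "Expires": format_datetime(access_time + lifetime, usegmt=True),
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }


# 07/Aug/2011:03:54:37 +0200 from the access log, expressed in UTC.
accessed = datetime(2011, 8, 7, 1, 54, 37, tzinfo=timezone.utc)
headers = expiry_headers(accessed, days=10)

for name, value in headers.items():
    print(f"{name}: {value}")
```

Ten days correspond to a max-age of 864000 seconds; until that time has passed, the cached copy counts as valid and does not need to be fetched again.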

To clean up the HTML beyond what mod_pagespeed can do, I removed most of the blog-related tags from the page head by adding this to functions.php:

remove_action('wp_head', 'rsd_link');
remove_action('wp_head', 'wlwmanifest_link');
remove_action('wp_head', 'feed_links_extra', 3);
remove_action('wp_head', 'feed_links', 2);
remove_action('wp_head', 'remote_login_js_loader');
function remove_generator() { return ''; }
add_filter('the_generator', 'remove_generator');

WordPress ships with its own copy of the jQuery JavaScript library, but to benefit the most from caching and to save more server traffic, we use the jQuery hosted on the Google Content Delivery Network instead.
Additionally, we stop loading l10n.js, which we do not need:

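function init_scripts() {
    // Replace the bundled jQuery with the copy hosted on the Google CDN.
    wp_deregister_script('jquery');
    wp_register_script('jquery', 'http://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js');
    // Stop loading l10n.js, which we do not need.
    wp_deregister_script('l10n');
}

// Leave the admin interface untouched.
if (!is_admin()) {
    add_action('init', 'init_scripts');
}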

Optimization is done mainly for unauthenticated users. To prevent issues for administrators in the complex WordPress admin interface, most changes are not done for logged in users.

As a result of this, www.codecentric.de now has a PageSpeed rating of 95/100.

A single user should experience page loads in less than a second, for most pages in less than 300ms.

So how did it work out at scale? I decided to run a benchmark against the old dedicated machine and the new EC2 micro instance: 4 real CPUs vs. 1 virtual one, 2GB of memory vs. 613MB. The comparison is not completely fair, as I did not tune and tweak the old blog to the extreme.
I ran the benchmark using ApacheBench, with 5 concurrent users and 50 requests hitting the homepage.

old:

Requests per second:    3.07 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        9   71 424.8      9    3014
Processing:   989 1513 461.3   1352    3420
Waiting:      597  865 344.2    768    2861
Total:       1004 1584 603.1   1373    4287

new:

Requests per second:    18.04 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       38   44   4.4     44      52
Processing:   183  224  47.3    220     493
Waiting:       58   89  41.4     80     335
Total:        222  268  49.5    265     544

Even with the connect phase taking longer due to the geographical distance to the Amazon Dublin datacenter, the results are amazing.
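
To put a number on "amazing", the two result blocks above can be compared directly. This small sketch only uses figures reported by the two runs (requests per second and mean total time); the dictionary layout is just an assumption for the example.

```python
# Values copied from the two ApacheBench result blocks above.
old = {"rps": 3.07, "mean_total_ms": 1584}
new = {"rps": 18.04, "mean_total_ms": 268}

throughput_gain = new["rps"] / old["rps"]
latency_gain = old["mean_total_ms"] / new["mean_total_ms"]

print(f"throughput: {throughput_gain:.1f}x higher")
print(f"mean total time: {latency_gain:.1f}x lower")
```

Both ratios come out just under a factor of six in favour of the micro instance.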

Because I know I can handle 15 concurrent requests, I tried running 45 requests in parallel, each doing 20 loads.

Requests per second:    50.00 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       39   42   4.8     43      78
Processing:   300  839 116.7    869    1181
Waiting:      179  711 115.2    742    1049
Total:        340  881 116.8    912    1225

Even with higher load, response times are still below a second. Cool!
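
As a sanity check on all three benchmark runs, Little's Law says that the average number of requests in flight equals throughput multiplied by mean response time. The sketch below applies it to the reported numbers; the labels and the tuple layout are assumptions for illustration.

```python
# (label, configured concurrency, requests per second, mean total time in ms)
runs = [
    ("old server", 5, 3.07, 1584),
    ("new micro", 5, 18.04, 268),
    ("new micro, high load", 45, 50.00, 881),
]

implied = {}
for label, concurrency, rps, mean_ms in runs:
    # Little's Law: requests in flight = throughput * mean response time.
    implied[label] = rps * mean_ms / 1000.0
    print(f"{label}: configured {concurrency}, implied {implied[label]:.1f}")
```

The implied concurrency lands close to the configured 5 and 45 in every case, which suggests the reported throughput and latency figures are consistent with each other.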

Is micro enough?

I am quite confident that micro will serve us well for a long time. But there was an issue, which led me to this tweet:

The moment we pushed the DNS update was scary.
The site was slowing to a crawl, but why? I had checked everything!
We had unknowingly caused something like a distributed denial of service (DDoS) attack on ourselves. I checked the access_log and saw what was causing the trouble:
All the search engines I knew were crawling our new site and blog, along with tons of bots I had never even heard of until then.
To make it worse, we had announced the relaunch on internal chat and mail, causing over a hundred codecentric employees to check out the new site.
On top of that, all caches were cold.

Cloud to the rescue!

Because this is Amazon, I could resolve the situation easily. I stopped the instance, changed the type to m1.small instead of t1.micro and hit start.
After the machine came back and reassigned the virtual IP itself, I reconfigured Apache to run up to 50 processes.
The whole "upscaling" process took less than 30 seconds, and was not noticeable for users, who were already complaining about slow responses.
The machine was still heavily loaded, but response times were at an acceptable 1-2 second level.
4 hours later it was all over. Crawling bots and codecentrics were gone, the site was calm.
Another 30 seconds later, the machine was back at t1.micro, and free. Total cost of this exercise: 16 cents.

I am quite sure that there will never again be such a load on the system as during launch. And in case there is, we can scale up in less than a minute.
For example on Fridays, when many codecentric employees blog or publish content on the homepage, we just scale up.

I hope you enjoyed reading about our experience. If you are considering switching to the cloud, or want to talk to us about improving performance on the web or on your server, feel free to contact us.