Do you run your own web site? Do you know if it can handle a traffic spike? Do you know if it performs as well as it could?
There is a huge amount of work involved in getting a web site built and deployed. Once you decide to move forward with an idea for a site you have to determine what exactly you want to build, what features you want it to have, design the look of the site, write the code, write unit tests to make sure everything works correctly and continues to work in the future, figure out how to let people know about the site and so on.
A huge amount of time and effort can go into just building a site and getting it deployed for everyone to see. Now imagine one person has to do all that work and it is easy to see how some aspects might be delayed or outright forgotten about.
If, like me, you have worked for larger companies that have their own web operations staff, or you are just new to running your own web site, this information should give you a better idea of how well your site performs and how you can make it perform better.
Please also be aware that I don't claim to be an expert in this area. If you see a glaring problem with my recommendations, feel free to point it out in the comments.
Determine Site Bandwidth
The first step is to find out what your bandwidth limits are. I host this site on a Linode VPS which provides 200 GB of bandwidth per month. Some shared hosting services provide unlimited bandwidth, but if you regularly get enough traffic to exceed 200-300 GB of bandwidth a month you will probably be causing problems for the other customers on the shared host, and you might get asked to move anyway.
Most of the calculations I will be doing will be in kilobytes to keep things simple. One gigabyte is equal to 1,048,576 kilobytes, so the total number of kilobytes I can send without incurring bandwidth charges is 200 * 1,048,576 = 209,715,200 kilobytes/month.
There are numerous free bandwidth calculators around, but they all assume you know how many page views you expect in a given month. This article will look at bandwidth from a different perspective: given my bandwidth limit, how many page views can I send per month?
To get an average page size for your site, pick a tool that reports total page size and test a good sample of your pages. Test smaller pages as well as larger pages to get a good average. I ended up with a 186.3 KB average after testing my pages.
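The averaging step can be sketched in a few lines of Python. The page paths and sizes below are invented purely for illustration (they are not my actual measurements), and `average_page_size_kb` is a hypothetical helper name.

```python
from statistics import mean

# Made-up page weights in kilobytes, as reported by whatever
# page-size tool you use. Mix small and large pages.
sample_page_sizes_kb = {
    "/": 240.5,
    "/about": 95.0,
    "/blog/some-post": 223.4,
}


def average_page_size_kb(sizes_kb):
    """Return the mean page weight in kilobytes for the sampled pages."""
    if not sizes_kb:
        raise ValueError("need at least one sampled page")
    return mean(sizes_kb.values())


average_kb = average_page_size_kb(sample_page_sizes_kb)
print(f"average page size: {average_kb:.1f} KB")
```

The more pages you sample, the less a single unusually heavy or light page will skew the result.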
Now it's just simple math to figure out how many page views I can handle before I reach my bandwidth limit. If I divide the total kilobytes/month by my average page size I will have an approximate number of page views per month that will stay within my bandwidth limit, so 209,715,200 / 186.3 = 1,125,685 page views per month.
That number is approximate of course. My home page will fluctuate in size depending on how large or small the current blog posts are. It gives me a ballpark figure to work with, however.
My site isn't getting anywhere near that amount of traffic right now, but what happens if I get a spike from Hacker News? That is the most likely site that I would be getting a traffic spike from, so it seemed like the best place to start. After some searching and reading I found that a Hacker News spike can range anywhere from 7,000 to 60,000 page views in a day depending on the popularity of the article.
If I had that many page views a day would I go over my bandwidth limit? 1,125,685 / 60,000 = 18.76. I could sustain heavy traffic for almost 19 days before exceeding my bandwidth. That isn't bad. It also isn't very likely that I'll ever get that many page views for that long. How many page views could I handle a day without going over my limit? 1,125,685 / 31 = 36,312. I could handle roughly thirty-six thousand page views a day for a month.
Now we have some simple equations to get an idea of how much bandwidth our web pages use and how many page views we can generate per month:
- limit-in-gb x 1,048,576 = limit-in-kb
- limit-in-kb / average-page-size = approx-page-views-per-month
- approx-page-views-per-month / high-page-views-per-day = days-until-bandwidth-is-exceeded
- approx-page-views-per-month / 31 = page-views-per-day-within-bandwidth
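For anyone who would rather plug in their own numbers, here is a rough Python sketch of these equations. The function names are my own invention. Following the worked numbers earlier in the article, the last two steps divide the monthly page-view figure (not the kilobyte limit) by the daily page views and by the days in the month.

```python
KB_PER_GB = 1_048_576  # 1024 * 1024, the conversion used in this article


def limit_in_kb(limit_in_gb):
    """Convert a monthly bandwidth limit from gigabytes to kilobytes."""
    return limit_in_gb * KB_PER_GB


def page_views_per_month(limit_kb, average_page_size_kb):
    """Approximate page views that fit inside the monthly limit."""
    return limit_kb / average_page_size_kb


def days_until_exceeded(views_per_month, high_page_views_per_day):
    """Days a sustained traffic spike can last before the limit is hit."""
    return views_per_month / high_page_views_per_day


def page_views_per_day(views_per_month, days_in_month=31):
    """Daily page views that can be sustained for a whole month."""
    return views_per_month / days_in_month


limit_kb = limit_in_kb(200)
monthly_views = page_views_per_month(limit_kb, 186.3)

print(limit_kb)                                              # 209715200
print(int(monthly_views))                                    # 1125685
print(round(days_until_exceeded(monthly_views, 60_000), 2))  # 18.76
print(int(page_views_per_day(monthly_views)))                # 36312
```

Swap in your own bandwidth limit, average page size and expected spike to get figures for your site.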
Optimize Bandwidth Usage
Now that you have a good idea what your web site's page sizes are and how much bandwidth you use, it is time to look at reducing and optimizing your bandwidth usage as much as possible.
Reduce Page Size
The very first thing to do is reduce the page size as much as you can.
Optimize Images - If your CSS uses background images that appear on every page, make sure those image files are optimized and as small as possible.
Remove HTML Comments - Comments are generally for the site developers and not the viewers, so removing or at least keeping the size of HTML comments low will shave off some more from your page size. If you are using an HTML template system like Haml or ERB you can use the template language comment which will be removed when the final HTML is generated. That way your developers can comment as much as they like and none of those comments will be sent to the browser.
Enable Compression - Most web servers support compression now, but it is generally not enabled by default. Check your web server documentation and configuration settings to make sure your pages and assets are being compressed before being sent to the browser.
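To get a feel for what compression can save, here is a small Python sketch that gzips a chunk of HTML and compares the sizes. It only illustrates the idea; your web server does the real work. The sample markup is made up and very repetitive, so it compresses far better than a real page would, and `compressed_size` is a hypothetical helper.

```python
import gzip


def compressed_size(payload: bytes, level: int = 6) -> int:
    """Return the size in bytes of the payload after gzip compression."""
    return len(gzip.compress(payload, compresslevel=level))


# Invented, highly repetitive sample markup. Real pages compress less well.
html = (
    "<html><body>"
    + "<p>Hello, bandwidth-conscious reader.</p>\n" * 200
    + "</body></html>"
).encode("utf-8")

original = len(html)
compressed = compressed_size(html)

print(f"original: {original} bytes, gzipped: {compressed} bytes")
print(f"saved: {100 * (1 - compressed / original):.0f}%")
```

Running this against the saved HTML of one of your own pages gives a more honest estimate of the savings.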
There are three types of HTTP caching: browser cache, proxy cache and gateway cache. In the context of this article we are interested in browser caching. Browsers set aside space on the user's system to store assets that have been downloaded from a web site, but a browser will only store assets in its cache that have some kind of HTTP cache header (Expires:, Cache-Control:, Last-Modified:, ETag:).
Before reusing a cached asset, the browser may first ask the web server whether its copy is still current. To avoid that validity check, add the Expires: header. If an asset has an Expires: header the browser will save the expiration time along with the asset in the cache. If that asset is requested again and it has not yet expired, the browser skips the validity check, saving a round trip to the web server.
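As a rough illustration of how the Expires: header works, the Python sketch below builds a header value and then performs the same kind of freshness test a browser would. The helper names, the 30 day lifetime and the fixed date are assumptions chosen for the example, and real browsers take other headers into account as well.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(now, lifetime):
    """Build an Expires: value (an HTTP date in GMT) for an asset.

    `now` must be a timezone-aware UTC datetime.
    """
    return format_datetime(now + lifetime, usegmt=True)


def is_fresh(expires_value, now):
    """Return True if the cached copy can be reused without asking the server."""
    return now < parsedate_to_datetime(expires_value)


# A fixed, made-up "current time" so the example is repeatable.
served_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
value = expires_header(served_at, timedelta(days=30))

print("Expires:", value)  # Expires: Wed, 31 Jan 2024 12:00:00 GMT
print(is_fresh(value, served_at + timedelta(days=1)))   # True
print(is_fresh(value, served_at + timedelta(days=31)))  # False
```

A long lifetime suits assets that rarely change, such as images and stylesheets. Pages that change often need a shorter one.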
Actually adding the cache headers to your web pages and static assets is going to be specific to your site. You can do it directly in the web server or in the language or framework you are using.
HTTP caching is a large subject and I can't cover it all here. If you are interested in the details I have found several good articles:
For a quick and dirty description of cache headers see Doing HTTP Caching Right by Joe Gregorio.
That's it for this article. I hope you have a better idea of how your bandwidth is being used and some ideas on how you can improve your pages to be more efficient.