28 Mar, 2003
Compressing Webpages for Fun and Profit
Posted by: Christine In: How to's
(Written by the Guest Goddess, Photo Matt. Please note: You need to have PHP on your server to do this. No PHP? Won't work.)
So your page is now totally pimped out. You have gobs of content in your sidebar, you've used ScriptyGoddess know-how to make comments and extended entries pop out like magic, and you even have some entries to take up space between all the gadgets. The problem? The code on your page now weighs in at half a meg, and you can practically hear people cry when they load your site over a modem. You start thinking about what features you could take out, maybe cutting entries from the front page, but what if I told you that you can cut your content to a third easily, with no work on your part whatsoever? It sounds like a pitch I might get in a lovely unsolicited email. The secret lies in the fact that every major browser of the past five years supports transparently decompressing content on the fly. There are three ways to do it—easy, right, and weird—and we'll cover all three here. Before we even get started, you should check whether your pages are already being compressed, because if they are, it's probably best not to fix what ain't broken.
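To get a feel for how much there is to gain, here's a quick demo (sketched in Python rather than PHP, purely for illustration; the sample markup and sizes are made up) showing gzip squeezing repetitive HTML down well past that one-third mark:

```python
import gzip

# A chunk of repetitive HTML, standing in for a typical blog front page.
html = ("<div class='entry'><p>Lorem ipsum dolor sit amet, consectetur "
        "adipiscing elit.</p></div>\n" * 200).encode("utf-8")

compressed = gzip.compress(html)

print(len(html), "bytes raw")
print(len(compressed), "bytes gzipped")

# Markup is full of repeated tags and phrases, which is exactly
# what gzip eats for breakfast.
assert len(compressed) < len(html) // 3
```

With that motivation out of the way, here's the easy way: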
<?php ob_start("ob_gzhandler"); ?>
I hate to be anticlimactic, but that's it. Put that line at the very top of the PHP-parsed page you want to compress and you're done. The only thing to watch for is that it really does have to be at the top, or the sky will fall. Actually, before you call me Chicken Little, you'll probably just get a cryptic "headers already sent" error, but you can never be too careful. What this magical line of code does is start an output buffer that collects all your content, checks whether the client can receive compressed content, and if it can, zips up the buffer and sends it on its merry way. This can also be a great technique to curb your bandwidth usage; I've seen it save gigabytes on content-heavy sites.
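If you're curious what's happening under the hood, here's a rough sketch of that buffer-then-negotiate logic in Python (illustrative only; the function name and details are mine, not PHP's actual internals):

```python
import gzip

def gz_handler(body, accept_encoding):
    """Rough sketch of what ob_gzhandler does: gzip the buffered
    output only if the client said it can handle it."""
    if "gzip" in accept_encoding:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}

page = b"<html><body>Hello, modem users!</body></html>"

# A modern browser advertises gzip in its Accept-Encoding header...
body, headers = gz_handler(page, "gzip, deflate")
assert gzip.decompress(body) == page

# ...while a client that doesn't gets the page untouched.
body, headers = gz_handler(page, "")
assert body == page
```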
While the overhead of the line above is minimal, if you'd like to see the benefits of compressed content on a larger scale, mod_gzip is the way to go. Mod_gzip is an Apache module that compresses files whether they are CGI scripts, pages processed by PHP, static HTML or text files, whatever it can. It is completely transparent to both the user and your server-side code, and it supports sophisticated configuration so you can tweak it to your heart's content. However, if you don't have permission on your box to compile modules and modify httpd.conf, this option is unavailable, but don't let that stop you from bugging your host to include it, as there is really no good reason not to: it's always faster to send a smaller file. If you're interested in writing your own Apache module, studying mod_gzip is a great way to learn, as it has extremely informative debug code.
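If you do get to play in httpd.conf, a minimal mod_gzip setup looks something like the following. The directive names come from mod_gzip's own documentation, but the module path and the include/exclude rules will vary from install to install, so treat this as a starting point rather than a drop-in config:

```apache
# Load the module (the exact path depends on your build)
LoadModule gzip_module modules/mod_gzip.so

<IfModule mod_gzip.c>
    mod_gzip_on Yes
    # Compress HTML, plain text, and PHP output
    mod_gzip_item_include mime ^text/html$
    mod_gzip_item_include mime ^text/plain$
    mod_gzip_item_include file \.php$
    # Leave already-compressed things like images alone
    mod_gzip_item_exclude mime ^image/
</IfModule>
```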
There are certain circumstances where output buffering, which by definition has to wait for everything to finish processing before it sends anything to the browser, can cause a perceptible delay with scripts that take a while to run. With mod_gzip this isn't a problem, because it streams content as it comes to it, and with PHP it doesn't have to be a problem either, because PHP offers an alternative method of compressing and sending content called zlib output compression. It's a little trickier to enable, though, because there is no good way to turn it on or off from straight PHP code, so the approach we'll take here is to use .htaccess to modify the php.ini configuration. Instead of waiting until everything is finished, zlib output compression takes the content in chunks and sends each one as it's ready. Here's what you need to put in your .htaccess file:
<FilesMatch "\.(php|htm|html)$">
php_value zlib.output_compression 4096
</FilesMatch>
Basically what this code says is if the file ends in php, htm, or html turn zlib output compression on and stream it out every 4 kilobytes. It's common to see a 2K buffer suggested on the web but I've found the overhead with that is higher, and this is a nice balance. You should know that this is the slowest of the three methods, but by slow I mean it adds .003 seconds instead of .001, so it's not really that big of a deal.
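To see the difference between buffering everything and streaming, here's a sketch of chunked compression using Python's zlib module (the function, chunk boundaries, and sizes are all illustrative, not what PHP does internally):

```python
import zlib

def stream_compress(chunks):
    """Sketch of streaming compression: each chunk is compressed and
    flushed as it arrives, instead of buffering the whole page first."""
    comp = zlib.compressobj(6, zlib.DEFLATED, 31)  # wbits=31 -> gzip format
    for chunk in chunks:
        piece = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        if piece:
            yield piece  # this piece can go to the browser right away
    yield comp.flush(zlib.Z_FINISH)

page_chunks = [b"<html><body>",
               b"<p>slow query result</p>" * 100,
               b"</body></html>"]
wire = b"".join(stream_compress(page_chunks))

# The concatenated stream is a valid gzip body that round-trips cleanly.
assert zlib.decompress(wire, 31) == b"".join(page_chunks)
```

The trade-off is the one mentioned above: each flush carries a few bytes of overhead, which is why a bigger buffer like 4096 beats a 2K one.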
So now you have a faster site that's more fun to visit, and you're saving money on bandwidth. You can sit back now and wait for the love letters to pour in from your readers saying how much faster everything is loading. Enjoy!
- Like with so many other things, Netscape 4 really screws up gzip encoding in a lot of ways, but you can avoid 99% of its problems simply by making sure you don't gzip any linked JS or CSS files. On a more technical level, early versions of Netscape 4 try to use the browser cache to store compressed content before decompressing it, which works unless the browser's cache is turned off, in which case it will do something crazy. This behavior even varies from version to version of Netscape 4, so overall I wouldn't worry about it.
- If you're doing things over SSL and you want to use mod_gzip as well, you have a little hacking to do.
- PHP.net documentation on ob_gzhandler and zlib output compression (they recommend using zlib).
- Things like images, zip files, and Florida ballots are already highly compressed so trying to compress them again might actually make them bigger. And then you have to recount.
- Avoid compressing PDF files as well because sometimes Internet Explorer on Windows (the 900-pound gorilla) forgets to decompress them before the Acrobat plugin takes over.
- According to the RFC, compressed content should technically be sent using transfer encoding rather than content encoding, since that's what is actually going on. One browser engine supports this; can you guess which one?
- Internet Explorer on Mac doesn't support any sort of content compression like the methods described above, but that's okay because all of the above methods intelligently look for the HTTP header that signals the client can accept gzip encoding, and if it isn't there—like in IE Mac, handheld browsers, whatever—they just sit idly by.
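The note above about images, zip files, and their already-compressed friends is easy to demonstrate. This Python sketch (sample sizes are arbitrary) shows high-entropy data actually growing under gzip while plain text shrinks:

```python
import gzip
import os

# Stand-in for a JPEG or a zip file: high-entropy bytes don't compress,
# so gzip's own headers and framing make the result *bigger*.
already_squeezed = os.urandom(50_000)
assert len(gzip.compress(already_squeezed)) > len(already_squeezed)

# Plain text, by contrast, shrinks dramatically.
text = b"scriptygoddess " * 1000
assert len(gzip.compress(text)) < len(text) // 10
```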