Archive for August, 2008

Caching and compression for Apache and mod_rails

Optimizing your web server configuration is an important step for any production web application. Compression and caching are two complementary techniques that can greatly improve the performance of your site. We won’t go into a lot of detail on the rationale for these changes. Most of that is covered in depth by the Yahoo Performance Team, who produce the excellent YSlow plugin for Firefox.

The code in this post is used for a soon-to-be production Ruby on Rails application using this stack:

Reliable, Performant Pre-Compression

Compressing text files in your application can lower bandwidth usage by as much as a factor of 10 and cut the time needed to retrieve a web resource by a similar amount. In Apache, mod_deflate is the easiest way to enable compression. mod_deflate compresses content on each request. For dynamic content this is expected, but for infrequently changing static content such as CSS or JavaScript files it is redundant and can increase CPU load significantly. To get a little more control over this, we chose to pre-compress static files on our site and serve them to compatible browsers when appropriate.
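For the dynamic side, per-request compression with mod_deflate can be limited to the content types that actually need it. The fragment below is a minimal sketch, not part of our production config; the list of MIME types is an assumption and should be adjusted to what your application serves.

```apache
# Sketch: compress dynamic responses on the fly, leaving JS and CSS
# to the pre-compressed files described in this post.
# Requires mod_deflate (on Apache 2.4 and later, AddOutputFilterByType
# is provided by mod_filter).
<IfModule mod_deflate.c>
  # Assumed list of types; JS and CSS are deliberately left out.
  AddOutputFilterByType DEFLATE text/html text/plain text/xml application/xml
</IfModule>
```

Keeping JS and CSS out of this list avoids compressing the same static bytes on every request.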

In the configuration below, we use Apache’s mod_rewrite to handle this outside of our application:

IF Request is CSS or Javascript AND
  the browser can handle gzip compression, AND 
  the browser is not Safari AND 
  there is a file with the same name with an additional .gz extension
THEN 
  Serve this compressed file instead of the original request

Evidently, some versions of Safari can get tripped up by this particular use of compression, so we leave them out of the fun for now.  It would be great to re-enable this if we can verify it is no longer an issue or has been resolved in the latest version of Safari.  TODO: Verify this assumption.

There are a variety of approaches to pre-compressing files. For the current Rails application I’m working on, we’ve integrated AssetPackager, which can optimize, combine and compress these files as part of a build or deployment process. It’s an excellent addition to the toolbox.

The section below enhances the configuration suggested by The If Works folks.


# USE PRE-COMPRESSED GZ FILES IF THEY EXIST - WE DON'T WANT TO COMPRESS ON EVERY REQUEST
RewriteCond %{REQUEST_FILENAME} \.(js|css)$
RewriteCond %{HTTP:Accept-encoding} gzip
RewriteCond %{HTTP_USER_AGENT} !Safari
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [QSA,L]
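
One addition worth considering: because the rule above returns different bytes for the same URL depending on the Accept-Encoding request header, shared caches and proxies should be told about it with a Vary header. The fragment below is a sketch under that assumption and is not part of the configuration we are running. It requires mod_headers.

```apache
# Sketch: tell caches that the response depends on Accept-Encoding.
# Matches both the original files and their pre-compressed .gz copies.
<FilesMatch "\.(js|css)(\.gz)?$">
Header append Vary Accept-Encoding
</FilesMatch>
```

The Safari exclusion also depends on User-Agent, but varying on that header tends to defeat shared caching, so it is left out here.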

Practical Caching Strategy

The next important step to enable is a reasonable caching strategy for our site.  Caching is critical to your web application for several reasons:

  • Users can navigate your site with fewer requests, improving perceived responsiveness
  • It enables the use of a high-performance acceleration or CDN layer

For high-volume web sites, proper attention to caching rules and application design should enable you to achieve caching rates in the 75-90% range.  Some sites have more dynamic content than others of course, but every site has a variety of images, static CSS and JavaScript files which can benefit from a caching strategy.

In our current configuration, we want to identify a set of file extensions that are cacheable, and let proxies or browsers cache them for up to 1 week.  It is easy to expand this configuration to set different cache lifetimes for different file types.
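
As a rough illustration of per-type lifetimes, the single block can be split into several. This is a sketch only: the groupings and durations are assumptions, not values we have tuned. It requires mod_headers and mod_expires.

```apache
# Sketch: different cache lifetimes for different file types.
ExpiresActive On

# Images and media rarely change: 30 days (2592000 seconds)
<FilesMatch "\.(ico|jpg|jpeg|png|gif|swf|flv|mp3|mp4)$">
Header set Cache-Control "max-age=2592000, public"
ExpiresDefault A2592000
</FilesMatch>

# Scripts and stylesheets, including pre-compressed copies: 1 week (604800 seconds)
<FilesMatch "\.(js|css|gz)$">
Header set Cache-Control "max-age=604800, public"
ExpiresDefault A604800
</FilesMatch>

# Documents that may be revised: 1 day (86400 seconds)
<FilesMatch "\.pdf$">
Header set Cache-Control "max-age=86400, public"
ExpiresDefault A86400
</FilesMatch>
```

The configuration we actually use, with a single one-week lifetime for everything, follows.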

<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4|js|css|gz)$">
Header set Cache-Control "max-age=604800, public"
ExpiresActive On
ExpiresDefault A604800
Header unset Last-Modified
Header unset Pragma
FileETag None
Header unset ETag
</FilesMatch>

A couple of important points about this configuration:

  1. We disable ETags for these types, since they can be unreliable in clustered applications where the same file may get a different ETag on each server
  2. We set both Expires and Cache-Control, since different browsers may treat either one as the definitive rule (Cache-Control is the newer HTTP/1.1 standard)

With a one-week cache lifetime, deploying new JS or CSS files in our app could leave users with stale copies.  In our case, because we are using AssetPackager, we get uniquely keyed filenames for these resources, which change each time there are updates.

For example, AssetPackager merges 3 JavaScript files into a single resource called base_timestamp.js, where timestamp is updated whenever any of the source files change.  This lets us avoid stale cache issues after site updates.  If you change the content of one of these cached file types without also changing its name, some users will continue to reference the older file until their local cache expires.  An alternative remedy for frequently updated files is to set the cache timeout to a much lower value, such as 4 hours or 1 day, so that stale files won’t live as long.
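
A sketch of that lower-timeout alternative, for a file whose name cannot change between deploys. The filename legacy.css is hypothetical, and the 4-hour value is just the example from the paragraph above. It requires mod_headers and mod_expires.

```apache
# Sketch: shorter lifetime for one frequently edited file (hypothetical name).
# Place this after the general one-week block so it takes precedence.
<FilesMatch "^legacy\.css(\.gz)?$">
# 4 hours = 14400 seconds
Header set Cache-Control "max-age=14400, public"
ExpiresActive On
ExpiresDefault A14400
</FilesMatch>
```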

While this is certainly not the end-all-be-all of configurations for applications, it is working well for us. Charles Proxy was very helpful in verifying that the configuration works as intended.

Have more best practices that we should incorporate into this configuration?  I’d love to hear them.  We will update this config with improvements as we find them.

Complete Config:


<VirtualHost *:80>
# BASIC SERVER CONFIG
ServerName www.yourserver.com
ServerAlias yourserver.com
DocumentRoot /srv/www/myapp/public
ServerAdmin your@email-address.com
ErrorLog /var/log/httpd/yourserver.com/apache_error_log
CustomLog /var/log/httpd/yourserver.com/apache_access_log combined

# ENSURE WE ARE IN PRODUCTION MODE
RailsEnv production

RewriteEngine On
AddEncoding gzip .gz

# IF YOU NEED TO DEBUG REWRITES
#RewriteLog "/tmp/rewrite.log"
#RewriteLogLevel 9

# USE PRE-COMPRESSED GZ FILES IF THEY EXIST - WE DON'T WANT TO COMPRESS ON EVERY REQUEST
RewriteCond %{REQUEST_FILENAME} \.(js|css)$
RewriteCond %{HTTP:Accept-encoding} gzip
RewriteCond %{HTTP_USER_AGENT} !Safari
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [QSA,L]

# MAKE SURE THE BROWSER UNDERSTANDS WHAT TYPE OF DATA IT IS RECEIVING
<FilesMatch "\.js\.gz$">
ForceType text/javascript
Header set Content-Encoding: gzip
</FilesMatch>

<FilesMatch "\.css\.gz$">
ForceType text/css
Header set Content-Encoding: gzip
</FilesMatch>

# CACHE FOR ONE WEEK
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4|js|css|gz)$">
Header set Cache-Control "max-age=604800, public"
ExpiresActive On
ExpiresDefault A604800
Header unset Last-Modified
Header unset Pragma
FileETag None
Header unset ETag
</FilesMatch>

</VirtualHost>

An Experiment in Visual Price Comparisons – SATA2 Drives

This is the first ‘labs’ experiment that I’m posting on the site.  It’s a simple widget that allows you to visually explore the price, capacity and size of hard drives from an online retailer.  It’s an experiment in alternative ways of comparing items that make it easier to assess all the options and (in this case) find the best deals.

SATA2 Internal Hard Drives

This experimental widget lets you explore real-time prices for SATA2 hard drives at one of the better online stores for computer gear.


Updated hourly

Read more about how the widget was constructed here.

It would be great to see more innovation from e-tailers in the variety of ways they help you find products.  Faceted navigation and sophisticated search are requirements today, but visual search can also be a powerful tool.  Take a few minutes to explore the visual search tools at Etsy if you haven’t seen them before.  They make it fun and interesting to shop for regular stuff.

A simple visualization like the drive comparison gives you 4 dimensions of information to share with the user: 2 dimensions along the X and Y axes, plus bubble color and size.  There is a lot that can be conveyed using those extra dimensions.  It’s much easier to find highlights and lowlights than with the textual search result lists you find on most sites today.

It’s possible to add animation as well, which can introduce time series into the mix, possibly an enhancement for the future.  Animated data visualization is something that both Google and Microsoft have expressed interest in, so it’s something to keep an eye on.