It's good to see this written up somewhere. The technique is widely known in the...

adriand · on Feb 19, 2012

I'm interested in trying out this technique for a site we built, but your comments about Googlebot and the associated downside with long-tail pages give me pause. Do you have any suggestions on how to evaluate whether or not a technique like this would bring advantages, or is it better to simply give it a try and see how it works out?

In our case, the site features real estate listings, and at any given time there are about 50,000 listings. Each listing has its own page. There are numerous other pages, some of which are just index pages (lists of listings), a couple of search pages, etc. But the majority of pages are the single listing view pages, and then following that, the lists of listings. Let's call it around 60,000 pages or so.

The total number of pageviews per day (not counting search engine crawlers) is less than the total number of pages, but I would be willing to bet that some individual listing pages are accessed at a much higher rate than others (namely, the newest ones). The listings themselves are updated on a daily basis, so the cache stores would get invalidated at least once a day.

In this situation, how do I determine the appropriate caching strategy?

cstejerean · on Feb 20, 2012

If you have enough cache capacity to keep every individual listing in cache then you could simply write a script that periodically hits the detail page for each listing to make sure the latest version of each listing is always available in cache.
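
A minimal sketch of such a warming script, assuming listing pages live at a hypothetical `/listings/:id` path and that a plain GET is enough to make the app render and cache the page. The names `warm_listings` and `HTTP_FETCH` are made up for illustration, and the fetcher is injectable so the loop can be tried without touching a real site:

```ruby
require "net/http"
require "uri"

# Default fetcher: issue a plain GET and return the numeric status code.
HTTP_FETCH = ->(url) { Net::HTTP.get_response(URI(url)).code.to_i }

# Requests every listing page once so a fresh copy lands in the cache.
# Returns the IDs whose request failed, so they can be retried or logged.
# The "/listings/:id" URL layout is an assumption, not taken from the site.
def warm_listings(listing_ids, base_url, fetch: HTTP_FETCH)
  listing_ids.reject do |id|
    begin
      fetch.call("#{base_url}/listings/#{id}") == 200
    rescue StandardError
      false # treat network errors as a failed warm-up for this ID
    end
  end
end
```

Something like `warm_listings(all_ids, "https://example.com")` could then run from cron shortly after the daily listing update, ideally throttled so the warm-up itself doesn't compete with real visitors.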

yxhuvud · on Feb 19, 2012

Best writeup on the topic of caching in rails that I've seen is http://broadcastingadam.com/2011/05/advanced_caching_in_rail...

adman65 · on Feb 20, 2012

Hey, I wrote that! Thanks for the compliment :D

JonnieCache · on Feb 19, 2012

Every rails developer should read this article.

ssmoot · on Feb 19, 2012

This might already be covered, but unless you're running a threaded server or have developed some other sort of centralized locking, it's not hard to get race conditions in the cache generation.

For Basecamp it probably doesn't matter that much. For complex actions that serve mostly public requests that hit mostly "all at the same time" (think of a Media Embargo lifting on a new product announcement) it can cause real problems.

Ideally you want the first request to block the rest so that the cache is generated only once: the other requests just spin until the first is done, the cache is ready, and the lock is released. That way all those other requests consume only sockets instead of unnecessary CPU/IO/DB time.
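
To make that concrete, here is a rough sketch of the idea for threads inside a single process. `SingleFlightCache` is a hypothetical class, not anything Rails ships, and it stores values in a plain Hash; across multiple server processes the lock would have to live somewhere shared (the cache store or the database, for example), which this sketch does not attempt:

```ruby
# Hypothetical in-process cache: the first caller for a key generates the
# value; concurrent callers for the same key wait and then reuse the result.
class SingleFlightCache
  def initialize
    @lock = Mutex.new
    @ready = ConditionVariable.new
    @values = {}     # finished cache entries
    @in_flight = {}  # keys currently being generated
  end

  def fetch(key)
    @lock.synchronize do
      loop do
        return @values[key] if @values.key?(key)
        break unless @in_flight[key]
        @ready.wait(@lock) # someone else is generating; sleep until woken
      end
      @in_flight[key] = true # this caller becomes the generator
    end

    begin
      value = yield # expensive work runs outside the lock
      @lock.synchronize { @values[key] = value }
      value
    ensure
      # Always release waiters, even if generation raised; one of them
      # will then retry as the new generator.
      @lock.synchronize do
        @in_flight.delete(key)
        @ready.broadcast
      end
    end
  end
end
```

With this, something like `cache.fetch("product/42") { render_expensive_page }` called from many threads at once runs the block a single time. The waiting threads sleep on the condition variable rather than spinning, so they hold a connection but burn no CPU or DB time.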