Pawel Janiak

South African Ruby on Rails developer, ranter, biker.

Some Memcached tips

16 Apr 2013

Intro to caching with memcached

Memcached is a high-performance, distributed memory object caching system. It is a very simple cache store and it should be treated as a simple cache store and nothing else. It is worth pointing out that memcached is the name of the daemon that runs the service, although most people use "Memcache" and "memcached" interchangeably. Adding complexity such as elaborate expiry schemes means overhead and defeats the purpose of memcached's simplicity and speed. As your application's database transactions start getting expensive and complex, you should start thinking about using a cache store like memcached to reduce the number of queries passed to MySQL, or whatever RDBMS you use.

Is memcache the right tool for the job?

If you need replication then you are most likely using memcached wrong; use something like Riak or Redis instead. If you think you need backups of your cache that can be restored after reboots or some kind of failure, then you're also using it wrong. It is a transient cache store, so don't give it more responsibility than it should handle. Assume that memcached can never guarantee your data, and if you need guarantees then look elsewhere. Memcached and services like Redis are not mutually exclusive, so be confident in delegating a different type of responsibility to each.

Expiring keys and memory limits

In case you were wondering what happens when a memcached instance with a small memory limit (set with memcached -m; the default is 64MB) hits that limit: your application won't stop being able to write to the cache. Memcached will automatically evict the least recently used keys when it runs out of space. A smaller limit just means a smaller capacity for your cache keys, so you don't have to worry about your cache store falling over as your cache key usage increases.

Performance

You should try to get as many values out of memcached in bulk as you can per request. For instance, if a request needs 10 different values stored in cache, fetching them in series will always be slower than a single multi-get that fetches all 10 values at once.
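
A minimal sketch using the Dalli client (a popular Ruby memcached client; the addresses and key names here are made up) shows one round trip replacing ten serial gets:

require "dalli"

cache = Dalli::Client.new("10.0.0.1:11211")

# One network round trip for all ten values instead of ten serial gets.
keys = (1..10).map { |id| "article/#{id}" }
values = cache.get_multi(*keys) # => { "article/1" => ..., "article/2" => ..., ... }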

When you configure your connection pooling, use IP addresses rather than hostnames. This means faster connections because the client won't have to resolve DNS entries. If you have several web servers where each server runs a memcached instance, then naming becomes important. Don't refer to any of the servers as "localhost", because "localhost" points to a different memcached instance on each web server. Another thing to remember is that ordering is important: clients hash each key to a server based on the server list, so the servers should be listed in the same order across all configuration settings to ensure consistency across clients. If you have servers "One, Two" then list them as "One, Two" in that order everywhere, as in the sketch below.
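
For example, a hypothetical Dalli setup shared by every web server (the addresses are made up; the important part is the identical ordering):

# config/initializers/cache.rb
# The same list, in the same order, on every web server, so every
# client hashes any given key to the same memcached instance.
CACHE = Dalli::Client.new(["10.0.0.1:11211", "10.0.0.2:11211"])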

Compress large values wherever you can. This will get you the most out of the memory you have available and can even reduce latency, because a compressed value can be quicker to fetch over the wire than the same value uncompressed.
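
Most clients can do this for you. A sketch with Dalli, assuming its :compress option (which compresses values transparently on write), so application code stays unchanged:

# Values are compressed before being written and decompressed on read.
cache = Dalli::Client.new("10.0.0.1:11211", compress: true)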

Don't initialize connections every time a request is made to memcached. To avoid this, ensure that the client isn't calling the addServer command on every request. Refer to the client's documentation for more details.
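
In Ruby, a common pattern is to build one client when the process boots and reuse it, rather than constructing one per request (a sketch; the constant name is arbitrary):

# Created once at boot, then shared by every request in this process,
# so the underlying sockets are reused instead of reopened.
CACHE = Dalli::Client.new("10.0.0.1:11211")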

The maximum item size (1MB by default) is configurable, so if you need to store big values you can raise the limit, for example to 10MB using memcached -I 10m. You may want to be careful with this and ensure your writes won't evict older keys because memcached can't find a slab with enough free memory.

Caching mitigates database strain but isn't free in and of itself. Requests still need to be made, sometimes even over a network round trip. However, a trip to memcached is still cheaper than a trip to the database. Your rule of thumb should be that any database call or API hit should be cached at some point before the response is sent to the user.
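
The same fetch-or-compute pattern covers external API hits as well as database calls. A sketch (WeatherAPI, the key format and the TTL are all hypothetical):

# Serve the cached response if we have one; otherwise hit the API
# once and cache the result for subsequent requests.
def forecast_for(city)
  Rails.cache.fetch("forecast/#{city}", expires_in: 30.minutes) do
    WeatherAPI.forecast(city)
  end
end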

Cache warming

If possible, you should consider using a deployment script that pre-populates memcached with certain keys during each build. This "cache warming" means you will have fewer cache misses the first time round for certain requests. Also, don't give all your keys the same TTL, so that they don't all expire at the same time. This ensures you don't get spikes of database traffic because a batch of cache keys expired simultaneously. On the subject of key expiry: if you want to delete or expire items of your choice from your cache, Redis is what you should use. It is possible to configure Redis to behave much like memcached, for instance by disabling the persistence of keys in redis.conf.
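
A simple way to spread expiries out is to add a little random jitter to each TTL. A sketch (the base TTL and jitter window are arbitrary):

# Keys written together no longer expire together.
def write_with_jitter(key, value)
  Rails.cache.write(key, value, expires_in: 1.hour + rand(600).seconds)
end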

Monitoring and analysis

By analyzing your keys, you can find the popular ones. These can be added to the keys that your deployment scripts pre-populate memcached with, which is all the more useful if you have multiple memcached servers. Familiarize yourself with how your memcached client selects which daemon any given key is stored on, so that you have more information to draw on when debugging speed and performance issues. It may also be wise to dedicate some of your cache memory to storing often-recurring requests that respond with a 404, either for records that don't exist or that don't exist anymore.
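
Caching a 404 takes a little care, because a cached nil can be indistinguishable from a miss depending on the store. One common workaround is a sentinel value (a sketch; the :missing marker is arbitrary):

# Remember that a record was absent, so repeated lookups for dead IDs
# skip the database entirely.
def find_article(id)
  result = Rails.cache.fetch("article/#{id}") { Article.find_by_id(id) || :missing }
  result == :missing ? nil : result
end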

Be sure to use some kind of tool to measure your cache hit rate. What really happens and what you think is happening because you've made some configuration changes can sometimes be very different. It's always good to look at cold, hard numbers to reveal the truth.
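
memcached exposes counters like get_hits and get_misses through its stats command, which most clients can read. A rough sketch with Dalli, assuming a single server:

# stats returns a hash keyed by server address.
stats = cache.stats.values.first
hits = stats["get_hits"].to_f
misses = stats["get_misses"].to_f
puts "hit rate: #{(100 * hits / (hits + misses)).round(2)}%"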

Cache in development

To understand the full impact that caching has on your application, use memcached during development. This lets you see its behaviour first-hand in your own environment, and it gives you a chance to test your cache usage and practise busting cache keys. You will also want to mirror your production environment as much as possible. Installing memcached locally is trivial.
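
In Rails, that can be as simple as pointing the development environment at your local daemon (a sketch; Rails ships the :mem_cache_store backend, and caching is off in development by default):

# config/environments/development.rb
config.action_controller.perform_caching = true
config.cache_store = :mem_cache_store, "localhost:11211"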

Memcached and Ruby on Rails

In Ruby on Rails, if you have an Article model, for instance, and the same articles get accessed hundreds or thousands of times every hour, you would want to cache them instead of querying your database every time. You could use ActiveSupport::Cache::Store's #fetch method. This method looks up the given key in the cache store; if there is data under that key, it is returned. On a miss, if a block is passed, the block is executed and its result is written to the cache under that key and returned, to be served from the cache on subsequent requests; without a block, fetch simply returns nil. An example would be:

# Hits the database only on a cache miss; afterwards the cached
# article is returned without touching the database.
def self.fetch(id)
  Rails.cache.fetch("article/#{id}") { Article.find(id) }
end

If your Rails application is hosted on Heroku then you won't be able to use page caching, because page caching requires write access to the file system. That's no real loss: HTTP-level caching is superior to page caching anyway, since the request won't even hit your Rails stack, so use that where you can. On Heroku, you can still use action caching or fragment caching. If you're upgrading to Rails 4 or starting a new project on it, use fragment caching, as both action and page caching have been removed from Rails 4 (they live on as separate gems).

Fragment caching is the easiest and most straightforward caching method. With fragment caching you can easily cache a partial for each individual model instance. Rails automatically generates a cache key when passed an ActiveRecord object, thanks to the cache_key instance method, which builds a unique identifier out of the model name, the record's id and, when available, its updated_at timestamp (for example "articles/1-20130416101500"). This is used like so:

Rails.cache.write(cache_key, self)
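
The updated_at component is what makes this convenient: touching a record changes its cache key, so stale entries are simply abandoned and left to be evicted. A small sketch of the same idea used directly:

# The key changes whenever the article changes, so a read after an
# update is a miss and the fresh article gets cached under a new key.
article = Article.first
Rails.cache.write(article.cache_key, article)
Rails.cache.read(article.cache_key) # => the cached article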

Discussion on Hacker News