MogileFS and a race condition

As regular readers of the iContact blog may know, MogileFS has become an integral part of our infrastructure at iContact. Rather than store the bodies of messages in our database, we long ago moved them to a quick-and-dirty storage method. This method was essentially a cheap WebDAV server: on each STORE command it would write to two backend servers, and on each GET it would read from only one. About a month ago, we migrated most of our messages away from this older, less scalable method to our newer MogileFS backend.
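
For readers who want a concrete picture of the older scheme, here is a minimal sketch of the write-to-two, read-from-one idea. It is not our actual WebDAV code: the class name `DualWriteStore`, the in-memory dicts standing in for the two backend servers, and the random choice of which backend serves a read are all assumptions made for illustration.

```python
import random


class DualWriteStore:
    """Hypothetical stand-in for the old store: write to two backends, read from one."""

    def __init__(self, backend_a, backend_b):
        # Each backend is a plain dict here; in reality these were separate servers.
        self.backends = [backend_a, backend_b]

    def store(self, key, body):
        # Every STORE is written to both backends.
        for backend in self.backends:
            backend[key] = body

    def get(self, key):
        # Every GET is served by only one backend (picked at random: an assumption).
        backend = random.choice(self.backends)
        return backend[key]


server_a, server_b = {}, {}
store = DualWriteStore(server_a, server_b)
store.store("message-1", b"Hello, subscriber!")
body = store.get("message-1")
```

The weakness of a design like this is that nothing reconciles the two copies: if one write fails or a backend falls behind, a later read may land on the stale or missing copy.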

Our MogileFS setup turns the normally unused disk space on each web server into a cheap storage node.

On Monday 1/21, the database servers behind MogileFS paged us for too many connections. That condition makes Mogile run very slowly for a while and sometimes requires a restart of some of the nodes.

This database issue cascaded: we asked our Mogile client for item A but received item B in response…