Database Read/Write Splitting in Frameworks/ORMs

Although one of the primary ideas behind frameworks is to keep things as simple as possible, sometimes they create issues in the long run. What I am about to discuss is something of a luxury problem (as scaling usually is), but it is a problem nonetheless.

When initially starting a project, whether you are using Ruby on Rails (Ruby), Django (Python), CakePHP (PHP), Catalyst (Perl), or any of the other 100s of frameworks in any of the languages out there, the first and most important thing to do is to get it out the door. Once you have done that, it’s time to get users, fix bugs, and add features. After you have done all that and you have a great web app, its time to think scaling. (Yes I realize that I have trivialized this process immensely, but its for a point, I promise).

When starting to scale (whether its out or up) and you decide its time to add another database, its necessary to analyze your app and decide whether its read heavy or write heavy. A lot of scaling comes in knowing your application and where its bottlenecks are. Let’s assume that you are at the point that you need to add a database server. What would be great is if you had a framework that allowed you to set some database servers as read-only in order to take load off the master.

In an abstract format, it would be a good idea to break out your SQL requirements into 2 functions: sql_write_query and sql_read_query. Then have the functions go to your primary database server and slave database servers respectively. The reason that you should do this instead of using a single function that sends the query to the “correct” location based on the SQL it finds is that your slave servers may be behind the master (which is the nature of replication) which could give you an incorrect result in your query. This way, depending on the importance and type of the query, in your code you can choose the location that you want to send the query to. The read queries where accuracy is extremely important can be sent to the database using sql_write_query and all others can be executed normally using your sql_read_query function.

How does this relate to frameworks and ORMs? It would be very handy if frameworks provided a method to expand an application into splitting read and write queries that is native to the frameworks. If it is native and isn’t hacked on afterward (like below), then you don’t have to muck around in the core code or write a plugin and you can stick to what you know best (which is your application). That is not to say that one should prematurely optimize an application (which is a whole other issue that you need to be careful of) and build it out in a split read/write fashion from the beginning, but that there should be a native way for the application to be faster should you reach that point.

Building this out as a afterthought can be done like what’s below (the example is in Python but can be extrapolated to the language specific to the framework).

import MySQLdb

def write_mysql_query(query):
    conn = MySQLdb.connect(host = "dbvip1.example.com",
                          user = "root",
                          passwd = "pass",
                          db = "myapp")
    cursor = conn.cursor()
    cursor.execute("SET AUTOCOMMIT=1")
    cursor.execute(query)
    cursor.close()
    conn.close()

def read_mysql_query(query):
    conn = MySQLdb.connect(host = "slavedb1.example.com",
                          user = "root",
                          passwd = "pass",
                          db = "myapp")
    cursor = conn.cursor()
    cursor.execute(query)
    result = cursor.fetchall()
    cursor.close()
    conn.close()

You can even do things for the writer in a more transactional fashion using START TRANSACTION and COMMIT if you don’t like using AUTOCOMMIT. You’ll also notice that there is a connect every time a query is executed. A lot of people will have an initial gut reaction of a problem here. In fact, since most of your queries will be taking place over a LAN with a pretty fast backplane (or some other variation of a high speed network), it’s probably negligible. Taking the load off of your master and dispersing it onto slaves will make the most difference here.

All this is a very oversimplified way of taking this step, but it is something that frameworks should consider. Even if its just in a plugin fashion which can be taken advantage of if the database server is getting overloaded.

  • http://blog.perplexedlabs.com matticakes

    While I see the point you're raising I think it's interesting to keep in mind the ol' 80/20 rule. A web framework is trying to solve 80% of the problems for you.

    They get you *most* of the way there, quickly. When you're hitting scalability issues it might be time to break out of the confines of an ORM anyway, whether or not you've encountered this particular issue.

  • http://eric.lubow.org Eric Lubow

    I agree with your assessment of the goal of frameworks. But if you reach the issue of scaling, is it really worth refactoring the entire code base to scale out your database system? And if a framework is solving 80% of your problems, does that mean that when your ready to scale out that another problem should be created as to how (on the database level)?

    The point I am aiming to make is that by baking this ability into the core code base of a framework and making it available to use when necessary (or desired) you are giving the developer the ability to scale more easily. And I believe giving this ability in advance is less dangerous than having someone try to hack it in later. It also provides a consistent way of achieving this goal for everyone who uses a framework/ORM to follow so there aren't people consistently reinventing the wheel.