Python Multiprocessing Pools and MySQL

There really isn’t a solid Python module for combining multiprocessing and MySQL. That may be because MySQL on a single server is disk bound and therefore limited in speed, or simply because no one has written one. So here is a quick and dirty example using the Pool class from the multiprocessing module in Python 2.6, together with MySQLdb.

I also tried using PySQLPool. That library was designed for threading rather than forking, which is what the Pool approach does. Although I am sure it is possible to use PySQLPool with forking by passing the connection (pool) object down to the child processes, or possibly by doing something with IPC, I decided to keep it simple (although slightly more expensive) and instantiate a MySQLdb connection in each forked worker.

import MySQLdb
import multiprocessing

def main():
    tablename = 'address_demographics'
    inserts = []
    id = 0
    max_id = 1000000
    while id < max_id:
        sql = """
        INSERT IGNORE INTO `%s` (person_id)
        SELECT person_id
          FROM `people`
         WHERE id BETWEEN %d AND %d
        """ % (tablename, id, id + 100000)
        inserts.append(sql)
        id += 100001
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    pool.map(sqlWorkerInsert, inserts, 1)
    pool.close()
    pool.join()

def sqlWorkerInsert(sql):
    # Each worker process opens its own connection after the fork,
    # runs its one statement, and tears the connection down again.
    conn = MySQLdb.connect(host = "127.0.0.1",
                           user = "USER",
                           passwd = "PASS",
                           db = "DB")
    cursor = conn.cursor()
    # Need to do this since AUTOCOMMIT = 0 by default (wtf?)
    cursor.execute("SET AUTOCOMMIT=1")
    cursor.execute(sql)
    cursor.close()
    conn.close()

if __name__ == '__main__':
    main()

It may not be the cleanest method to open a new connection to MySQL for every task handed to a worker subprocess. Since MySQL opens a new thread for every connection, though, I don’t think this is that big of a deal. However, I am always open to new ways of doing things.