There really isn’t a solid Python module that combines multiprocessing with MySQL. That may be because MySQL on a single server is disk-bound and therefore limited in speed, or simply because no one has written one. So here is a quick-and-dirty example using the Pool class from the multiprocessing module in Python 2.6 together with MySQLdb.
I also tried PySQLPool, but it was designed for threading rather than forking, which is what the Pool approach does. Although I am sure it is possible to use PySQLPool with forking by passing the connection pool object down to the child processes, or possibly by doing something with IPC, I decided to keep things simple (although slightly more expensive) and instantiate a new MySQLdb connection in each forked worker.
import multiprocessing
import MySQLdb

tablename = 'address_demographics'

def sqlWorkerInsert(sql):
    # Each forked worker opens its own connection
    conn = MySQLdb.connect(host = "127.0.0.1",
                           user = "USER",
                           passwd = "PASS",
                           db = "DB")
    cursor = conn.cursor()
    cursor.execute(sql)
    # Need to do this since AUTOCOMMIT = 0 by default (wtf?)
    conn.commit()
    cursor.close()
    conn.close()

inserts = []
id = 0
max_id = 1000000
while id < max_id:
    # NOTE: the `people` source table is illustrative; an INSERT needs
    # a SELECT (or VALUES) feeding it, not a bare WHERE clause
    sql = """
        INSERT IGNORE INTO `%s` (person_id)
        SELECT id FROM people WHERE id BETWEEN %d AND %d
    """ % (tablename, id, (id + 100000))
    inserts.append(sql)
    id += 100001

pool = multiprocessing.Pool(multiprocessing.cpu_count())
pool.map(sqlWorkerInsert, inserts, 1)
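Since BETWEEN is inclusive on both ends, the loop steps by 100001 so consecutive batches never overlap. Here is a standalone sanity check of that arithmetic (pure Python, no MySQL needed; the function name is just for illustration):

```python
def batch_ranges(max_id, batch=100000):
    # Mirrors the loop above: inclusive BETWEEN bounds,
    # stepping by batch + 1 so ranges never overlap
    id = 0
    ranges = []
    while id < max_id:
        ranges.append((id, id + batch))
        id += batch + 1
    return ranges

ranges = batch_ranges(1000000)
print(ranges[0])  # (0, 100000)
print(ranges[1])  # (100001, 200001)
```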
Opening a new connection to MySQL for every worker task may not be the cleanest method. But since MySQL spawns a new thread for every connection, I don’t think it is that big of a deal. However, I am always open to new ways of doing things.
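One alternative worth sketching: Pool accepts an initializer, so each worker process can open a single connection once at fork time and reuse it for every chunk of work, rather than reconnecting per task. This is a rough illustration of the pattern only; the `_conn` global and the stand-in dict are hypothetical, and in the real script the initializer would call MySQLdb.connect() instead:

```python
import multiprocessing

_conn = None  # per-process "connection", set once per forked worker

def init_worker():
    # Runs once in each worker; in the real script this would be
    # MySQLdb.connect(...) rather than a stand-in dict
    global _conn
    _conn = {'proc': multiprocessing.current_process().name}

def sqlWorker(sql):
    # Reuses the per-process connection instead of reconnecting per task
    return (_conn['proc'], sql)

def run(statements):
    pool = multiprocessing.Pool(2, initializer=init_worker)
    try:
        return pool.map(sqlWorker, statements, 1)
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(run(['stmt-%d' % i for i in range(4)]))
```

The trade-off is that a long-lived per-worker connection can time out or die mid-run, so the per-task connect in the main example is simpler to reason about even if it costs a little more.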