There really isn’t a solid Python module that combines multiprocessing with MySQL. That may be because MySQL on a single server is disk-bound and therefore limited in speed, or simply because no one has written one. So here is a quick-and-dirty example using the Pool class from the multiprocessing module in Python 2.6 together with MySQLdb.
I also tried PySQLPool, but it was designed for threading rather than forking, which is what the Pool approach does. Although I am sure it is possible to use PySQLPool with forking by passing the connection pool object down to the child processes, or possibly by doing something with IPC, I decided to keep things simple (although slightly more expensive) and instantiate a new MySQLdb connection in each forked worker.
import multiprocessing
import MySQLdb

tablename = 'address_demographics'

def sqlWorkerInsert(sql):
    # Each forked worker opens its own connection
    conn = MySQLdb.connect(host = "127.0.0.1",
                           user = "USER",
                           passwd = "PASS",
                           db = "DB")
    cursor = conn.cursor()
    cursor.execute(sql)
    # Need to do this since AUTOCOMMIT = 0 by default (wtf?)
    conn.commit()
    cursor.close()
    conn.close()

inserts = []
id = 0
max_id = 1000000
while id < max_id:
    # NOTE: the `people` source table is illustrative; an INSERT needs
    # a SELECT (or VALUES) feeding it, not a bare WHERE clause
    sql = """
        INSERT IGNORE INTO `%s` (person_id)
        SELECT id FROM people WHERE id BETWEEN %d AND %d
    """ % (tablename, id, (id + 100000))
    inserts.append(sql)
    id += 100001

pool = multiprocessing.Pool(multiprocessing.cpu_count())
pool.map(sqlWorkerInsert, inserts, 1)
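Since BETWEEN is inclusive on both ends, the loop steps by 100001 so consecutive batches never overlap. Here is a standalone sanity check of that arithmetic (pure Python, no MySQL needed; the function name is just for illustration):

```python
def batch_ranges(max_id, batch=100000):
    # Mirrors the loop above: inclusive BETWEEN bounds,
    # stepping by batch + 1 so ranges never overlap
    id = 0
    ranges = []
    while id < max_id:
        ranges.append((id, id + batch))
        id += batch + 1
    return ranges

ranges = batch_ranges(1000000)
print(ranges[0])  # (0, 100000)
print(ranges[1])  # (100001, 200001)
```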
Opening a new connection to MySQL for every worker task may not be the cleanest method. But since MySQL spawns a new thread for every connection, I don’t think it is that big of a deal. However, I am always open to new ways of doing things.
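One alternative worth sketching: Pool accepts an initializer, so each worker process can open a single connection once at fork time and reuse it for every chunk of work, rather than reconnecting per task. This is a rough illustration of the pattern only; the `_conn` global and the stand-in dict are hypothetical, and in the real script the initializer would call MySQLdb.connect() instead:

```python
import multiprocessing

_conn = None  # per-process "connection", set once per forked worker

def init_worker():
    # Runs once in each worker; in the real script this would be
    # MySQLdb.connect(...) rather than a stand-in dict
    global _conn
    _conn = {'proc': multiprocessing.current_process().name}

def sqlWorker(sql):
    # Reuses the per-process connection instead of reconnecting per task
    return (_conn['proc'], sql)

def run(statements):
    pool = multiprocessing.Pool(2, initializer=init_worker)
    try:
        return pool.map(sqlWorker, statements, 1)
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(run(['stmt-%d' % i for i in range(4)]))
```

The trade-off is that a long-lived per-worker connection can time out or die mid-run, so the per-task connect in the main example is simpler to reason about even if it costs a little more.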