Posts Tagged ‘ MySQL ’

MySQL for Python

Monday, December 27th, 2010

I am always for using the right tool for the right job. A lot of time, that tool is Python. I have always had trouble finding solid documentation on using MySQL with Python. There was generally enough to get by, but the more the merrier. Enter MySQL for Python by Albert Lukaszewski. (more…)

Speeding Up Your Selects and Sorts

Thursday, March 4th, 2010

When you are using a framework, they typically set your VARCHAR size automatically to 255. This is normally fine since you are letting the framework abstract you away from most of the SQL. But if you interact with your SQL, there is a way to get a decent speed increase on your SELECTs and ORDER BYs when you are working with VARCHARs.

The VARCHAR data type is only variable character size for storage, not for sorting and buffering. In fact, since the MySQL optimizer doesn’t know how big the data in that column can be, it has to allocate the maximum size possible for that column. So for sorting and buffering of the name and email columns below would take up 310 bytes per row.
(more…)

Database Read/Write Splitting in Frameworks/ORMs

Monday, March 1st, 2010

Although one of the primary ideas behind frameworks is to keep things as simple as possible, sometimes they create issues in the long run. What I am about to discuss is something of a luxury problem (as scaling usually is), but it is a problem nonetheless.

When initially starting a project, whether you are using Ruby on Rails (Ruby), Django (Python), CakePHP (PHP), Catalyst (Perl), or any of the other 100s of frameworks in any of the languages out there, the first and most important thing to do is to get it out the door. Once you have done that, it’s time to get users, fix bugs, and add features. After you have done all that and you have a great web app, its time to think scaling. (Yes I realize that I have trivialized this process immensely, but its for a point, I promise).
(more…)

When To Use MySQL Cursor Classes In Python

Monday, January 18th, 2010

I have been writing a lot of code that has been interacting with MySQL lately. Sometimes I find it easier to work the result set in a dictionary form and other times it is easier with an array. But in order to not break all your code, it is necessary to set a default cursor class that keeps your code consistent. More often than not, I find using using a arrays is easier since I just want quick access to all the retrieved data. I also end up making my SELECT calls while specifying the columns and order of the columns I want returned.

The reason that using cursor classes is handy is because Python doesn’t come with a mysql_fetch_assoc like PHP or selectrow_hashref like Perl’s DBI interface. Python uses cursor dictionaries to bridge this gap. Ultimately your result is the same. But as with Perl and PHP, defaulting to cursor dictionaries isn’t a good idea for larger datasets because of the extra processing time and memory required to convert the data.
(more…)

MySQL Error 1033: Incorrect Information in File

Tuesday, January 5th, 2010

If you’ve ever been plagued by an error 1033 issue in MySQL (replication will show it as well), then I might be able to help you out. The error reads something like, “Incorrect information in file: ‘./mydb/table.frm’. I classify this as another one of MySQLs cryptic error messages. Here is how I determined that this was my problem.

Googling around got me an answer, but I had to read a bunch of different responses to piece together the answer. Essentially this issue (in my case) was a result of the InnoDB engine not loading up when MySQL was restarted. Therefore when MySQL tried to read the frm file (table description) which was written for an InnoDB table with the MyISAM reader, it didn’t like it. Since MyISAM is the fallback engine, it went to that and the table became unusable.
(more…)

Python Multiprocessing Pools and MySQL

Monday, December 21st, 2009

There really isn’t a solid Python module for multiprocessing and MySQL. Now this may be because MySQL on a single server is disk bound and therefore limited in speed or just because no one has written it. So here is a quick and dirty example using the Pool module in multiprocessing in Python 2.6 and MySQLdb.

I also tried using PySQLPool. This was designed for threading and not forking as I am doing with Pool method. Although I am sure it is possible to use PySQLPool with forking by passing the connection (pool) object down to the child process or possibly doing something with IPC, I decided to keep it simple (although slightly more expensive) and instantiate MySQLdb connections upon fork.
(more…)

Python's MySQLdb 2014 Error – Commands out of sync

Friday, December 18th, 2009

While writing a simple Python script to access and process data in a database, I came across an error that said:

1
Error 2014: Commands out of sync; you can't run this command now

After quite a bit of Googling and with very little findings, I had to dive in a little and try to figure out what was going on. The whole error looked like this:
(more…)

Percona Conference Slides

Saturday, May 9th, 2009

For everyone who was not able to attend this year, here is the link to the Percona Perfmance Conference slides: http://conferences.percona.com/percona-performance-conference-2009/schedule.html.

I suggest you check them out if you work with MySQL. Enjoy.

Counting Email Addresses By Domain in MySQL

Wednesday, April 1st, 2009

Every so often I find some statistical need that although Perl program would be easy to write for it, its probably something the database should just handle. So I have a column in an email management table that has just the email addresses in the format user@domain.tld.

I want to know which domains make up the majority of the users. (The numbers have been changed to protect the innocent):

mysql> SELECT SUBSTRING_INDEX(email, '@', -1) as Domain, count(*) as Total
      FROM email_list
GROUP BY Domain
ORDER BY Total DESC
       LIMIT 15;
+----------------+---------+
| Domain         | Total   |
+----------------+---------+
| yahoo.com      | 1304000 | 
| hotmail.com    |  908400 | 
| aol.com        |  800000 | 
| msn.com        |  168000 | 
| gmail.com      |  161000 | 
| comcast.net    |  143000 | 
| sbcglobal.net  |  110000 | 
| bellsouth.net  |   62000 | 
| cox.net        |   58000 | 
| verizon.net    |   56000 | 
| earthlink.net  |   52000 | 
| charter.net    |   46000 | 
| juno.com       |   30000 | 
| optonline.net  |   22000 | 
| netzero.com    |   17000 | 
+----------------+---------+

Character Encoding

Thursday, October 23rd, 2008

I recently ran into some character encoding issues and I wanted to share the fun. The default character encoding for MySQL on Gentoo is latin-1 or iso-8859-1. This wasn’t a problem until we recently started putting content straight from the DB through Java and onto the web. Java connects to the DB with a character encoding (typically UTF-8). Since UTF-8 is roughly a superset of iso-8859-1, it generally wasn’t a problem. Except when UTF-8 and UTF-16 characters were put into an iso-8859-1 database without translation.

What was essentially happening was that the data was being stored as iso-8859-1. The Java code was connecting to the DB in UTF-8 and pulling it into Java (which is usually UTF-16, but in this case was being handled as UTF-8). It was then being sent to the browser as URL encoded UTF-8 when in reality, it hadn’t even properly been put into UTF-8 character encoding. This then gave the web browser some funny yen symbols and question marks. This was not quite what we were aiming for.

The moral of this story is that it is necessary to realize the character encoding of the start point and end point of your data. It is crucial that the code points match up otherwise they could potentially make for an interesting screen given to the reader. All this could have been avoided with a simple: ALTER TABLE myTable MODIFY myColumn VARCHAR(255) CHARACTER SET utf8;.

MySQL Encoded URI Search and Replace

Friday, September 26th, 2008

So I can definitely not take credit for this trick. The credit goes to Augusto Bott of the Pythian Group for this. I have a table in my DB that has columns of encoded URIs. The list of URI encoding character translations is available here.

To accomplish this, the MySQL REPLACE function was used.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> SET @test:='%20%27%7C%26%5E%2B%2D%25';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @test as original,
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),  -- REPLACE() is case sensitive
    ->        '%7c','|'),  -- so we have
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),  -- to repeat
    ->        '%2D','-'),
    ->        '%2d','-'),  -- ourselves
    ->        '%2B','+'),
    ->        '%2b','+'),  -- sometimes
    ->        '%25','%') as replaced;
+--------------------------+----------+
| original                 | replaced |
+--------------------------+----------+
| %20%27%7C%26%5E%2B%2D%25 |  '|&^+-% |
+--------------------------+----------+
1 row in set (0.01 sec)

To make our lives easier, let’s create a function from that.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
mysql> CREATE FUNCTION url_replace
    -> (url VARCHAR(1024))
    -> RETURNS VARCHAR(1025)
    -> BEGIN
    -> RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),
    ->        '%7c','|'),
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),
    ->        '%2D','-'),
    ->        '%2d','-'),
    ->        '%2B','+'),
    ->        '%2b','+'),
    ->        '%25','%');
    -> END$$
Query OK, 0 rows affected (0.02 sec)

And to wrap it up, let’s use it in an actual query. This query finds all instances of a % and replaces it with its unencoded character equivalent.

1
mysql> UPDATE website_page SET uri = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( uri,'%20',' '), '%27','\''), '%7C','|'), '%7c','|'), '%26','&'), '%5E','^'), '%5e','^'), '%2D','-'), '%2d','-'), '%2B','+'), '%2b','+'), '%25','%'), '%2C',',') WHERE uri LIKE '%\%%';

MySQL Proxy Query Profiling

Friday, December 7th, 2007

Since I am now finally getting to play with MySQL Proxy, I am going to outline some recipes here that I have found/created/modified that may be useful to someone other than me. This is a recipe for profiling queries. It writes the information to the PROXY_LOG_FILE currently name mysql.log. It is a file that will be created in the directory that you run mysql-proxy from. The file itself is mostly commented and should therefore be pretty self-explanatory. It was adapted from the reference documentation listed at the bottom of this entry.

assert(proxy.PROXY_VERSION >= 0x00600,
  "You need to be running mysql-proxy 0.6.0 or greater")

 -- Set up the log file
 local log_file = os.getenv("PROXY_LOG_FILE")
 if (log_file == nil) then
    log_file = "mysql.log"
 end

 -- Open up our log file in append mode
 local fh = io.open(log_file, "a+")

 -- Set some variables
 local original_query = ""
 local executed_query = ""
 local replace = false
 local comment = ""

 -- For profilign
 local profile = 0

-- Push the query onto the Queue
function read_query( packet )
  if string.byte(packet) == proxy.COM_QUERY then
    query = string.sub(packet, 2)

    -- Pull out the comment and store it
    original_query = string.gsub(query, "%s*%*%*%*%s(.+)%s%*%*%*%s*",'')
    comment = string.match(query, "%s*%*%*%*%s(.+)%s%*%*%*%s*")

    -- Add the original packet to the query if we have a comment
    if (comment) then
        if string.match(string.upper(comment), '%s*PROFILE') then
          -- Profile types:
          --   ALL, BLOCK IO, CONTEXT SWITCHES, IPC,
          --        MEMORY, PAGE FAULTS, SOURCE, SWAPS

          -- Set it up for profiling
          -- Generate a new COM_QUERY packet
          --   and inject it with a new id (11)
          original_query = "SET PROFILING=1; " .. original_query
      end -- End if (PROFILE)
    end -- End if (comment)

    executed_query = original_query
    -- Check for a 'SELECT' typo
    if string.match(string.upper(original_query), '^%s*SLECT') then
        executed_query = string.gsub(original_query,'^%s*%w+', 'SELECT')
        replace = true;
    -- matches "CD" as first word of the query
    elseif string.match(string.upper(original_query), '^%s*CD') then
        executed_query = string.gsub(original_query,'^%s*%w+', 'USE')
        replace = true
    end

    -- Postpend the other profiling strings to the query
    if (comment and string.match(string.upper(comment), '%s*PROFILE')) then
        executed_query = executed_query .. "; SHOW " .. comment
    end

    -- Actually execute the query here
    proxy.queries:append(1, string.char(proxy.COM_QUERY) .. executed_query )
    return proxy.PROXY_SEND_QUERY
  else
    executed_query = ""
  end
end

function read_query_result (inj)
  local res = assert(inj.resultset)
  local num_cols = string.byte(res.raw, 1)
  if num_cols > 0 and num_cols < 255 then
     for row in inj.resultset.rows do
         row_count = row_count + 1
     end
  end
  
  -- Prepend the error tag in the log
  local error_status = "" 
  if res.query_status and (res.query_status < 0) then
     error_status = "[ERROR]"
  end
  
  -- Gets the rows affected by the actual query
  local row_count = 0
  if (res.affected_rows) then
     row_count = res.affected_rows
  end
  
  -- Prepend the comment line in the log
  if (comment) then
     fh:write( string.format("%s %6d -- [COMMENT] %s\n",
        os.date('%Y-%m-%d %H:%M:%S'), 
        proxy.connection.server["thread_id"],
        comment))
  end
  
  -- Prepend the typo in the log
  if (replace) then
     fh:write( string.format("%s %6d -- [REPLACEMENT] %s\n\t\t\t%s\n",
        os.date('%Y-%m-%d %H:%M:%S'), 
        proxy.connection.server["thread_id"],
        ("replaced " .. original_query),
        ("with " .. executed_query)))
  end
  
  -- Write the query adding the number of rows retrieved and query time
  fh:write( string.format("%s %6d -- %s %s {%d rows} {%d ms}\n",
     os.date('%Y-%m-%d %H:%M:%S'), 
     proxy.connection.server["thread_id"],
     error_status,
     executed_query,
     row_count,
     inj.query_time)
  )
  fh:flush()
end

To make this work, simply append 3 asterisks 'PROFILE <profile_type>' and then 3 more asterisks and you will have the profile information returned to you in your query:

1
*** PROFILE ALL *** SELECT * FROM foo_bar;

2 tables will be returned. Your results and then the profile of your results.

Reference: Writing LUA Scripts for MySQL Proxy