Archive for the ‘ Databases ’ Category

Range Repairs: Step-by-Step

Friday, May 9th, 2014

It’s been a long time since I was able to run a repair on my Cassandra cluster. Basically since I went to 1.2, it just hasn’t been possible. And since repairs in Cassandra are pretty much a requirement for normal operation, this is clearly a problem. So in order to deal with the disarray that is Cassandra repairs in 1.2, I found a script originally written by Matt Stump and edited to work with virtual nodes (vnodes) by Brian Gallew. The tl;dr is that the script breaks the repairs down into manageable chunks and allows the repairs to finish. It is available here. (more…)
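I won’t reproduce the script here, but the core idea is easy to sketch: split the ring into many small token ranges and run one nodetool repair per range with explicit start/end tokens. A minimal sketch of that idea follows; the keyspace name and token boundaries below are placeholders for illustration (in practice the ranges come from nodetool ring or from the node’s own vnode tokens).

#!/usr/bin/env python
# Rough sketch of subrange repair: run one small repair per token range
# instead of a single monolithic `nodetool repair`. Keyspace and ranges
# are placeholders; the real script derives them from the ring/vnodes.
import subprocess

KEYSPACE = "Mailings"  # hypothetical keyspace name

# (start_token, end_token) pairs; in practice these come from `nodetool ring`
token_ranges = [
    ("-9223372036854775808", "-3074457345618258603"),
    ("-3074457345618258603", "3074457345618258602"),
    ("3074457345618258602", "9223372036854775807"),
]

for start, end in token_ranges:
    cmd = ["nodetool", "repair", KEYSPACE, "-st", start, "-et", end]
    print("Running:", " ".join(cmd))
    subprocess.check_call(cmd)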

Redis Setup Notes and One-Liners

Tuesday, October 22nd, 2013

Being a heavy user of Redis has forced some weird Bash-fu and other commands when I want to find out how things are going. Because Redis is single threaded (see here for more information), I commonly run multiple Redis instances per machine. As a result, when running on AWS, I use a specific machine layout to get the best CPU utilization for Redis. An m2.4xlarge machine comes with 8 cores and 68G of RAM. To take full advantage of that, I run 7 instances of Redis and pin each instance to its own CPU core (this can be done using taskset from the schedutils package). For extra performance, I leave an entire core to the OS (even though the machines do little other than process Redis commands). (more…)
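To make the pinning concrete, here is a minimal sketch of launching one Redis instance per core with taskset, leaving core 0 for the OS. The config file paths (and therefore the per-instance ports) are made up for illustration; the real layout depends on your install.

#!/usr/bin/env python
# Minimal sketch: start 7 Redis instances, each pinned to its own core with
# taskset, leaving core 0 free for the OS. Config paths are hypothetical.
import subprocess

for core in range(1, 8):  # cores 1-7; core 0 is left for the OS
    conf = "/etc/redis/redis-%d.conf" % core  # one config (and port) per instance
    cmd = ["taskset", "-c", str(core), "redis-server", conf]
    print("Starting:", " ".join(cmd))
    subprocess.Popen(cmd)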

Cassandra Summit 2012 Highlights

Monday, August 27th, 2012

I was lucky enough to have the opportunity to speak at the Cassandra World Summit 2012 on August 8 in Santa Clara. It was an amazing opportunity to share with the community the types of things that SimpleReach does with Cassandra. Not only that, I learned a lot about the roadmap and got to put a bunch of faces with the names behind the project.
(more…)

What’s So Great About Cassandra’s Composite Columns?

Tuesday, August 7th, 2012

There are a lot of things I really like about Cassandra. But one thing in particular I like when creating a schema is having access to composite columns (read about composite columns and their origins here on DataStax’s blog). Let’s start simple by explaining composite columns and then we can dive right into why they are so much fun to work with. (more…)
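As a tiny illustration of why they are handy (this is just Python tuples standing in for composite column names, not the actual Cassandra API): composite names sort component by component, so a (day, campaign_id) composite keeps each day’s columns contiguous and range-sliceable.

# Illustration only: Python tuples standing in for composite column names.
# Cassandra sorts composite columns component by component, so a
# (day, campaign_id) composite keeps each day's columns grouped together.
columns = [
    ("2009-10-01", 2247),
    ("2009-09-30", 2245),
    ("2009-10-01", 2246),
    ("2009-09-30", 2243),
]

for name in sorted(columns):
    print(name)
# ('2009-09-30', 2243)
# ('2009-09-30', 2245)
# ('2009-10-01', 2246)
# ('2009-10-01', 2247)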

ec2-consistent-snapshot With Mongo

Thursday, April 21st, 2011

I set up MongoDB on my Amazon EC2 instance knowing full well that it would have to be backed up at some point. I also knew that by using XFS, I could take advantage of filesystem freezing in a similar fashion to LVM snapshots. I had remembered reading about backups on XFS with MySQL being done with ec2-consistent-snapshot. As with any piece of open source software, it just took a little tweaking to make it do what I wanted it to do.
(more…)
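The moving parts are roughly these. Below is a sketch of the manual equivalent of what ec2-consistent-snapshot automates for me, written with today’s fsync/fsyncUnlock commands rather than the 2011-era unlock trick; the mount point and volume id are placeholders.

# Sketch of the manual steps: flush/lock Mongo, freeze the XFS filesystem,
# snapshot the EBS volume, then thaw and unlock. Placeholders throughout.
import subprocess
from pymongo import MongoClient

MOUNT = "/data"          # hypothetical XFS mount backing MongoDB
VOLUME = "vol-12345678"  # hypothetical EBS volume id

client = MongoClient("localhost", 27017)
client.admin.command("fsync", lock=True)          # flush writes and lock mongod
try:
    subprocess.check_call(["xfs_freeze", "-f", MOUNT])   # freeze the filesystem
    try:
        subprocess.check_call(["aws", "ec2", "create-snapshot",
                               "--volume-id", VOLUME,
                               "--description", "mongo backup"])
    finally:
        subprocess.check_call(["xfs_freeze", "-u", MOUNT])  # thaw
finally:
    client.admin.command("fsyncUnlock")            # let writes resume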

Getting a Random Record From a MongoDB Collection

Monday, August 9th, 2010

One of my issues with MongoDB is that, as of this writing, there is no way to retrieve a random record. In SQL, you can simply do something like “ORDER BY RAND()” (the function name varies depending on your flavor) and retrieve random records (at a slightly expensive query cost). There is not yet an equivalent in MongoDB because of its sequential access nature. There is a purely JavaScript method in the MongoDB cookbook here. If you are really interested, I would also read the Jira ticket thread #533 on this issue.
(more…)
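For reference, the cookbook approach boils down to storing a random value on each document and querying against it. A rough pymongo version of that trick follows; the collection and field names are my own, and it uses the modern driver rather than the 2010-era one.

# Rough sketch of the "random field" trick: store a random number on each
# document, then find_one() against a fresh random number. Collection and
# field names here are made up for illustration.
import random
from pymongo import MongoClient

coll = MongoClient()["test"]["things"]

# At insert time, tag each document with a random value.
coll.insert_one({"name": "example", "rand": random.random()})

# To fetch a "random" document, probe from a random point in [0, 1).
r = random.random()
doc = coll.find_one({"rand": {"$gte": r}}) or coll.find_one({"rand": {"$lt": r}})
print(doc)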

New Massachusetts Security Law Passed For Databases

Tuesday, April 27th, 2010

In case you haven’t heard about the new Massachusetts state law regarding consumer or client information in databases, you can read about it here, at Information Week, or just Google for “Massachusetts data security law”. And if you haven’t read about it, then I strongly suggest you do. This is one of those instances where I believe their heart is in the right place, even if the execution/implementation wasn’t perfect.
(more…)

Speeding Up Your Selects and Sorts

Thursday, March 4th, 2010

When you are using a framework, it typically sets your VARCHAR size automatically to 255. This is normally fine since you are letting the framework abstract you away from most of the SQL. But if you interact with your SQL directly, there is a way to get a decent speed increase on your SELECTs and ORDER BYs when you are working with VARCHARs.

The VARCHAR data type is variable-length for storage only, not for sorting and buffering. In fact, since the MySQL optimizer doesn’t know how big the data in that column can be, it has to allocate the maximum possible size for that column. So sorting and buffering the name and email columns below would take up 310 bytes per row.
(more…)

MySQL Error 1033: Incorrect Information in File

Tuesday, January 5th, 2010

If you’ve ever been plagued by an error 1033 issue in MySQL (replication will show it as well), then I might be able to help you out. The error reads something like, “Incorrect information in file: ‘./mydb/table.frm’”. I classify this as another one of MySQL’s cryptic error messages. Here is how I determined that this was my problem.

Googling around got me an answer, but I had to read a bunch of different responses to piece it together. Essentially this issue (in my case) was a result of the InnoDB engine not loading when MySQL was restarted. So when MySQL tried to read the .frm file (the table definition), which was written for an InnoDB table, with the MyISAM reader, it didn’t like it. Since MyISAM is the fallback engine, MySQL fell back to it and the table became unusable.
(more…)

First Experience With Cassandra

Monday, October 19th, 2009

I recently posted about my initial experience with Tokyo Cabinet. Now it’s time to get to work on Cassandra.

Cassandra is the production database in use at Facebook, where it handles their inbox search, and at Digg.

One thing that I would like to note is that when I tested TC, I used the Perl API for both TC and TT. For Cassandra, I tried both the Perl API and the Ruby API. I couldn’t get the Ruby API (written by Evan Weaver of Twitter) to work at all with the Cassandra server (although I am sure the server included with the gem works well). I initially struggled quite a bit with the UUID aspects of the Perl API until I finally gave up and changed the ColumnFamily CompareWith type from

<ColumnFamily CompareWith="TimeUUIDType" Name="Users" />

to

<ColumnFamily CompareWith="BytesType" Name="Users" />

Then everything was working well and I began to write my tests. The layout that I ended up using works in a schemaless fashion. I created two consistent columns per user: email and person_id. Here is where it gets interesting and different for those of us used to RDBMSes and having fewer columns. For this particular project, every time a user is sent an email, there is a “row” (I call it a row for those unfamiliar with Cassandra terminology; it is actually a column) added in the format send_dates_<date> (note the structure below). The value of this column is the id of the mailing campaign sent to the user on that date. This means that if the user receives 365 emails per year at one per day, then there will be 365 rows (or Cassandra columns) that start with send_dates_ and end with YYYY-MM-DD. Note the data structure below in a JSON-ish format.

Users = {
    'foo@example.com': {
        email: 'foo@example.com',
        person_id: '123456',
        send_dates_2009-09-30: '2245',
        send_dates_2009-10-01: '2247',
    },
    'bar@baz.com': {
        email: 'bar@baz.com',
        person_id: '789',
        send_dates_2009-09-30: '2245',
        send_dates_2009-10-01: '2246',
    },
}

To understand all the data structures in Cassandra better, I strongly recommend reading WTF Is A SuperColumn Cassandra Data Model and Up And Running With Cassandra. They are written by folks at Digg and Twitter respectively and are well worth the reads.

So for my first iteration, I simply loaded up the data in the format mentioned above. Every insert also writes the email and person_id columns just in case they aren’t there to begin with. The initial data set has approximately 3.6 million records. This caused all sorts of problems with the default configuration (i.e., it kept crashing on me). The changes I made to the default configuration are as follows:

  • Change the maximum file descriptors from 1024 (system default) to 65535 (or unlimited)
  • Change the default minimum and maximum Java heap sizes to -Xms256M and -Xmx2G (I could not get the data to load past 2.5 million records without upping the maximum memory value)
[elubow@db5 db]$ time ./cas_load.pl -D 2009-09-30 -c queue-mail.ini -b lists/
usa: 99,272
top: 3,661,491
Total: 3,760,763

real    72m50.826s
user    29m57.954s
sys     2m18.816s
[elubow@db5 cassandra]# du -sh data/Mailings/ # Prior to data compaction
13G     data/Mailings/
[elubow@db5 cassandra]# du -sh data/Mailings/ # Post data compaction
1.4G    data/Mailings/

It was interesting to note that the write latency across the roughly 3.6 million records was 0.004 ms. Also, the data compaction brought the size of the records on disk down from 13G to 1.4G. Those figures were achieved with the reads and writes happening on the same machine.

The load of the second data set took a mere 30m, compared to closer to 180m for loading the same data set into Tokyo Cabinet.

luxe: 936,911
amex: 599,981
mex: 39,700
Total: 1,576,592

real    30m53.109s
user    12m53.507s
sys     0m59.363s
[elubow@db5 cassandra]# du -sh data/Mailings/
2.4G    data/Mailings/

Now that there is a dataset worth working with, it’s time to start the read tests.

For the first test, I am doing a simple get of the email column. This just iterates over every row and fetches the email column to find out the approximate speed of the read operations.

Run 1: 134m59.923s
Run 2: 125m55.673s
Run 3: 127m21.342s
Run 4: 119m2.414s

For the second test, I made use of a Cassandra feature called get_slice. Since I have columns in the format send_dates_YYYY-MM-DD, I used get_slice to grab all column names on a per-row (each email address) basis that were between send_dates_2009-09-29 and send_dates_2009-10-29. The maximum number that could be matched was 2 (since I only loaded two days’ worth of data into the database). I queried against a third day’s list so the results could be 0, 1, or 2.

This first run is using the Perl version of the script.

Email Count: 3557584
Match 0: 4,247
Match 1: 1,993,273
Match 2: 1,560,064

real    177m23.000s
user    45m21.859s
sys     9m17.516s

Run 2: 151m27.042s

On subsequent runs I began to run into API issues, so I rewrote the same script in Python to see if the better-tested Thrift Python API was faster than the Thrift Perl API (and wouldn’t give me timeout issues). The Perl timeout issues ended up being fixable, but I continued with the tests in Python.
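I’m not posting cas_get_slice.py itself, but the heart of it is a single get_slice call per email address with a SliceRange bounding the column names. Below is a rough sketch against the 0.4-era Thrift-generated bindings; the exact module names and call signature shifted between early Cassandra releases, so treat this as illustrative rather than the real script.

# Illustrative sketch of the get_slice call at the heart of cas_get_slice.py,
# against the 0.4-era Thrift Python bindings. Module names and the call
# signature varied between early Cassandra releases.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import ColumnParent, SlicePredicate, SliceRange, ConsistencyLevel

socket = TSocket.TSocket("localhost", 9160)
transport = TTransport.TBufferedTransport(socket)
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

predicate = SlicePredicate(slice_range=SliceRange(
    start="send_dates_2009-09-29", finish="send_dates_2009-10-29",
    reversed=False, count=100))

cols = client.get_slice("Mailings", "foo@example.com",
                        ColumnParent(column_family="Users"),
                        predicate, ConsistencyLevel.ONE)
print(len(cols))  # 0, 1, or 2 matching send_dates_ columns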

[elubow@db5 db]$ time python cas_get_slice.py
{0: 4170, 1: 1935783, 2: 1560042}
Total: 3557584

real    213m57.854s
user    14m57.768s
sys     0m51.634s

Run 2: 132m27.930s
Run 3: 156m19.906s
Run 4: 127m34.715s

Ultimately with Cassandra, there was quite a bit of a learning curve, but in my opinion it is well worth it. Cassandra is an extremely powerful database system that I plan on continuing to explore in greater detail with a few more in-depth tests. If you have the chance, take a look at Cassandra.

Tokyo Tyrant and Tokyo Cabinet

Friday, October 9th, 2009

Tokyo Tyrant and Tokyo Cabinet are the components of the database used by Mixi (basically a Japanese Facebook). For work, I got to play with these tools for some research. Installing all this stuff along with the Perl APIs is incredibly easy.

Ultimately I am working on a comparison of Cassandra and Tokyo Cabinet, but I will get to more on Cassandra later.

Ideally the tests I am going to be doing are fairly simple. I am going to be loading a few million rows into a TCT database (which is a table database in TC terms) and then loading key/value pairs into it. The layout, expressed as a hash, is basically going to be as follows:

{
      "user@example.com" => {   "sendDates" => {"2009-09-30"},   },
      "123456789" => {  "2009-09-30" => {"2287"}   },
}

I ran these tests in two formats: INSERTing the data into a table database, and as serialized data in a hash database. It is necessary to point out that the load on this machine is its normal load, so this cannot be a true benchmark. Since the conditions are not optimal (but really, when are they ever), take the results with a grain of salt. Also, there is some data munging going on during every iteration to grab the email addresses and other data. All this is being done through the Perl API and Tokyo Tyrant. The machine this is running on has dual dual-core 2.5GHz Intel Xeon processors and 16G of memory.
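For a feel of the put loop in the hash-database case: Tokyo Tyrant also speaks a memcached-compatible protocol, so a rough Python equivalent of what I do through the Perl API could look like the sketch below. The host/port and the records list are placeholders, and the real script also munges the email addresses and campaign ids out of the list files.

# Rough sketch of the hash-database put loop via Tokyo Tyrant's
# memcached-compatible protocol. Host/port and records are placeholders.
import memcache

tt = memcache.Client(["db5:1978"])  # ttserver's default port is 1978

records = [
    ("user@example.com", "2009-09-30"),
    ("123456789", "2009-09-30|2287"),
]

for key, value in records:
    tt.set(key, value)

print(tt.get("user@example.com"))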

For the first round, a few things should be noted:

  • The totals referenced below are counts of email addresses added/modified in the db
  • I am only using 1 connection to the Tokyo Tyrant DB and it is currently set up to handle 8 threads
  • I didn’t do any memory adjustment on startup, so the default (which is marginal) is in use
  • I am only using the standard put operations, not putcat, putkeep, or putnr (which I will be using later)

The results of the table database are as follows. It is also worth noting the size of the table is around 410M on disk.

[elubow@db5 db]$ time ./tct_test.pl -b lists/ -D 2009-09-30 -c queue-mail.ini
usa: 99,272
top: 3,661,491
Total: 3,760,763

real    291m53.204s
user    4m53.557s
sys     2m35.604s
[root@db5 tmp]# ls -l
-rw-r--r-- 1 root root 410798800 Oct  6 23:15 mailings.tct

The structure for the hash database (seeing as it’s only key/value) is as follows:

      "user@example.com" => "2009-09-30",
      "123456789" => "2009-09-30|2287",

The results of loading the same data into a hash database are as follows. It is also worth noting the size of the table is around 360M on disk. This is significantly smaller than the 410M of the table database containing the same style data.

[elubow@db5 db]$ time ./tch_test.pl -b lists/ -D 2009-09-30 -c queue-mail.ini
usa: 99,272
top: 3,661,491
Total: 3,760,763

real    345m29.444s
user    2m23.338s
sys     2m15.768s
[root@db5 tmp]# ls -l
-rw-r--r-- 1 root root 359468816 Oct  7 17:50 mailings.tch

For the second round, I loaded a second day’s worth of data into the database. I used the same layouts as above with the following noteworthy items:

  • I did a get prior to the put to decide whether to use put or putcat
  • The new data structure is now either “2009-09-30,2009-10-01” or “2009-09-30|1995,2009-10-01|1996”

Results of the hash database test round 2:

[elubow@db5 db]$ time ./tch_test.pl -b lists/ -D 2009-10-01 -c queue-mail.ini
luxe: 936,911
amex: 599,981
mex: 39,700
Total: 1,576,592

real    177m55.280s
user    1m53.289s
sys     2m8.606s
[elubow@db5 db]$ ls -l
-rw-r--r-- 1 root root 461176784 Oct  7 23:44 mailings.tch

Results of the table database test round 2:

[elubow@db5 db]$ time ./tct_test.pl -b lists/ -D 2009-10-01 -c queue-mail.ini
luxe: 936,911
amex: 599,981
mex: 39,700
Total: 1,576,592

real    412m19.007s
user    4m39.064s
sys     2m22.343s
[elubow@db5 db]$ ls -l
-rw-r--r-- 1 root root 512258816 Oct  8 12:41 mailings.tct

When it comes down to the final implementation, I will likely be parallelizing the put in some form. I would like to think that a database designed for this sort of thing works best in a concurrent environment (especially considering the default startup value is 8 threads).

It is obvious that when it comes to load times, the hash database is much faster. Now it’s time to run some queries and see how this stuff goes down.

So I ran some queries, first against the table database. I grabbed a new list of 3.6 million email addresses and iterated over the list, grabbing each record from the table database and counting how many dates (via array value counts) were entered for that email address. I ran the script 4 times and the results were as follows. I typically throw out the first run since caching kicks in for the other runs.

Run 1: 10m35.689s
Run 2: 5m41.896s
Run 3: 5m44.505s
Run 4: 5m44.329s

Doing the same thing for the hash database, I got the following result set:

Run 1: 7m54.292s
Run 2: 4m13.467s
Run 3: 3m59.302s
Run 4: 4m13.277s

I think the results speak for themselves. A hash database is obviously faster (which is something most of us assumed from the beginning). The rest of the time goes to programmatic comparisons, like date comparisons on specific slices of the array. Load times can be sped up using concurrency, but given the requirements of the project, the gets have to be done in this sequential fashion.

Now it’s on to testing Cassandra in a similar fashion for comparison.

Percona Conference Slides

Saturday, May 9th, 2009

For everyone who was not able to attend this year, here is the link to the Percona Performance Conference slides: http://conferences.percona.com/percona-performance-conference-2009/schedule.html.

I suggest you check them out if you work with MySQL. Enjoy.

Counting Email Addresses By Domain in MySQL

Wednesday, April 1st, 2009

Every so often I find some statistical need that, although a Perl program would be easy to write for it, is probably something the database should just handle. So I have a column in an email management table that holds just the email addresses in the format user@domain.tld.

I want to know which domains make up the majority of the users. (The numbers have been changed to protect the innocent):

mysql> SELECT SUBSTRING_INDEX(email, '@', -1) as Domain, count(*) as Total
      FROM email_list
GROUP BY Domain
ORDER BY Total DESC
       LIMIT 15;
+----------------+---------+
| Domain         | Total   |
+----------------+---------+
| yahoo.com      | 1304000 | 
| hotmail.com    |  908400 | 
| aol.com        |  800000 | 
| msn.com        |  168000 | 
| gmail.com      |  161000 | 
| comcast.net    |  143000 | 
| sbcglobal.net  |  110000 | 
| bellsouth.net  |   62000 | 
| cox.net        |   58000 | 
| verizon.net    |   56000 | 
| earthlink.net  |   52000 | 
| charter.net    |   46000 | 
| juno.com       |   30000 | 
| optonline.net  |   22000 | 
| netzero.com    |   17000 | 
+----------------+---------+

Character Encoding

Thursday, October 23rd, 2008

I recently ran into some character encoding issues and I wanted to share the fun. The default character encoding for MySQL on Gentoo is latin-1, or iso-8859-1. This wasn’t a problem until we recently started putting content straight from the DB through Java and onto the web. Java connects to the DB with a character encoding (typically UTF-8). Since UTF-8 and iso-8859-1 agree on the ASCII range, it generally wasn’t a problem. Except when UTF-8 and UTF-16 characters were put into an iso-8859-1 database without translation.

What was essentially happening was that the data was being stored as iso-8859-1. The Java code was connecting to the DB in UTF-8 and pulling it into Java (which is usually UTF-16, but in this case was being handled as UTF-8). It was then being sent to the browser as URL encoded UTF-8 when in reality, it hadn’t even properly been put into UTF-8 character encoding. This then gave the web browser some funny yen symbols and question marks. This was not quite what we were aiming for.

The moral of this story is that it is necessary to know the character encoding of the start point and end point of your data. It is crucial that the code points match up; otherwise they could potentially make for an interesting screen given to the reader. All this could have been avoided with a simple: ALTER TABLE myTable MODIFY myColumn VARCHAR(255) CHARACTER SET utf8;.
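You can reproduce the visible symptom in a few lines of Python (purely illustrative, nothing MySQL-specific): UTF-8 bytes that get reinterpreted as latin-1 somewhere along the chain come out as the classic two-character garbage.

# Illustration of the symptom: UTF-8 bytes reinterpreted as latin-1 turn
# into two-character mojibake by the time they reach the browser.
original = "café"

utf8_bytes = original.encode("utf-8")    # what the application wrote
misread = utf8_bytes.decode("latin-1")   # what a latin-1 column/connection thinks it holds

print(misread)                                     # cafÃ©  <- the "funny symbols"
print(misread.encode("latin-1").decode("utf-8"))   # café   <- recoverable if you know the chain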

MySQL Encoded URI Search and Replace

Friday, September 26th, 2008

So I can definitely not take credit for this trick. The credit goes to Augusto Bott of the Pythian Group for this. I have a table in my DB that has columns of encoded URIs. The list of URI encoding character translations is available here.

To accomplish this, the MySQL REPLACE function was used.

mysql> SET @test:='%20%27%7C%26%5E%2B%2D%25';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @test as original,
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),  -- REPLACE() is case sensitive
    ->        '%7c','|'),  -- so we have
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),  -- to repeat
    ->        '%2D','-'),
    ->        '%2d','-'),  -- ourselves
    ->        '%2B','+'),
    ->        '%2b','+'),  -- sometimes
    ->        '%25','%') as replaced;
+--------------------------+----------+
| original                 | replaced |
+--------------------------+----------+
| %20%27%7C%26%5E%2B%2D%25 |  '|&^+-% |
+--------------------------+----------+
1 row in set (0.01 sec)

To make our lives easier, let’s create a function from that.

mysql> CREATE FUNCTION url_replace
    -> (url VARCHAR(1024))
    -> RETURNS VARCHAR(1025)
    -> BEGIN
    -> RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        url,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),
    ->        '%7c','|'),
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),
    ->        '%2D','-'),
    ->        '%2d','-'),
    ->        '%2B','+'),
    ->        '%2b','+'),
    ->        '%25','%');
    -> END$$
Query OK, 0 rows affected (0.02 sec)

And to wrap it up, let’s put the same replacement chain to work in an actual query. This query finds every URI containing a % and replaces each encoded sequence with its unencoded character equivalent.

mysql> UPDATE website_page SET uri = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( uri,'%20',' '), '%27','\''), '%7C','|'), '%7c','|'), '%26','&'), '%5E','^'), '%5e','^'), '%2D','-'), '%2d','-'), '%2B','+'), '%2b','+'), '%25','%'), '%2C',',') WHERE uri LIKE '%\%%';

MySQL Proxy Query Profiling

Friday, December 7th, 2007

Since I am now finally getting to play with MySQL Proxy, I am going to outline some recipes here that I have found/created/modified that may be useful to someone other than me. This is a recipe for profiling queries. It writes the information to the PROXY_LOG_FILE, currently named mysql.log, which will be created in the directory that you run mysql-proxy from. The file itself is mostly commented and should therefore be pretty self-explanatory. It was adapted from the reference documentation listed at the bottom of this entry.

assert(proxy.PROXY_VERSION >= 0x00600,
  "You need to be running mysql-proxy 0.6.0 or greater")

 -- Set up the log file
 local log_file = os.getenv("PROXY_LOG_FILE")
 if (log_file == nil) then
    log_file = "mysql.log"
 end

 -- Open up our log file in append mode
 local fh = io.open(log_file, "a+")

 -- Set some variables
 local original_query = ""
 local executed_query = ""
 local replace = false
 local comment = ""

 -- For profiling
 local profile = 0

-- Push the query onto the Queue
function read_query( packet )
  if string.byte(packet) == proxy.COM_QUERY then
    query = string.sub(packet, 2)

    -- Pull out the comment and store it
    original_query = string.gsub(query, "%s*%*%*%*%s(.+)%s%*%*%*%s*",'')
    comment = string.match(query, "%s*%*%*%*%s(.+)%s%*%*%*%s*")

    -- Add the original packet to the query if we have a comment
    if (comment) then
        if string.match(string.upper(comment), '%s*PROFILE') then
          -- Profile types:
          --   ALL, BLOCK IO, CONTEXT SWITCHES, IPC,
          --        MEMORY, PAGE FAULTS, SOURCE, SWAPS

          -- Set it up for profiling
          -- Generate a new COM_QUERY packet
          --   and inject it with a new id (11)
          original_query = "SET PROFILING=1; " .. original_query
      end -- End if (PROFILE)
    end -- End if (comment)

    executed_query = original_query
    -- Check for a 'SELECT' typo
    if string.match(string.upper(original_query), '^%s*SLECT') then
        executed_query = string.gsub(original_query,'^%s*%w+', 'SELECT')
        replace = true;
    -- matches "CD" as first word of the query
    elseif string.match(string.upper(original_query), '^%s*CD') then
        executed_query = string.gsub(original_query,'^%s*%w+', 'USE')
        replace = true
    end

    -- Append the SHOW PROFILE statement to the end of the query
    if (comment and string.match(string.upper(comment), '%s*PROFILE')) then
        executed_query = executed_query .. "; SHOW " .. comment
    end

    -- Actually execute the query here
    proxy.queries:append(1, string.char(proxy.COM_QUERY) .. executed_query )
    return proxy.PROXY_SEND_QUERY
  else
    executed_query = ""
  end
end

function read_query_result (inj)
  local res = assert(inj.resultset)

  -- Count the rows returned by the query (if any)
  local row_count = 0
  local num_cols = string.byte(res.raw, 1)
  if num_cols > 0 and num_cols < 255 then
     for row in inj.resultset.rows do
         row_count = row_count + 1
     end
  end

  -- Prepend the error tag in the log
  local error_status = ""
  if res.query_status and (res.query_status < 0) then
     error_status = "[ERROR]"
  end

  -- Prefer the affected-row count when the server reports one
  if (res.affected_rows) then
     row_count = res.affected_rows
  end
  
  -- Prepend the comment line in the log
  if (comment) then
     fh:write( string.format("%s %6d -- [COMMENT] %s\n",
        os.date('%Y-%m-%d %H:%M:%S'), 
        proxy.connection.server["thread_id"],
        comment))
  end
  
  -- Prepend the typo in the log
  if (replace) then
     fh:write( string.format("%s %6d -- [REPLACEMENT] %s\n\t\t\t%s\n",
        os.date('%Y-%m-%d %H:%M:%S'), 
        proxy.connection.server["thread_id"],
        ("replaced " .. original_query),
        ("with " .. executed_query)))
  end
  
  -- Write the query adding the number of rows retrieved and query time
  fh:write( string.format("%s %6d -- %s %s {%d rows} {%d ms}\n",
     os.date('%Y-%m-%d %H:%M:%S'), 
     proxy.connection.server["thread_id"],
     error_status,
     executed_query,
     row_count,
     inj.query_time)
  )
  fh:flush()
end

To make this work, simply prepend three asterisks, 'PROFILE <profile_type>', and then three more asterisks to your query, and the profile information will be returned along with your results:

*** PROFILE ALL *** SELECT * FROM foo_bar;

Two result sets will be returned: your results and then the profile of your results.

Reference: Writing LUA Scripts for MySQL Proxy