<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Erics Tech Blog &#187; Databases</title>
	<atom:link href="http://eric.lubow.org/category/databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://eric.lubow.org</link>
	<description>Thoughts, musings, and other idealistic (sometimes useful) systems and development hoopla.</description>
	<lastBuildDate>Fri, 18 Nov 2011 14:56:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>ec2-consistent-snapshot With Mongo</title>
		<link>http://eric.lubow.org/2011/databases/mongodb/ec2-consistent-snapshot-with-mongo/</link>
		<comments>http://eric.lubow.org/2011/databases/mongodb/ec2-consistent-snapshot-with-mongo/#comments</comments>
		<pubDate>Thu, 21 Apr 2011 07:00:47 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=863</guid>
		<description><![CDATA[I set up MongoDB on my Amazon EC2 instance knowing full well that it would have to be backed up at some point. I also knew that by using XFS, I could take advantage of filesystem freezing in a similar fashion to LVM snapshots. I had remembered reading about backups on XFS with MySQL being done [...]]]></description>
			<content:encoded><![CDATA[<p>I set up <a href="http://www.mongodb.org/">MongoDB</a> on my Amazon EC2 instance knowing full well that it would have to be backed up at some point.  I also knew that by using XFS, I could take advantage of filesystem freezing in a similar fashion to LVM snapshots.  I remembered reading about backups on XFS with MySQL being done with <a href="http://alestic.com/2009/09/ec2-consistent-snapshot">ec2-consistent-snapshot</a>.  As with any piece of open source software, it just took a little tweaking to make it do what I wanted it to do.<br />
<span id="more-863"></span><br />
Out of the box, ec2-consistent-snapshot works great for freezing an XFS filesystem with MySQL because it not only stops the server, but handles potential replication issues.  Following the steps outlined <a href="http://www.mongodb.org/pages/viewpage.action?pageId=19562846">here</a> by 10gen, I made a few slight adjustments to the core ec2-consistent-snapshot script to add MongoDB support.  In fact, the modified script supports locking and fsyncing MongoDB immediately prior to freezing and backup.  I have been using this script in production for a while now and it seems to work without issue for me.</p>
<p>In the usual spirit of social coding, I have added the script to GitHub: <a href="https://github.com/elubow/ec2-consistent-snapshot">https://github.com/elubow/ec2-consistent-snapshot</a>.</p>
<p>Running it is just this:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">ec2-consistent-snapshot &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;\<br />
<span style="color: #660033;">--mongo</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;\<br />
<span style="color: #660033;">--xfs-filesystem</span> <span style="color: #000000; font-weight: bold;">/</span>data &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \<br />
<span style="color: #660033;">--region</span> us-east-<span style="color: #000000;">1</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \<br />
<span style="color: #660033;">--description</span> <span style="color: #ff0000;">&quot;RAID snapshot <span style="color: #007800;">$(date +'%Y-%m-%d %H:%M:%S')</span>&quot;</span> \<br />
vol-VOL1 vol-VOL2 vol-VOL3 vol-VOL4 vol-VOL5 vol-VOL6 vol-VOL7 vol-VOL8</div></div>
<p>The options used here (for reference) tell ec2-consistent-snapshot to use <em>--mongo</em> on the <em>--xfs-filesystem</em> /data, in the us-east-1 <em>--region</em> (note that it&#8217;s just the region and not the availability zone within that region), and to back up the specified volumes with the given <em>--description</em>.  You can even throw a <em>--mongo-stop</em> in there to have Mongo stopped before the filesystem freeze and then restarted after the volumes have been backed up.  Don&#8217;t forget that you need to set your Amazon keys in your environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for your key and secret respectively).</p>
<p>I tried to keep the usage style consistent with Eric Hammond&#8217;s original version and simply add Mongo support to it.</p>
<p><strong>Note:</strong> I also mentioned this on the <a href="http://groups.google.com/group/mongodb-user/browse_thread/thread/633c3fbc648861a1?pli=1">mailing list</a>, but given the number of messages that fly around on the list daily, some folks may have missed it.</p>
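<p>The ordering matters here: lock and fsync Mongo, freeze the filesystem, snapshot, and then undo both in reverse. Below is a minimal Ruby sketch of that ordering only. It is not the script&#8217;s actual code; the step names (<em>lock_mongo</em>, <em>freeze_fs</em>, and so on) are hypothetical labels, and each step is a lambda that you would supply yourself.</p>

```ruby
# Runs the steps in order and always undoes the freeze and the lock,
# even if the snapshot step raises. Each step is a caller-supplied lambda;
# the step names are hypothetical labels, not options of the real script.
def consistent_snapshot(steps)
  steps.fetch(:lock_mongo).call          # fsync + lock so no writes are in flight
  begin
    steps.fetch(:freeze_fs).call         # freeze the XFS filesystem
    begin
      steps.fetch(:snapshot).call        # start the volume snapshots
    ensure
      steps.fetch(:unfreeze_fs).call     # thaw the filesystem
    end
  ensure
    steps.fetch(:unlock_mongo).call      # release the Mongo lock last
  end
end

# Demo with recording lambdas instead of real commands.
log = []
names = %i[lock_mongo freeze_fs snapshot unfreeze_fs unlock_mongo]
steps = names.map { |name| [name, -> { log.push(name) }] }.to_h
consistent_snapshot(steps)
puts log.join(' -> ')
```

<p>The nested <em>ensure</em> blocks are the point of the sketch: a failed snapshot should never leave the filesystem frozen or the database locked.</p>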
<p><strong>References:</strong></p>
<ul>
<li><a href="http://alestic.com/2009/09/ec2-consistent-snapshot">ec2-consistent-snapshot</a> blog entry by Eric Hammond</li>
<li><a href="https://github.com/elubow/ec2-consistent-snapshot">ec2-consistent-snapshot</a> on GitHub with MongoDB support</li>
<li><a href="http://www.mongodb.org/pages/viewpage.action?pageId=19562846">Backing up MongoDB on EC2 (10gen)</a></li>
</ul>


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2011/databases/mongodb/ec2-consistent-snapshot-with-mongo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting a Random Record From a MongoDB Collection</title>
		<link>http://eric.lubow.org/2010/databases/mongodb/getting-a-random-record-from-a-mongodb-collection/</link>
		<comments>http://eric.lubow.org/2010/databases/mongodb/getting-a-random-record-from-a-mongodb-collection/#comments</comments>
		<pubDate>Mon, 09 Aug 2010 11:15:10 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=767</guid>
		<description><![CDATA[One of my issues with MongoDB is that, as of this writing, there is no way to retrieve a random record. In SQL, you can simply do something similar to &#8220;ORDER BY RAND()&#8221; (this varies depending on your flavor) and you can retrieve random records (at a slightly expensive query cost). There is not yet [...]]]></description>
			<content:encoded><![CDATA[<p>One of my issues with <a href="http://www.mongodb.org/">MongoDB</a> is that, as of this writing, there is no way to retrieve a random record.  In SQL, you can simply do something similar to &#8220;ORDER BY RAND()&#8221; (this varies depending on your flavor) and you can retrieve random records (at a slightly expensive query cost).  There is not yet an equivalent in MongoDB because of its sequential access nature.  There is a purely JavaScript method in the MongoDB cookbook <a href="http://cookbook.mongodb.org/patterns/random-attribute/">here</a>.  If you are really interested, I would also read the Jira ticket thread <a href="http://jira.mongodb.org/browse/SERVER-533">#533</a> on this issue.<br />
<span id="more-767"></span><br />
Although it feels a little dirty and kind of hackish, here is how I accomplished getting a random record using the <a href="http://github.com/mongodb/mongo-ruby-driver">Mongo-Ruby driver</a>.  Part of this is documented in the cookbook article I linked to above, but I reiterate bits and pieces of it here.  This is essentially the same thing that any &#8220;ORDER BY RAND()&#8221; statement is doing, it&#8217;s just not doing it &#8220;on the fly&#8221;.</p>
<p>The first thing you&#8217;ll have to do is add an additional field to every document in the collection; we&#8217;ll call it <em>random</em>.  For ease of use, we&#8217;ll also say that every value that goes in this field is between 0 and 1 (and can therefore be generated via <em>Kernel.rand()</em>).  This is important because we are going to use it as our criterion for finding a random record.</p>
<p>First, initialize the connection to the database and bind an instance variable to a collection.  Then generate the random number that you are going to use to find a random record.  Now we try to <strong>find_one</strong> document whose <em>random</em> value is greater than or equal to our random number.  In case we miss, we also try less than or equal to next.  This means that as long as we have at least 1 document in our collection, we will return a record.  The more documents in the collection, the better the randomness of the returned document.</p>
<div class="codecolorer-container ruby default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="ruby codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">@@mongodb = <span style="color:#6666ff; font-weight:bold;">Mongo::Connection</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;localhost&quot;</span>, <span style="color:#006666;">27017</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">db</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;test_db&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
<span style="color:#0066ff; font-weight:bold;">@collection</span> = @@mongodb<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;collection_name&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span><br />
<br />
@<span style="color:#CC0066; font-weight:bold;">rand</span> = <span style="color:#CC00FF; font-weight:bold;">Kernel</span>.<span style="color:#CC0066; font-weight:bold;">rand</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
<span style="color:#0066ff; font-weight:bold;">@random_record</span> = <span style="color:#0066ff; font-weight:bold;">@collection</span>.<span style="color:#9900CC;">find_one</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#996600;">'random'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#996600;">'$gte'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> @<span style="color:#CC0066; font-weight:bold;">rand</span> <span style="color:#006600; font-weight:bold;">&#125;</span> <span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
<span style="color:#9966CC; font-weight:bold;">if</span> <span style="color:#0066ff; font-weight:bold;">@random_record</span>.<span style="color:#0000FF; font-weight:bold;">nil</span>?<br />
&nbsp; <span style="color:#0066ff; font-weight:bold;">@random_record</span> = <span style="color:#0066ff; font-weight:bold;">@collection</span>.<span style="color:#9900CC;">find_one</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#996600;">'random'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#996600;">'$lte'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> @<span style="color:#CC0066; font-weight:bold;">rand</span> <span style="color:#006600; font-weight:bold;">&#125;</span> <span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#41;</span><br />
<span style="color:#9966CC; font-weight:bold;">end</span></div></div>
<p>For reference, a document in a MongoDB collection with a <em>random</em> field may look like this:</p>
<div class="codecolorer-container javascript default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #3366CC;">&quot;_id&quot;</span> <span style="color: #339933;">:</span> ObjectId<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;4c5c710e41b89d657d000001&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #3366CC;">&quot;url&quot;</span> <span style="color: #339933;">:</span> <span style="color: #3366CC;">&quot;http://www.example.com/&quot;</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #3366CC;">&quot;created_at&quot;</span> <span style="color: #339933;">:</span> <span style="color: #3366CC;">&quot;Fri Aug 06 2010 16:32:15 GMT-0400 (EDT)&quot;</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #3366CC;">&quot;random&quot;</span> <span style="color: #339933;">:</span> <span style="color: #CC0000;">0.45929463868260356</span><br />
<span style="color: #009900;">&#125;</span></div></div>


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2010/databases/mongodb/getting-a-random-record-from-a-mongodb-collection/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>New Massachusetts Security Law Passed For Databases</title>
		<link>http://eric.lubow.org/2010/databases/new-massachusetts-security-law-passed-for-databases/</link>
		<comments>http://eric.lubow.org/2010/databases/new-massachusetts-security-law-passed-for-databases/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 10:00:28 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[legal]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=701</guid>
		<description><![CDATA[In case you haven&#8217;t heard about the new Massachusetts state law regarding consumer or client information in databases, you can read about it here, at Information Week, or just Google for &#8220;Massachusetts data security law&#8221;. And if you haven&#8217;t read about it, then I strongly suggest you do. This is one of those instances where I [...]]]></description>
			<content:encoded><![CDATA[<p>In case you haven&#8217;t heard about the new Massachusetts state law regarding consumer or client information in databases, you can read about it <a href="http://www.sqlmag.com/print/sql-server/A-New-Law-that-Will-Change-the-Way-You-Build-Database-Applications.aspx">here</a>, at <a href="http://www.informationweek.com/news/security/government/showArticle.jhtml?articleID=224400426&#038;queryText=massachusetts%20cmr">Information Week</a>, or just <a href="http://google.com/">Google</a> for &#8220;Massachusetts data security law&#8221;.  And if you haven&#8217;t read about it, then I strongly suggest you do.  This is one of those instances where I believe their heart is in the right place, even if the execution/implementation wasn&#8217;t perfect.<br />
<span id="more-701"></span><br />
I get the feeling that Mass will make an example of a few offenders and then hope that the law either gets picked up by other states or federalized.  The problem is that this law will only really affect companies that are headquartered in Mass.  I am by no means a lawyer, but I don&#8217;t believe that it&#8217;s legal for the state of Mass to go after companies that are headquartered in other states.</p>
<p>Now at the very least, this should be a reminder to developers and sysadmins to make sure that data is properly protected, both in storage and transfer, and that it is properly managed within an application.  Although it is a scary prospect that the government is getting involved in software design and data management, it certainly isn&#8217;t the first time (think HIPAA, FIPS testing, and Sarbanes–Oxley).  Although those have been helpful, the goals seemed a little clearer.</p>
<p>This law is going to put a pretty heavy imposition on smaller organizations with regard to user education and basic requirements fulfillment for data storage.  It&#8217;s going to be quite a bit harder to bring a product to market in Mass.  Although it seems like the government is doing a service, here they may be doing a disservice to their state economy.</p>
<p>I guess we&#8217;ll have to see how this plays out.  Thoughts? </p>


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2010/databases/new-massachusetts-security-law-passed-for-databases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Speeding Up Your Selects and Sorts</title>
		<link>http://eric.lubow.org/2010/databases/mysql/speeding-up-your-selects-and-sorts/</link>
		<comments>http://eric.lubow.org/2010/databases/mysql/speeding-up-your-selects-and-sorts/#comments</comments>
		<pubDate>Thu, 04 Mar 2010 13:30:07 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[mysql tricks]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=585</guid>
		<description><![CDATA[When you are using a framework, it typically sets your VARCHAR size automatically to 255. This is normally fine since you are letting the framework abstract you away from most of the SQL. But if you interact with your SQL, there is a way to get a decent speed increase on your SELECTs and ORDER [...]]]></description>
			<content:encoded><![CDATA[<p>When you are using a framework, it typically sets your VARCHAR size automatically to <strong>255</strong>.  This is normally fine since you are letting the framework abstract you away from most of the SQL.  But if you work with your SQL directly, there is a way to get a decent speed increase on your SELECTs and ORDER BYs when you are working with VARCHARs.</p>
<p>The VARCHAR data type is variable-length only for storage, not for sorting and buffering.  In fact, since the MySQL optimizer doesn&#8217;t know how big the data in that column can be, it has to allocate the maximum size possible for that column.  So sorting and buffering the <em>name</em> and <em>email</em> columns below would take up <strong>510</strong> bytes per row (255 + 255, assuming a single-byte character set).<br />
<span id="more-585"></span></p>
<p>To fix that, you should reduce the size of your columns.  At VARCHAR(100) each, those two columns would need 200 bytes per row, so the VARCHAR(255) defaults make the optimizer handle an additional 310 bytes for every row.  With 500k rows in the table, 310 extra bytes per row comes to roughly 155 MB of additional memory that the optimizer has to use to perform the sorting/buffering.</p>
<p>Consider the following table of businesses:</p>
<div class="codecolorer-container sql default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:450px;"><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">DESC</span> businesses;<br />
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----------------+--------------+------+-----+---------+----------------+</span><br />
<span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">FIELD</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">TYPE</span> &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">KEY</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #66cc66;">|</span> Extra &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span><br />
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----------------+--------------+------+-----+---------+----------------+</span><br />
<span style="color: #66cc66;">|</span> id &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">INT</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">11</span><span style="color: #66cc66;">&#41;</span> &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> NO &nbsp; <span style="color: #66cc66;">|</span> PRI <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span> <span style="color: #66cc66;">|</span> <br />
<span style="color: #66cc66;">|</span> name &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">255</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span> YES &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> &nbsp; &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <br />
<span style="color: #66cc66;">|</span> url &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">255</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span> YES &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> &nbsp; &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <br />
<span style="color: #66cc66;">|</span> email &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">255</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span> YES &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> &nbsp; &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <br />
<span style="color: #66cc66;">|</span> description &nbsp; &nbsp;<span style="color: #66cc66;">|</span> text &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">|</span> YES &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> &nbsp; &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <br />
<span style="color: #66cc66;">|</span> created_at &nbsp; &nbsp; <span style="color: #66cc66;">|</span> datetime &nbsp; &nbsp; <span style="color: #66cc66;">|</span> YES &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> &nbsp; &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <br />
<span style="color: #66cc66;">|</span> updated_at &nbsp; &nbsp; <span style="color: #66cc66;">|</span> datetime &nbsp; &nbsp; <span style="color: #66cc66;">|</span> YES &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> &nbsp; &nbsp;<span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #66cc66;">|</span> <br />
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----------------+--------------+------+-----+---------+----------------+</span><br />
<span style="color: #cc66cc;">7</span> <span style="color: #993333; font-weight: bold;">ROWS</span> <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.00</span> sec<span style="color: #66cc66;">&#41;</span><br />
<br />
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">LENGTH</span><span style="color: #66cc66;">&#40;</span>name<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">LENGTH</span><span style="color: #66cc66;">&#40;</span>email<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">LENGTH</span><span style="color: #66cc66;">&#40;</span>url<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">FROM</span> businesses;<br />
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">-------------------+--------------------+------------------+</span><br />
<span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">LENGTH</span><span style="color: #66cc66;">&#40;</span>name<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">LENGTH</span><span style="color: #66cc66;">&#40;</span>email<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">LENGTH</span><span style="color: #66cc66;">&#40;</span>url<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span><br />
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">-------------------+--------------------+------------------+</span><br />
<span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #cc66cc;">53</span> <span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #cc66cc;">36</span> <span style="color: #66cc66;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #cc66cc;">40</span> <span style="color: #66cc66;">|</span> <br />
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">-------------------+--------------------+------------------+</span><br />
<span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">ROW</span> <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.40</span> sec<span style="color: #66cc66;">&#41;</span></div></div>
<p>You could change the column sizes to 75 (name), 50 (email), and 100 (url).  But realistically you don&#8217;t want to risk values getting cut off, so it may be better to settle on each of these columns being a VARCHAR(100).  Even that would drastically reduce the space required to perform a sort and buffer the results.</p>
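<p>The arithmetic behind that saving can be worked through in a few lines of Ruby. This is a rough sketch that assumes one byte per character (multi-byte character sets reserve more) and only counts the <em>name</em> and <em>email</em> columns; <em>sort_bytes_per_row</em> is a hypothetical helper, not a MySQL function.</p>

```ruby
# Worst-case bytes per row reserved for sorting, assuming one byte per
# character (an assumption; multi-byte character sets need more).
def sort_bytes_per_row(declared_lengths, bytes_per_char = 1)
  declared_lengths.sum * bytes_per_char
end

ROWS = 500_000

before = sort_bytes_per_row([255, 255])   # name + email as VARCHAR(255)
after  = sort_bytes_per_row([100, 100])   # name + email as VARCHAR(100)
saved_per_row = before - after
saved_total   = saved_per_row * ROWS

puts "before: #{before} bytes/row, after: #{after} bytes/row"
puts "saved:  #{saved_per_row} bytes/row, #{saved_total / 1_000_000} MB over #{ROWS} rows"
```

<p>The resize itself would be one ALTER TABLE ... MODIFY per column.  Try it on a copy of the table first, since values longer than the new length are truncated or rejected depending on the SQL mode.</p>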


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2010/databases/mysql/speeding-up-your-selects-and-sorts/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Database Read/Write Splitting in Frameworks/ORMs</title>
		<link>http://eric.lubow.org/2010/databases/mysql/database-readwrite-splitting-in-frameworksorms/</link>
		<comments>http://eric.lubow.org/2010/databases/mysql/database-readwrite-splitting-in-frameworksorms/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 13:00:27 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[frameworks]]></category>
		<category><![CDATA[scaling]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=563</guid>
		<description><![CDATA[Although one of the primary ideas behind frameworks is to keep things as simple as possible, sometimes they create issues in the long run. What I am about to discuss is something of a luxury problem (as scaling usually is), but it is a problem nonetheless. When initially starting a project, whether you are using [...]]]></description>
			<content:encoded><![CDATA[<p>Although one of the primary ideas behind frameworks is to keep things as simple as possible, sometimes they create issues in the long run.  What I am about to discuss is something of a luxury problem (as scaling usually is), but it is a problem nonetheless.</p>
<p>When initially starting a project, whether you are using <a href="http://rubyonrails.org/">Ruby on Rails</a> (Ruby), <a href="http://www.djangoproject.com/">Django</a> (Python), <a href="http://cakephp.org/">CakePHP</a> (PHP), <a href="http://www.catalystframework.org/">Catalyst</a> (Perl), or any of the hundreds of other frameworks in any of the languages out there, the first and most important thing to do is to get it out the door.  Once you have done that, it&#8217;s time to get users, fix bugs, and add features.  After you have done all that and you have a great web app, it&#8217;s time to think about scaling. (Yes, I realize that I have trivialized this process immensely, but it&#8217;s to make a point, I promise.)<br />
<span id="more-563"></span><br />
When starting to scale (whether it&#8217;s out or up) and you decide it&#8217;s time to add another database, it&#8217;s necessary to analyze your app and decide whether it&#8217;s read heavy or write heavy.  A lot of scaling comes down to knowing your application and where its bottlenecks are.  Let&#8217;s assume that you are at the point that you need to add a database server.  What would be great is if you had a framework that allowed you to set some database servers as read-only in order to take load off the master.</p>
<p>In an abstract format, it would be a good idea to break out your SQL requirements into 2 functions: <strong>sql_write_query</strong> and <strong>sql_read_query</strong>.  Then have the functions go to your primary database server and slave database servers respectively.  The reason to do this, instead of using a single function that inspects the SQL and sends the query to the &#8220;correct&#8221; location, is that your slave servers may be behind the master (which is the nature of replication), which could give you a stale result for your query.  This way, depending on the importance and type of the query, you can choose in your code where to send it.  The read queries where accuracy is extremely important can be sent to the master using <strong>sql_write_query</strong> and all others can be executed normally using your <strong>sql_read_query</strong> function.</p>
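<p>As a minimal sketch of that split (not any framework&#8217;s actual API): the class name <em>QueryRouter</em> and the two connection factories are hypothetical, and any DB-API driver such as MySQLdb could sit behind them.  The point is that the caller, not a SQL parser, picks the destination.</p>

```python
class QueryRouter:
    """Send queries to the master or to a replica; the caller decides which.

    Hypothetical helper: connect_master and connect_replica are assumed to be
    zero-argument callables that return a DB-API connection.
    """

    def __init__(self, connect_master, connect_replica):
        self._connect_master = connect_master
        self._connect_replica = connect_replica

    def _run(self, connect, query, params=()):
        # Open a connection, run one query, commit, and always close.
        conn = connect()
        try:
            cursor = conn.cursor()
            cursor.execute(query, params)
            rows = cursor.fetchall()
            conn.commit()
            return rows
        finally:
            conn.close()

    def sql_write_query(self, query, params=()):
        # Writes, and reads that must not be stale, go to the master.
        return self._run(self._connect_master, query, params)

    def sql_read_query(self, query, params=()):
        # Reads that can tolerate replication lag go to a replica.
        return self._run(self._connect_replica, query, params)
```

<p>A read that must see the latest data goes through <strong>sql_write_query</strong>; everything else goes through <strong>sql_read_query</strong>.</p>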
<p>How does this relate to frameworks and ORMs?  It would be very handy if frameworks provided a method to expand an application into splitting read and write queries that is native to the frameworks.  If it is native and isn&#8217;t hacked on afterward (like below), then you don&#8217;t have to muck around in the core code or write a plugin and you can stick to what you know best (which is your application).  That is not to say that one should prematurely optimize an application (which is a whole other issue that you need to be careful of) and build it out in a split read/write fashion from the beginning, but that there should be a native way for the application to be faster should you reach that point.</p>
<p>Building this out as an afterthought can be done like what&#8217;s below (the example is in Python but can be adapted to the language of your framework).</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:450px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> MySQLdb<br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> write_mysql_query<span style="color: black;">&#40;</span>query<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; conn <span style="color: #66cc66;">=</span> MySQLdb.<span style="color: black;">connect</span><span style="color: black;">&#40;</span>host <span style="color: #66cc66;">=</span> <span style="color: #483d8b;">&quot;dbvip1.example.com&quot;</span><span style="color: #66cc66;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #dc143c;">user</span> <span style="color: #66cc66;">=</span> <span style="color: #483d8b;">&quot;root&quot;</span><span style="color: #66cc66;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; passwd <span style="color: #66cc66;">=</span> <span style="color: #483d8b;">&quot;pass&quot;</span><span style="color: #66cc66;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; db <span style="color: #66cc66;">=</span> <span style="color: #483d8b;">&quot;myapp&quot;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cursor <span style="color: #66cc66;">=</span> conn.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cursor.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;SET AUTOCOMMIT=1&quot;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cursor.<span style="color: black;">execute</span><span style="color: black;">&#40;</span>query<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cursor.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; conn.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> read_mysql_query<span style="color: black;">&#40;</span>query<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; conn <span style="color: #66cc66;">=</span> MySQLdb.<span style="color: black;">connect</span><span style="color: black;">&#40;</span>host <span style="color: #66cc66;">=</span> <span style="color: #483d8b;">&quot;slavedb1.example.com&quot;</span><span style="color: #66cc66;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #dc143c;">user</span> <span style="color: #66cc66;">=</span> <span style="color: #483d8b;">&quot;root&quot;</span><span style="color: #66cc66;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; passwd <span style="color: #66cc66;">=</span> <span style="color: #483d8b;">&quot;pass&quot;</span><span style="color: #66cc66;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; db <span style="color: #66cc66;">=</span> <span style="color: #483d8b;">&quot;myapp&quot;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cursor <span style="color: #66cc66;">=</span> conn.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cursor.<span style="color: black;">execute</span><span style="color: black;">&#40;</span>query<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; result <span style="color: #66cc66;">=</span> cursor.<span style="color: black;">fetchall</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cursor.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; conn.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> result</div></div>
<p>You can even do things for the writer in a more transactional fashion using <em>START TRANSACTION</em> and <em>COMMIT</em> if you don&#8217;t like using <em>AUTOCOMMIT</em>.  You&#8217;ll also notice that there is a connect every time a query is executed.  A lot of people will have an initial gut reaction that this is a problem.  In fact, since most of your queries will be taking place over a LAN with a pretty fast backplane (or some other variation of a high speed network), the connection overhead is probably negligible.  Taking the load off your master and dispersing it onto slaves will make the most difference here.</p>
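<p>The transactional variant might look something like the sketch below.  The function name <em>write_in_transaction</em> is hypothetical, and <em>sqlite3</em> is imported only as a stand-in so the example runs without a MySQL server; a MySQLdb connection exposes the same <em>commit()</em> and <em>rollback()</em> calls.</p>

```python
import sqlite3  # stand-in driver (assumption); MySQLdb connections offer the same calls


def write_in_transaction(conn, statements):
    """Run every (sql, params) pair as one unit of work.

    Commits only if all statements succeed; otherwise rolls back and
    re-raises, so a partial write never becomes visible.
    """
    cursor = conn.cursor()
    try:
        for sql, params in statements:
            cursor.execute(sql, params)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
```

<p>With MySQLdb the placeholders would be <em>%s</em> rather than the <em>?</em> that sqlite3 uses.</p>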
<p>All this is a very oversimplified way of taking this step, but it is something that frameworks should consider, even if it&#8217;s just in a plugin fashion that can be taken advantage of when the database server is getting overloaded.</p>


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2010/databases/mysql/database-readwrite-splitting-in-frameworksorms/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MySQL Error 1033: Incorrect Information in File</title>
		<link>http://eric.lubow.org/2010/databases/mysql/mysql-error-1033-incorrect-information-in-file/</link>
		<comments>http://eric.lubow.org/2010/databases/mysql/mysql-error-1033-incorrect-information-in-file/#comments</comments>
		<pubDate>Tue, 05 Jan 2010 14:15:43 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[errors]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=496</guid>
		<description><![CDATA[If you&#8217;ve ever been plagued by an error 1033 issue in MySQL (replication will show it as well), then I might be able to help you out. The error reads something like, &#8220;Incorrect information in file: &#8216;./mydb/table.frm&#8217;. I classify this as another one of MySQLs cryptic error messages. Here is how I determined that this [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve ever been plagued by an error 1033 issue in MySQL (replication will show it as well), then I might be able to help you out.  The error reads something like, &#8220;Incorrect information in file: &#8216;./mydb/table.frm&#8217;.  I classify this as another one of MySQLs cryptic error messages.  Here is how I determined that this was my problem.</p>
<p>Googling around got me an answer, but I had to read a bunch of different responses to piece together the answer. Essentially this issue (in my case) was a result of the InnoDB engine not loading up when MySQL was restarted.  Therefore when MySQL tried to read the frm file (table description) which was written for an InnoDB table with the MyISAM reader, it didn&#8217;t like it.  Since MyISAM is the fallback engine, it went to that and the table became unusable.<br />
<span id="more-496"></span></p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Last_Errno: 1033<br />
Last_Error: Error 'Incorrect information in file: './st/table.frm'' on query. Default database: 'mydb'. Query: 'INSERT INTO `table` (`id`,`col1`) VALUES (1,'foobar')'<br />
# or<br />
mysql&gt; REPAIR TABLE table;<br />
+-------------+--------+----------+----------------------------------------------------+<br />
| Table &nbsp; &nbsp; &nbsp; | Op &nbsp; &nbsp; | Msg_type | Msg_text &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br />
+-------------+--------+----------+----------------------------------------------------+<br />
| mydb.table &nbsp;| repair | Error &nbsp; &nbsp;| Incorrect information in file: './mydb/table.frm' &nbsp;| <br />
| mydb.table &nbsp;| repair | error &nbsp; &nbsp;| Corrupt &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;| <br />
+-------------+--------+----------+----------------------------------------------------+<br />
2 rows in set (0.02 sec)</div></div>
<p>I already knew my table <strong>table</strong> is an InnoDB table.  To be sure that this was the issue, I simply checked to see which engines were loaded (removed some for brevity).</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mysql&gt; SHOW ENGINES;<br />
+------------+----------+----------------------------------------------------------------+<br />
| Engine &nbsp; &nbsp; | Support &nbsp;| Comment &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|<br />
+------------+----------+----------------------------------------------------------------+<br />
| MyISAM &nbsp; &nbsp; | DEFAULT &nbsp;| Default engine as of MySQL 3.23 with great performance &nbsp; &nbsp; &nbsp; &nbsp; | <br />
| MEMORY &nbsp; &nbsp; | YES &nbsp; &nbsp; &nbsp;| Hash based, stored in memory, useful for temporary tables &nbsp; &nbsp; &nbsp;| <br />
| InnoDB &nbsp; &nbsp; | DISABLED | Supports transactions, row-level locking, and foreign keys &nbsp; &nbsp; | <br />
| CSV &nbsp; &nbsp; &nbsp; &nbsp;| YES &nbsp; &nbsp; &nbsp;| CSV storage engine &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | <br />
+------------+----------+----------------------------------------------------------------+</div></div>
<p>So here I notice that InnoDB is disabled.  (Note: I skipped the step where I check my <strong>my.cnf</strong> to make sure the <em>skip-innodb</em> line in the <em>[mysqld]</em> section was commented out.  I already knew it was, but if you are unsure, check.)  So I pop over to the error log and I see this:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">InnoDB: Unable to lock ./ibdata1, error: 11<br />
InnoDB: Check that you do not already have another mysqld process<br />
InnoDB: using the same InnoDB data or log files.<br />
091222 15:21:55 &nbsp;InnoDB: Unable to open the first data file<br />
InnoDB: Error in opening ./ibdata1<br />
091222 15:21:55 &nbsp;InnoDB: Operating system error number 11 in a file operation.<br />
InnoDB: Error number 11 means 'Resource temporarily unavailable'.<br />
InnoDB: Some operating system error numbers are described at<br />
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/operating-system-error-codes.html<br />
&nbsp;79InnoDB: Could not open or create data files.</div></div>
<p>This says to me that it is likely that the MySQL restart didn&#8217;t go as well as the initscript would have liked me to believe.  So I see what files are open and what&#8217;s running:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[root@db mysql]# lsof | grep ibdata1<br />
COMMAND &nbsp; &nbsp; PID &nbsp; &nbsp; &nbsp; &nbsp; USER &nbsp; FD &nbsp; &nbsp; &nbsp;TYPE &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; DEVICE &nbsp; &nbsp; &nbsp; &nbsp;SIZE &nbsp; &nbsp; &nbsp; NODE NAME<br />
mysqld &nbsp; &nbsp;24574 &nbsp; &nbsp; &nbsp; &nbsp;mysql &nbsp; &nbsp;4uW &nbsp; &nbsp; REG &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8,3 &nbsp;5018484736 &nbsp; 61538308 /var/lib/mysql/ibdata1<br />
[root@db5 mysql]# ps ax | grep mysqld<br />
24536 pts/0 &nbsp; &nbsp;S &nbsp; &nbsp; &nbsp;0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql/ --pid-file=/var/lib/mysql//db.example.com.pid<br />
24574 pts/0 &nbsp; &nbsp;Sl &nbsp; &nbsp; 7:58 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql/ --user=mysql --pid-file=/var/lib/mysql//db.example.com.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock<br />
26635 pts/0 &nbsp; &nbsp;S &nbsp; &nbsp; &nbsp;0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql/ --pid-file=/var/lib/mysql//db.example.com.pid<br />
26666 pts/0 &nbsp; &nbsp;Sl &nbsp; &nbsp; 0:06 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql/ --user=mysql --pid-file=/var/lib/mysql//db.example.com.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock</div></div>
<p>Well look at that, 2 instances of MySQL running and <em>ibdata1</em> is being held open by one of them.  So now I do the ugly thing and kill the mysqld process that is holding the file lock and then restart MySQL:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[root@db mysql]# kill -9 24574<br />
[root@db mysql]# ps ax | grep mysqld<br />
24536 pts/0 &nbsp; &nbsp;S &nbsp; &nbsp; &nbsp;0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql/ --pid-file=/var/lib/mysql//db.example.com.pid<br />
27051 pts/0 &nbsp; &nbsp;Rl &nbsp; &nbsp; 0:02 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql/ --user=mysql --pid-file=/var/lib/mysql//db.example.com.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock<br />
27075 pts/0 &nbsp; &nbsp;S+ &nbsp; &nbsp; 0:00 grep mysqld<br />
[root@db mysql]# /etc/init.d/mysql restart<br />
MySQL manager or server PID file could not be found! &nbsp; &nbsp; &nbsp; [FAILED]<br />
Starting MySQL............... &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;[ &nbsp;OK &nbsp;]</div></div>
<p>So back over to MySQL:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mysql&gt; SHOW ENGINES;<br />
+------------+----------+----------------------------------------------------------------+<br />
| Engine &nbsp; &nbsp; | Support &nbsp;| Comment &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|<br />
+------------+----------+----------------------------------------------------------------+<br />
| MyISAM &nbsp; &nbsp; | DEFAULT &nbsp;| Default engine as of MySQL 3.23 with great performance &nbsp; &nbsp; &nbsp; &nbsp; | <br />
| MEMORY &nbsp; &nbsp; | YES &nbsp; &nbsp; &nbsp;| Hash based, stored in memory, useful for temporary tables &nbsp; &nbsp; &nbsp;| <br />
| InnoDB &nbsp; &nbsp; | YES &nbsp; &nbsp; &nbsp;| Supports transactions, row-level locking, and foreign keys &nbsp; &nbsp; | <br />
| CSV &nbsp; &nbsp; &nbsp; &nbsp;| YES &nbsp; &nbsp; &nbsp;| CSV storage engine &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | <br />
+------------+----------+----------------------------------------------------------------+</div></div>
<p>There it is.  Now you should be able to start up replication again (if that was the issue).  Or if you didn&#8217;t discover this issue with replication, you should just be able to use your DB like normal.</p>
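<p>If you want to catch the duplicate-process situation before it bites, the <em>ps ax</em> check from earlier can be scripted.  This is only a sketch: the function name <em>find_mysqld_pids</em> is made up, and it assumes the five-column <em>ps ax</em> layout shown above (PID, TTY, STAT, TIME, COMMAND).</p>

```python
def find_mysqld_pids(ps_output):
    """Return the PIDs of mysqld server processes found in `ps ax` output.

    The mysqld_safe wrapper and the grep process itself are skipped because
    only the basename of the executable is compared.
    """
    pids = []
    for line in ps_output.splitlines():
        # PID, TTY, STAT, TIME, then the full command line as one field.
        fields = line.split(None, 4)
        if len(fields) != 5:
            continue
        pid, command = fields[0], fields[4]
        executable = command.split()[0]
        if pid.isdigit() and executable.rsplit("/", 1)[-1] == "mysqld":
            pids.append(int(pid))
    return pids
```

<p>More than one PID in the result means two servers are fighting over the same data directory.</p>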


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2010/databases/mysql/mysql-error-1033-incorrect-information-in-file/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>First Experience With Cassandra</title>
		<link>http://eric.lubow.org/2009/databases/first-experience-with-cassandra/</link>
		<comments>http://eric.lubow.org/2009/databases/first-experience-with-cassandra/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:00:16 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[cassandra]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=294</guid>
		<description><![CDATA[I recently posted about my initial experience with Tokyo Cabinet. Now it&#8217;s time to get to work on Cassandra. Cassandra is the production database that&#8217;s in use on Facebook for handling their email system and Digg. One thing that I would like to note is that when I tested TC, I used the Perl API [...]]]></description>
			<content:encoded><![CDATA[<p>I recently posted about my <a href="http://eric.lubow.org/2009/databases/tokyo-tyrant-and-tokyo-cabinet/">initial experience with</a> <a href="http://1978th.net/">Tokyo Cabinet</a>.  Now it&#8217;s time to get to work on <a href="http://incubator.apache.org/cassandra/">Cassandra</a>.</p>
<p><a href="http://incubator.apache.org/cassandra/">Cassandra</a> is the production database that&#8217;s in use at <a href="http://www.facebook.com/">Facebook</a> (for handling their email system) and at <a href="http://digg.com/">Digg</a>.  </p>
<p>One thing that I would like to note is that when I tested TC, I used the Perl API for both TC and TT.  For Cassandra, I tried both the Perl API and the Ruby API.  I couldn&#8217;t get the Ruby API (written by <a href="http://blog.evanweaver.com">Evan Weaver</a> of <a href="http://twitter.com/">Twitter</a>) to work at all with the Cassandra server (although I am sure the server included with the gem works well).  I initially struggled quite a bit with the UUID aspects of the Perl API until I finally gave up and changed the <em>ColumnFamily CompareWith</em> type from</p>
<div class="codecolorer-container xml default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;columnfamily</span> <span style="color: #000066;">CompareWith</span>=<span style="color: #ff0000;">&quot;TimeUUIDType&quot;</span> <span style="color: #000066;">Name</span>=<span style="color: #ff0000;">&quot;Users&quot;</span> <span style="color: #000000; font-weight: bold;">/&gt;</span></span></div></div>
<p>to</p>
<div class="codecolorer-container xml default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;columnfamily</span> <span style="color: #000066;">CompareWith</span>=<span style="color: #ff0000;">&quot;BytesType&quot;</span> <span style="color: #000066;">Name</span>=<span style="color: #ff0000;">&quot;Users&quot;</span> <span style="color: #000000; font-weight: bold;">/&gt;</span></span></div></div>
<p>Then everything was working well and I began to write my tests.  The layout that I ended up using is one that works in a schemaless fashion.  I created 2 consistent columns per user: <strong>email</strong> and <strong>person_id</strong>. Here is where it gets interesting and different for those of us used to RDBMSs and having fewer columns.  For this particular project, every time a user is sent an email, there is a &#8220;row&#8221; (I call it a row for those unfamiliar with Cassandra terminology; it is actually a column) added in the format of: send_dates_&lt;date&gt; (note the structure below).  The value of this column is the mailing campaign id sent to the user on this date.  This means that if the user receives 365 emails per year at one a day, then there will be 365 rows (or Cassandra columns) that start with <strong>send_dates_</strong> and end with <strong>YYYY-MM-DD</strong>.  Note the data structure below in a JSON-ish format.</p>
<div class="codecolorer-container javascript default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Users <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span> <br />
&nbsp; &nbsp; <span style="color: #3366CC;">'foo@example.com'</span><span style="color: #339933;">:</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; email<span style="color: #339933;">:</span> <span style="color: #3366CC;">'foo@example.com'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; person_id<span style="color: #339933;">:</span> <span style="color: #3366CC;">'123456'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; send_dates_2009<span style="color: #339933;">-</span>09<span style="color: #339933;">-</span><span style="color: #CC0000;">30</span><span style="color: #339933;">:</span> <span style="color: #3366CC;">'2245'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; send_dates_2009<span style="color: #339933;">-</span><span style="color: #CC0000;">10</span><span style="color: #339933;">-</span>01<span style="color: #339933;">:</span> <span style="color: #3366CC;">'2247'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; <span style="color: #3366CC;">'bar@baz.com'</span><span style="color: #339933;">:</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; email<span style="color: #339933;">:</span> <span style="color: #3366CC;">'bar@baz.com'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; person_id<span style="color: #339933;">:</span> <span style="color: #3366CC;">'789'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; send_dates_2009<span style="color: #339933;">-</span>09<span style="color: #339933;">-</span><span style="color: #CC0000;">30</span><span style="color: #339933;">:</span> <span style="color: #3366CC;">'2245'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; send_dates_2009<span style="color: #339933;">-</span><span style="color: #CC0000;">10</span><span style="color: #339933;">-</span>01<span style="color: #339933;">:</span> <span style="color: #3366CC;">'2246'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span><br />
<span style="color: #009900;">&#125;</span></div></div>
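<p>To make the layout concrete, here is a small Python sketch that builds the same structure with plain dictionaries.  It does not talk to Cassandra at all; the function name <em>record_send</em> is hypothetical and only illustrates how one column per send date accumulates under each user key.</p>

```python
from datetime import date


def record_send(users, email, person_id, send_date, campaign_id):
    """Add one send_dates_YYYY-MM-DD column to the row keyed by email.

    email and person_id are rewritten on every call, in case the row
    does not exist yet.
    """
    row = users.setdefault(email, {})
    row["email"] = email
    row["person_id"] = person_id
    # One column per day: the column name carries the date, the value the campaign id.
    row["send_dates_" + send_date.isoformat()] = campaign_id
    return row


users = {}
record_send(users, "foo@example.com", "123456", date(2009, 9, 30), "2245")
record_send(users, "foo@example.com", "123456", date(2009, 10, 1), "2247")
```

<p>After the two calls, the row for foo@example.com holds the same four columns as the first entry in the structure above.</p>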
<p>To understand all the data structures in Cassandra better, I strongly recommend reading <a href="http://arin.me/code/wtf-is-a-supercolumn-cassandra-data-model">WTF Is A SuperColumn Cassandra Data Model</a> and <a href="http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/">Up And Running With Cassandra</a>.  They are written by folks at <a href="http://digg.com">Digg</a> and <a href="http://twitter.com/">Twitter</a> respectively and are well worth the reads.</p>
<p>So for my first iteration, I simply loaded up the data in the format mentioned above.  Every insert also writes the email and person_id just in case they aren&#8217;t there to begin with.  The initial data set has approximately 3.6 million records.  This caused all sorts of problems with the default configuration (i.e., Cassandra kept crashing on me).  The changes I made to the default configuration are as follows:</p>
<ul>
<li>Change the maximum file descriptors from 1024 (system default) to 65535 (or unlimited)</li>
<li>Change the Java minimum and maximum heap sizes to -Xms256M -Xmx2G (I could not get the data to load past 2.5 million records without upping the max memory value)</li>
</ul>
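<p>One way to confirm that the file descriptor change actually applies to the current process is a quick check with the standard <em>resource</em> module (Unix only).  This is a sketch; the function name <em>nofile_soft_limit_ok</em> and the default of 65535 are simply taken from the list above, not from anything Cassandra ships.</p>

```python
import resource


def nofile_soft_limit_ok(required=65535):
    """Return True if this process may open at least `required` files."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means the limit is "unlimited".
    return soft == resource.RLIM_INFINITY or soft >= required
```

<p>Limits are inherited by child processes, so the check only means something when run from the same shell or init script that starts Cassandra.</p>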
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 db<span style="color: #7a0874; font-weight: bold;">&#93;</span>$ <span style="color: #000000; font-weight: bold;">time</span> .<span style="color: #000000; font-weight: bold;">/</span>cas_load.pl <span style="color: #660033;">-D</span> <span style="color: #000000;">2009</span>-09-<span style="color: #000000;">30</span> <span style="color: #660033;">-c</span> queue-mail.ini <span style="color: #660033;">-b</span> lists<span style="color: #000000; font-weight: bold;">/</span><br />
usa: <span style="color: #000000;">99</span>,<span style="color: #000000;">272</span><br />
top: <span style="color: #000000;">3</span>,<span style="color: #000000;">661</span>,<span style="color: #000000;">491</span><br />
Total: <span style="color: #000000;">3</span>,<span style="color: #000000;">760</span>,<span style="color: #000000;">763</span><br />
<br />
real &nbsp; &nbsp;72m50.826s<br />
user &nbsp; &nbsp;29m57.954s<br />
sys &nbsp; &nbsp; 2m18.816s<br />
<span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 cassandra<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;"># du -sh data/Mailings/ # Prior to data compaction</span><br />
13G &nbsp; &nbsp; data<span style="color: #000000; font-weight: bold;">/</span>Mailings<span style="color: #000000; font-weight: bold;">/</span><br />
<span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 cassandra<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;"># du -sh data/Mailings/ # Post data compaction</span><br />
1.4G &nbsp; &nbsp;data<span style="color: #000000; font-weight: bold;">/</span>Mailings<span style="color: #000000; font-weight: bold;">/</span></div></div>
<p>It was interesting to note that the write latency for about 3.6 million records was 0.004 ms.  Data compaction also brought the size of the records on disk down from 13G to 1.4G.  Those figures were achieved with the reads and writes happening on the same machines.</p>
<p>Loading the second data set took a mere 30 minutes, compared with closer to 180 minutes to load that same data set into Tokyo Cabinet.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">luxe: <span style="color: #000000;">936</span>,<span style="color: #000000;">911</span><br />
amex: <span style="color: #000000;">599</span>,<span style="color: #000000;">981</span><br />
mex: <span style="color: #000000;">39</span>,<span style="color: #000000;">700</span><br />
Total: <span style="color: #000000;">1</span>,<span style="color: #000000;">576</span>,<span style="color: #000000;">592</span><br />
<br />
real &nbsp; &nbsp;30m53.109s<br />
user &nbsp; &nbsp;12m53.507s<br />
sys &nbsp; &nbsp; 0m59.363s<br />
<span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 cassandra<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;"># du -sh data/Mailings/</span><br />
2.4G &nbsp; &nbsp;data<span style="color: #000000; font-weight: bold;">/</span>Mailings<span style="color: #000000; font-weight: bold;">/</span></div></div>
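<p>To put the two load timings above on a common footing, here is a minimal sketch in Python that converts the <em>real</em> times into records per second.  The helper name <strong>parse_real</strong> is my own and is not part of the load script; the record counts and times are copied from the output above.</p>

```python
import re

def parse_real(elapsed):
    """Convert a `time`-style string such as '72m50.826s' into seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", elapsed).groups()
    return int(minutes) * 60 + float(seconds)

# (records loaded, wall-clock time) copied from the two load runs above
loads = {
    "first data set": (3760763, "72m50.826s"),
    "second data set": (1576592, "30m53.109s"),
}

for label, (records, elapsed) in loads.items():
    rate = records / parse_real(elapsed)
    print(f"{label}: {rate:.0f} records/sec")
```

<p>Both loads work out to roughly 850 to 860 records per second, so the load rate stayed about the same as the data set grew.</p>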
<p>Now that there is a dataset worth working with, it&#8217;s time to start the read tests.</p>
<p>For the first test, I am doing a simple <em>get</em> of the <strong>email</strong> column.  This is just to iterate over the column and find out the approximate speed of the read operations.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Run <span style="color: #000000;">1</span>: 134m59.923s<br />
Run <span style="color: #000000;">2</span>: 125m55.673s<br />
Run <span style="color: #000000;">3</span>: 127m21.342s<br />
Run <span style="color: #000000;">4</span>: 119m2.414s</div></div>
<p>For the second test, I made use of a Cassandra feature called <em>get_slice</em>.  Since I have columns that are in the format <strong>send_dates_YYYY-MM-DD</strong>, I used <em>get_slice</em> to grab all column names on a per-row (each email address) basis that were between <strong>send_dates_2009-09-29</strong> and <strong>send_dates_2009-10-29</strong>.  The maximum number of columns that could match was 2 (since I only loaded 2 days&#8217; worth of data into the database).  I used data from a 3rd day so that I could get 0, 1, or 2 as results.</p>
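<p>The idea behind the range slice can be sketched in plain Python.  This is only an illustration and is not the test script itself: it uses a sorted list as a stand-in for a row&#8217;s columns rather than the Thrift API, and the function name <strong>slice_columns</strong> and the sample row are hypothetical.  It works because dates in YYYY-MM-DD form sort in the same order as strings as they do chronologically.</p>

```python
from bisect import bisect_left, bisect_right

def slice_columns(column_names, start, finish):
    """Return the column names that sort between start and finish, inclusive."""
    ordered = sorted(column_names)
    return ordered[bisect_left(ordered, start):bisect_right(ordered, finish)]

# Hypothetical row for one email address; the column names follow the
# send_dates_YYYY-MM-DD convention described above.
row = ["email", "send_dates_2009-09-30", "send_dates_2009-10-01"]

matches = slice_columns(row, "send_dates_2009-09-29", "send_dates_2009-10-29")
print(len(matches))
```

<p>For this sample row both send date columns fall inside the range, while the <strong>email</strong> column sorts before the start of the range and is left out.</p>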
<p>This first run is using the Perl version of the script.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Email Count: <span style="color: #000000;">3557584</span><br />
Match <span style="color: #000000;">0</span>: <span style="color: #000000;">4</span>,<span style="color: #000000;">247</span><br />
Match <span style="color: #000000;">1</span>: <span style="color: #000000;">1</span>,<span style="color: #000000;">993</span>,<span style="color: #000000;">273</span><br />
Match <span style="color: #000000;">2</span>: <span style="color: #000000;">1</span>,<span style="color: #000000;">560</span>,064<br />
<br />
real &nbsp; &nbsp;177m23.000s<br />
user &nbsp; &nbsp;45m21.859s<br />
sys &nbsp; &nbsp; 9m17.516s<br />
<br />
Run <span style="color: #000000;">2</span>: 151m27.042s</div></div>
<p>On subsequent runs I began to run into API issues, so I rewrote the same script in Python to see if the better-tested <a href="http://wiki.apache.org/cassandra/API">Thrift</a> Python API was faster than the <a href="http://wiki.apache.org/cassandra/API">Thrift</a> Perl API (and wouldn&#8217;t give me timeout issues).  The Perl timeout issues ended up being fixable, but I continued with the tests in Python.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 db<span style="color: #7a0874; font-weight: bold;">&#93;</span>$ <span style="color: #000000; font-weight: bold;">time</span> python cas_get_slice.py<br />
<span style="color: #7a0874; font-weight: bold;">&#123;</span><span style="color: #000000;">0</span>: <span style="color: #000000;">4170</span>, <span style="color: #000000;">1</span>: <span style="color: #000000;">1935783</span>, <span style="color: #000000;">2</span>: <span style="color: #000000;">1560042</span><span style="color: #7a0874; font-weight: bold;">&#125;</span><br />
Total: <span style="color: #000000;">3557584</span><br />
<br />
real &nbsp; &nbsp;213m57.854s<br />
user &nbsp; &nbsp;14m57.768s<br />
sys &nbsp; &nbsp; 0m51.634s<br />
<br />
Run <span style="color: #000000;">2</span>: 132m27.930s<br />
Run <span style="color: #000000;">3</span>: 156m19.906s<br />
Run <span style="color: #000000;">4</span>: 127m34.715s</div></div>
<p>Ultimately, there was quite a bit of a learning curve with Cassandra, but in my opinion it is well worth it.  Cassandra is an extremely powerful database system that I plan on continuing to explore in greater detail with a few more in-depth tests.  If you have the chance, take a look at Cassandra.</p>


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2009/databases/first-experience-with-cassandra/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Tokyo Tyrant and Tokyo Cabinet</title>
		<link>http://eric.lubow.org/2009/databases/tokyo-tyrant-and-tokyo-cabinet/</link>
		<comments>http://eric.lubow.org/2009/databases/tokyo-tyrant-and-tokyo-cabinet/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 14:00:38 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Tokyo Cabinet]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=278</guid>
		<description><![CDATA[Tokyo Tyrant and Tokyo Cabinet are the components for a database used by Mixi (basically a Japanese Facebook). And for work, I got to play with these tools for some research. Installing all this stuff along with the Perl APIs is incredibly easy. Ultimately I am working on a comparison of Cassandra and Tokyo Cabinet, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://1978th.net/tokyotyrant/">Tokyo Tyrant</a> and <a href="http://1978th.net/tokyocabinet/">Tokyo Cabinet</a> are the components for a database used by Mixi (basically a Japanese Facebook).  And for work, I got to play with these tools for some research.  Installing all this stuff along with the Perl APIs is incredibly easy.</p>
<p>Ultimately I am working on a comparison of <a href="http://incubator.apache.org/cassandra/">Cassandra</a> and <a href="http://1978th.net/">Tokyo Cabinet</a>, but I will get to more on <a href="http://incubator.apache.org/cassandra/">Cassandra</a> later.</p>
<p>The tests I am going to be doing are fairly simple.  I am going to be loading a few million rows into a TCT database (which is a table database in TC terms) and then loading key/value pairs into the database.  The layout in a hash format is basically going to be as follows:</p>
<div class="codecolorer-container perl default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #ff0000;">&quot;user@example.com&quot;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #009900;">&#123;</span> &nbsp; <span style="color: #ff0000;">&quot;sendDates&quot;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #009900;">&#123;</span><span style="color: #ff0000;">&quot;2009-09-30&quot;</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span> &nbsp; <span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #ff0000;">&quot;123456789&quot;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #009900;">&#123;</span> &nbsp;<span style="color: #ff0000;">&quot;2009-09-30&quot;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #009900;">&#123;</span><span style="color: #ff0000;">&quot;2287&quot;</span><span style="color: #009900;">&#125;</span> &nbsp; <span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span><br />
<span style="color: #009900;">&#125;</span></div></div>
<p>I ran these tests in two formats: INSERTing the data into a table database, and storing it as serialized data in a hash database.  It is necessary to point out that the machine was under its normal load during the tests, so this cannot be a true benchmark.  Since the conditions are not optimal (but really, when are they ever), take the results with a grain of salt.  Also, there is some data munging going on during every iteration to grab the email addresses and other data.  All this is being done through the Perl API and Tokyo Tyrant.  The machine that this is running on has two dual-core 2.5GHz Intel Xeon processors and 16G of memory.</p>
<p>For the first round, a few things should be noted:</p>
<ul>
<li>The totals referenced below are counts of the email addresses added/modified in the db</li>
<li>I am only using 1 connection to the Tokyo Tyrant DB and it is currently set up to handle 8 threads</li>
<li>I didn&#8217;t do any memory adjustment on startup, so the default (which is marginal) is in use</li>
<li>I am only using the standard put operations, not <em>putcat</em>, <em>putkeep</em>, or <em>putnr</em> (which I will be using later)</li>
</ul>
<p>The results of the table database are as follows.  It is also worth noting the size of the table is around 410M on disk.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 db<span style="color: #7a0874; font-weight: bold;">&#93;</span>$ <span style="color: #000000; font-weight: bold;">time</span> .<span style="color: #000000; font-weight: bold;">/</span>tct_test.pl <span style="color: #660033;">-b</span> lists<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #660033;">-D</span> <span style="color: #000000;">2009</span>-09-<span style="color: #000000;">30</span> <span style="color: #660033;">-c</span> queue-mail.ini <br />
usa: <span style="color: #000000;">99</span>,<span style="color: #000000;">272</span><br />
top: <span style="color: #000000;">3</span>,<span style="color: #000000;">661</span>,<span style="color: #000000;">491</span><br />
Total: <span style="color: #000000;">3</span>,<span style="color: #000000;">760</span>,<span style="color: #000000;">763</span><br />
<br />
real &nbsp; &nbsp;291m53.204s<br />
user &nbsp; &nbsp;4m53.557s<br />
sys &nbsp; &nbsp; 2m35.604s<br />
<span style="color: #7a0874; font-weight: bold;">&#91;</span>root<span style="color: #000000; font-weight: bold;">@</span>db5 tmp<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;"># ls -l</span><br />
<span style="color: #660033;">-rw-r--r--</span> <span style="color: #000000;">1</span> root root <span style="color: #000000;">410798800</span> Oct &nbsp;<span style="color: #000000;">6</span> <span style="color: #000000;">23</span>:<span style="color: #000000;">15</span> mailings.tct</div></div>
<p>The structure for the hash database (seeing as it&#8217;s only key/value) is as follows:</p>
<div class="codecolorer-container perl default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp; &nbsp; &nbsp; <span style="color: #ff0000;">&quot;user@example.com&quot;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #ff0000;">&quot;2009-09-30&quot;</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #ff0000;">&quot;123456789&quot;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #ff0000;">&quot;2009-09-30|2287&quot;</span><span style="color: #339933;">,</span></div></div>
<p>The results of loading the same data into a hash database are as follows. It is also worth noting the size of the table is around 360M on disk.  This is significantly smaller than the 410M of the table database containing the same style data.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 db<span style="color: #7a0874; font-weight: bold;">&#93;</span>$ <span style="color: #000000; font-weight: bold;">time</span> .<span style="color: #000000; font-weight: bold;">/</span>tch_test.pl <span style="color: #660033;">-b</span> lists<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #660033;">-D</span> <span style="color: #000000;">2009</span>-09-<span style="color: #000000;">30</span> <span style="color: #660033;">-c</span> queue-mail.ini <br />
usa: <span style="color: #000000;">99</span>,<span style="color: #000000;">272</span><br />
top: <span style="color: #000000;">3</span>,<span style="color: #000000;">661</span>,<span style="color: #000000;">491</span><br />
Total: <span style="color: #000000;">3</span>,<span style="color: #000000;">760</span>,<span style="color: #000000;">763</span><br />
<br />
real &nbsp; &nbsp;345m29.444s<br />
user &nbsp; &nbsp;2m23.338s<br />
sys &nbsp; &nbsp; 2m15.768s<br />
<span style="color: #7a0874; font-weight: bold;">&#91;</span>root<span style="color: #000000; font-weight: bold;">@</span>db5 tmp<span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #666666; font-style: italic;"># ls -l</span><br />
<span style="color: #660033;">-rw-r--r--</span> <span style="color: #000000;">1</span> root root <span style="color: #000000;">359468816</span> Oct &nbsp;<span style="color: #000000;">7</span> <span style="color: #000000;">17</span>:<span style="color: #000000;">50</span> mailings.tch</div></div>
<p>For the second round, I loaded a second day&#8217;s worth of data into the database.  I used the same layouts as above with the following noteworthy items:</p>
<ul>
<li>I did a <em>get</em> first prior to the <em>put</em> to decide whether to use <em>put</em> or <em>putcat</em></li>
<li>The new data structure is now either &#8220;2009-09-30,2009-10-01&#8221; or &#8220;2009-09-30|1995,2009-10-01|1996&#8221;</li>
</ul>
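<p>The get-then-write decision from the list above can be sketched as follows.  This is a minimal Python illustration that uses a plain dictionary as a stand-in for the Tokyo Tyrant connection, so <strong>record_send</strong> and the dictionary operations are assumptions rather than the Perl API calls the test script used.  I am also assuming that the comma separator is added as part of the appended value.</p>

```python
def record_send(store, key, value):
    """Append value to an existing record, or create the record if missing."""
    existing = store.get(key)                  # the get that precedes every write
    if existing is None:
        store[key] = value                     # new key: a plain put
    else:
        store[key] = existing + "," + value    # existing key: what putcat would append

# A dict stands in for the remote database in this sketch.
store = {}
record_send(store, "user@example.com", "2009-09-30")
record_send(store, "user@example.com", "2009-10-01")
print(store["user@example.com"])
```

<p>The extra <em>get</em> costs one more round trip per record, which is worth keeping in mind when reading the round 2 timings.</p>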
<p>Results of the hash database test round 2:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 db<span style="color: #7a0874; font-weight: bold;">&#93;</span>$ <span style="color: #000000; font-weight: bold;">time</span> .<span style="color: #000000; font-weight: bold;">/</span>tch_test.pl <span style="color: #660033;">-b</span> lists<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #660033;">-D</span> <span style="color: #000000;">2009</span>-<span style="color: #000000;">10</span>-01 <span style="color: #660033;">-c</span> queue-mail.ini <br />
luxe: <span style="color: #000000;">936</span>,<span style="color: #000000;">911</span><br />
amex: <span style="color: #000000;">599</span>,<span style="color: #000000;">981</span><br />
mex: <span style="color: #000000;">39</span>,<span style="color: #000000;">700</span><br />
Total: <span style="color: #000000;">1</span>,<span style="color: #000000;">576</span>,<span style="color: #000000;">592</span><br />
<br />
real &nbsp; &nbsp;177m55.280s<br />
user &nbsp; &nbsp;1m53.289s<br />
sys &nbsp; &nbsp; 2m8.606s<br />
<span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 db<span style="color: #7a0874; font-weight: bold;">&#93;</span>$ <span style="color: #c20cb9; font-weight: bold;">ls</span> <span style="color: #660033;">-l</span><br />
<span style="color: #660033;">-rw-r--r--</span> <span style="color: #000000;">1</span> root root <span style="color: #000000;">461176784</span> Oct &nbsp;<span style="color: #000000;">7</span> <span style="color: #000000;">23</span>:<span style="color: #000000;">44</span> mailings.tch</div></div>
<p>Results of the table database test round 2:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 db<span style="color: #7a0874; font-weight: bold;">&#93;</span>$ <span style="color: #000000; font-weight: bold;">time</span> .<span style="color: #000000; font-weight: bold;">/</span>tct_test.pl <span style="color: #660033;">-b</span> lists<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #660033;">-D</span> <span style="color: #000000;">2009</span>-<span style="color: #000000;">10</span>-01 <span style="color: #660033;">-c</span> queue-mail.ini<br />
luxe: <span style="color: #000000;">936</span>,<span style="color: #000000;">911</span><br />
amex: <span style="color: #000000;">599</span>,<span style="color: #000000;">981</span><br />
mex: <span style="color: #000000;">39</span>,<span style="color: #000000;">700</span><br />
Total: <span style="color: #000000;">1</span>,<span style="color: #000000;">576</span>,<span style="color: #000000;">592</span><br />
<br />
real &nbsp; &nbsp;412m19.007s<br />
user &nbsp; &nbsp;4m39.064s<br />
sys &nbsp; &nbsp; 2m22.343s<br />
<span style="color: #7a0874; font-weight: bold;">&#91;</span>elubow<span style="color: #000000; font-weight: bold;">@</span>db5 db<span style="color: #7a0874; font-weight: bold;">&#93;</span>$ <span style="color: #c20cb9; font-weight: bold;">ls</span> <span style="color: #660033;">-l</span><br />
<span style="color: #660033;">-rw-r--r--</span> <span style="color: #000000;">1</span> root root <span style="color: #000000;">512258816</span> Oct &nbsp;<span style="color: #000000;">8</span> <span style="color: #000000;">12</span>:<span style="color: #000000;">41</span> mailings.tct</div></div>
<p>When it comes down to the final implementation, I will likely be parallelizing the <em>put</em> in some form.  I would like to think that a database designed for this sort of thing works best in a concurrent environment (especially considering the default startup value is 8 threads).</p>
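<p>When it comes to load times, the hash database was much faster in the second round (about 178 minutes versus 412 for the table database), although it was slower in the first round (about 345 minutes versus 292).  Now it&#8217;s time to run some queries and see how this stuff goes down.</p>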
<p>So I ran some queries first against the table database.  I grabbed a new list of 3.6 million email addresses and iterated over the list, grabbed the record from the table database and counted how many dates (via array value counts) were entered for that email address.  I ran the script 4 times and results were as follows.  I typically throw out the first run since caching kicks in for the other runs.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Run <span style="color: #000000;">1</span>: 10m35.689s<br />
Run <span style="color: #000000;">2</span>: 5m41.896s<br />
Run <span style="color: #000000;">3</span>: 5m44.505s<br />
Run <span style="color: #000000;">4</span>: 5m44.329s</div></div>
<p>Doing the same thing for the hash database, I got the following result set:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Run <span style="color: #000000;">1</span>: 7m54.292s<br />
Run <span style="color: #000000;">2</span>: 4m13.467s<br />
Run <span style="color: #000000;">3</span>: 3m59.302s<br />
Run <span style="color: #000000;">4</span>: 4m13.277s</div></div>
<p>I think the results speak for themselves.  A hash database is obviously faster (which is something most of us assumed from the beginning).  The rest of the time comes from programmatic comparisons like date comparisons in specific slices of the array.  Load times can be sped up using concurrency, but given the requirements of the project, the <em>get</em>s have to be done in this sequential fashion.</p>
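<p>For the hash database, counting the dates for an address comes down to splitting the stored string.  Here is a minimal Python sketch of that step, assuming the comma-separated value format from round 2; the function name <strong>count_dates</strong> and the sample lookups are hypothetical, and the real script did this through the Perl API.</p>

```python
from collections import Counter

def count_dates(value):
    """Count the comma-separated date entries in one stored value."""
    if not value:
        return 0          # address not found in the database
    return len(value.split(","))

# Hypothetical lookups: the stored value for each address, or None if absent.
lookups = {
    "a@example.com": "2009-09-30,2009-10-01",
    "b@example.com": "2009-09-30",
    "c@example.com": None,
}

# Tally how many addresses had 0, 1, or 2 dates recorded.
tally = Counter(count_dates(value) for value in lookups.values())
print(dict(tally))
```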
<p>Now it&#8217;s on to testing <a href="http://incubator.apache.org/cassandra/">Cassandra</a> in a similar fashion for comparison.</p>


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2009/databases/tokyo-tyrant-and-tokyo-cabinet/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Counting Email Addresses By Domain in MySQL</title>
		<link>http://eric.lubow.org/2009/databases/mysql/counting-email-addresses-by-domain-in-mysql/</link>
		<comments>http://eric.lubow.org/2009/databases/mysql/counting-email-addresses-by-domain-in-mysql/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 13:50:21 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=92</guid>
		<description><![CDATA[Every so often I find some statistical need where, although a Perl program would be easy to write for it, it&#8217;s probably something the database should just handle. So I have a column in an email management table that has just the email addresses in the format user@domain.tld. I want to know which domains make up [...]]]></description>
			<content:encoded><![CDATA[<p>Every so often I find some statistical need where, although a Perl program would be easy to write for it, it&#8217;s probably something the database should just handle.  So I have a column in an email management table that has just the email addresses in the format <strong>user@domain.tld</strong>.</p>
<p>I want to know which domains make up the majority of the users.  (The numbers have been changed to protect the innocent):</p>
<pre>
mysql> SELECT SUBSTRING_INDEX(email, '@', -1) AS Domain, COUNT(*) AS Total
         FROM email_list
     GROUP BY Domain
     ORDER BY Total DESC
        LIMIT 15;
+----------------+---------+
| Domain         | Total   |
+----------------+---------+
| yahoo.com      | 1304000 |
| hotmail.com    |  908400 |
| aol.com        |  800000 |
| msn.com        |  168000 |
| gmail.com      |  161000 |
| comcast.net    |  143000 |
| sbcglobal.net  |  110000 |
| bellsouth.net  |   62000 |
| cox.net        |   58000 |
| verizon.net    |   56000 |
| earthlink.net  |   52000 |
| charter.net    |   46000 |
| juno.com       |   30000 |
| optonline.net  |   22000 |
| netzero.com    |   17000 |
+----------------+---------+
</pre>
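<p>To make it clearer what the query is doing, here is a rough Python equivalent.  <strong>SUBSTRING_INDEX(email, '@', -1)</strong> returns everything to the right of the last <strong>@</strong>, which is what <strong>rsplit</strong> does below.  The sample addresses are made up, and this is only an illustration: on a real table the database should still do the work.</p>

```python
from collections import Counter

# Made-up sample addresses standing in for the email column.
emails = [
    "alice@yahoo.com",
    "bob@gmail.com",
    "carol@yahoo.com",
    "dave@aol.com",
]

# Keep everything after the last '@', then count per domain (GROUP BY).
domains = Counter(email.rsplit("@", 1)[-1] for email in emails)

# most_common(15) mirrors ORDER BY Total DESC LIMIT 15.
for domain, total in domains.most_common(15):
    print(domain, total)
```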


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2009/databases/mysql/counting-email-addresses-by-domain-in-mysql/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Character Encoding</title>
		<link>http://eric.lubow.org/2008/databases/mysql/character-encoding/</link>
		<comments>http://eric.lubow.org/2008/databases/mysql/character-encoding/#comments</comments>
		<pubDate>Thu, 23 Oct 2008 21:44:58 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[character encoding]]></category>

		<guid isPermaLink="false">http://eric.lubow.org/?p=77</guid>
		<description><![CDATA[I recently ran into some character encoding issues and I wanted to share the fun. The default character encoding for MySQL on Gentoo is latin-1 or iso-8859-1. This wasn&#8217;t a problem until we recently started putting content straight from the DB through Java and onto the web. Java connects to the DB with a character [...]]]></description>
			<content:encoded><![CDATA[<p>I recently ran into some character encoding issues and I wanted to share the fun.  The default character encoding for MySQL on Gentoo is latin-1 or iso-8859-1.  This wasn&#8217;t a problem until we recently started putting content straight from the DB through Java and onto the web.  Java connects to the DB with a character encoding (typically UTF-8).  Since UTF-8 and iso-8859-1 encode the ASCII range identically, it generally wasn&#8217;t a problem.  The exception was when non-ASCII UTF-8 and UTF-16 characters were put into an iso-8859-1 database without translation.</p>
<p>What was essentially happening was that the data was being stored as iso-8859-1.  The Java code was connecting to the DB in UTF-8 and pulling it into Java (which is usually UTF-16, but in this case was being handled as UTF-8).  It was then being sent to the browser as URL encoded UTF-8 when in reality, it hadn&#8217;t even properly been put into UTF-8 character encoding.  This then gave the web browser some funny yen symbols and question marks.  This was not quite what we were aiming for.</p>
<p>The moral of this story is that it is necessary to know the character encoding at the start point and end point of your data.  It is crucial that the encodings match up; otherwise they could potentially make for an interesting screen given to the reader.  All this could have been avoided with a simple: <strong>ALTER TABLE myTable MODIFY myColumn VARCHAR(255) CHARACTER SET utf8;</strong>.</p>
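<p>The kind of mismatch described above is easy to reproduce.  The following is a minimal Python sketch, not the Java code from the actual incident: it encodes a string as UTF-8 and then reads the bytes back as iso-8859-1, which is roughly what happens when the two ends of a connection disagree about the encoding.  The sample word is my own.</p>

```python
# "caf\u00e9" is the word cafe with an acute accent on the e.
text = "caf\u00e9"

# Written out as UTF-8, the accented letter becomes two bytes (0xC3 0xA9).
utf8_bytes = text.encode("utf-8")

# Reading those bytes as iso-8859-1 turns each byte into its own character.
garbled = utf8_bytes.decode("iso-8859-1")
print(len(text), len(garbled))

# Reversing the wrong step recovers the original, as long as nothing was lost.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == text)
```

<p>The garbled string is one character longer than the original because the single accented letter has turned into two unrelated characters, which is the same class of problem as the funny symbols in the browser.</p>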


]]></content:encoded>
			<wfw:commentRss>http://eric.lubow.org/2008/databases/mysql/character-encoding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

