MySQL Encoded URI Search and Replace

So I can definitely not take credit for this trick. The credit goes to Augusto Bott of the Pythian Group for this. I have a table in my DB that has columns of encoded URIs. The list of URI encoding character translations is available here.

To accomplish this, the MySQL REPLACE function was used.

mysql> SET @test:='%20%27%7C%26%5E%2B%2D%25';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @test as original,
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),  -- REPLACE() is case sensitive
    ->        '%7c','|'),  -- so we have
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),  -- to repeat
    ->        '%2D','-'),
    ->        '%2d','-'),  -- ourselves
    ->        '%2B','+'),
    ->        '%2b','+'),  -- sometimes
    ->        '%25','%') as replaced;
+--------------------------+----------+
| original                 | replaced |
+--------------------------+----------+
| %20%27%7C%26%5E%2B%2D%25 |  '|&^+-% |
+--------------------------+----------+
1 row in set (0.01 sec)

To make our lives easier, let’s create a function from that.

mysql> CREATE FUNCTION url_replace
    -> (url VARCHAR(1024))
    -> RETURNS VARCHAR(1025)
    -> BEGIN
    -> RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),
    ->        '%7c','|'),
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),
    ->        '%2D','-'),
    ->        '%2d','-'),
    ->        '%2B','+'),
    ->        '%2b','+'),
    ->        '%25','%');
    -> END$$
Query OK, 0 rows affected (0.02 sec)

And to wrap it up, let’s use it in an actual query. This query finds all instances of a % and replaces it with its unencoded character equivalent.

mysql> UPDATE website_page SET uri = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( uri,'%20',' '), '%27','\''), '%7C','|'), '%7c','|'), '%26','&'), '%5E','^'), '%5e','^'), '%2D','-'), '%2d','-'), '%2B','+'), '%2b','+'), '%25','%'), '%2C',',') WHERE uri LIKE '%\%%';
Posted in MySQL. Tags: . No Comments »

Apache mod_proxy

I came up against the interesting problem of putting multiple stand alone apache tomcat instances with different virtual host names on the same machine that all needed to be accessible via port 80 (on the same IP). There is always mod_jk, but that seems like a bit too much to fix a simple problem. Being a strong believer in the right tool for the right job, I came across mod_proxy. This way I get to take advantage of apache connection handling without having to put a whole proxy server in front of it. Because there is dispatching by virtual host to do, putting apache in front just seemed to be the best idea.

Since there aren’t too many clear HOWTOs on this, it took a bit of fudging. Here is what you need to know.

Let’s create the host http://port8080.lubow.org/ to go to http://8080.lubow.org:8080/.

The first thing is a fairly common default configuration of NameVirtualHost option. This is so you can have multiple virtual hosts per IP. Unless you are crazy (or have a really good reason), you do not want to create an open proxy. So you need to globally configure the ProxyRequests variable to be off. Do the base setup for a VirtualHost of ServerName and ServerAdmin.

Setup the proxy authorizations (similar to the apache allow/denys). In order for the right HTTP headers to make it to the proxy’d virtual host, the headers will need to be rewritten. This needs to happen both going to the host and coming back from the host going to the client. This is why there is the ProxyPass and ProxyPassReverse. The first argument is the URL that on the virtual host that should match the URL (second argument) on the proxy’d virtual host. The ProxyPreserveHost option is generally not needed (but it is for the specific application I am running. Click the link above to read the description to determine whether it is right for you.

Putting it all together, you will get a file that looks like below. Make sure you replace your IPs and hostnames with what’s appropriate for your environment.

ProxyRequests Off
NameVirtualHost 1.2.3.4:80

<virtualhost 1.2.3.4:80>
     ServerAdmin webmaster@lubow.org
         ServerName port8080.lubow.org
         <proxy *>
                  Order deny,allow
                  Allow from all
         </proxy>
         ProxyPreserveHost   On
         ProxyPass   / http://8080.lubow.org:8080/
         ProxyPassReverse    / http://8080.lubow.org:8080/
</virtualhost>
Posted in Apache. Tags: , . 5 Comments »

The Next Step In Browser Evolution

I was having a chat with my two friends from Redub Consulting about the new Google Chrome browser. At a cursory exploration, we found that (as promised) the Javascript engine is incredibly fast. But I don’t want to throw that out there since Google already us that in their Chrome Comic. I want to talk about where this could be leading.

As some of you know, Adobe Air is a desktop application that can interact with internet applications. The catch here is that since its a desktop application, it has access to the same elements of the physical machine as any other desktop application (USB ports, printers, sound/video out ports, etc). Browsers don’t yet that kind of access to a computer. They are limited to the user space in which they are run in. All the sound and video you hear and see is sent through 3rd party applications within the browser. What if the browser could control those elements of your machine? What if your entire computer experience was now internet based. Google is already trying to push this with software as a service (GoogleDocs), but keep extending this idea. What if your media center could be controlled via an internet application?

Eclipse IDE is now at a point at which you can your code as its running and change function calls at the opcode level to avoid recompiling your program over and over. Eclipse has grown to the point where its almost like an OS in its capabilities. In that same vein, Google’s new browser now controls its individual tabs and sandboxes each tab in order to have task level control over potentially runaway web applications.

So what I am trying to say here? I’m glad you asked. I believe this browser is the next step towards ubiquitous computing in the sense that 1 application to control your internet (or whole user) experience. AppleTV for instance is a set of specially designed hardware that can be interacted with over the internet. By allowing applications, such as Air (and potentially soon Chrome) to internet directly with the hardware attached to the computer, you are are negating the need for that specially designed hardware. One piece of hardware can be designed to do it all in terms of the interactive experience. Google is stepping to the plate and pushing forward for just this type of innovation. Keep an eye on the features of Google Chrome to come. If it becomes integrated any deeper into the desktop, it will open up a new age of ubiquitous computing.

Posted in Misc. Tags: , , . No Comments »