Check out the Latest Articles:
We’re Moving Back East!

After much thought, Gina and I have decided to move back to the east coast with Carly and Remmy. The girls will be able to see their Grandparents, Aunts, Uncles, and Cousins much more often this way, and it makes a lot of sense to come back.

I have accepted a new job at Basho (basho.com) as a Consulting Engineer, and we’re planning on moving in mid-June.

Wish us luck on the next leg of our journey!

Data n00b: Simple URL Tinier with Riak and PHP

I’ve been wanting to make a few tutorials that have to do with big data, NoSQL, and distributed systems for some time now. As you can see I haven’t had too much time to write blog entries ever since we found out we were having twins! This is a technical post so I’ll keep to that subject, but just so you know our twin girls Carly and Remmy are doing great and getting bigger every day.

I’ll be calling these few tutorials the Data n00b series because they are targeted at people new to this area of programming / technology. I’ve been learning quite a few things along the way as well, so this is for mutual benefit.

I’m going to start this off extremely simple: a very small URL shortener that uses the NoSQL DB Riak. I’ll be putting all of my source files on my GitHub account so that you can more easily try out the code. Here is the code for this project: https://github.com/proslacker/Simple-URL-Tinier.

I’ll be developing and running most of these projects on my Ubuntu 11 laptop, but Riak also has downloads for Mac OS X and others. I grabbed the i386.deb since I’m on 32-bit Ubuntu. Once the package is installed, start Riak:

riak start

Since Riak is known for its ease of use for ops people and its fault tolerance, you should be able to set it and forget it once it’s running, for our purposes anyway.

  • Next, you should grab one of the PHP libraries for accessing methods on your Riak server. Your Riak server is web accessible through a JSON API, which is what most libraries you’ll find utilize. Here is the one that I used since it is a single file and very straightforward: https://github.com/basho/riak-php-client. All you really need from there is riak.php.
  • Now to start CODING! The whole purpose of this app is to simply take any URL as input and output a shortened URL, much like http://tinyurl.com/. First we’ll need some sort of mechanism to generate a small token that is only a few characters long but still has millions of possible values, so we can avoid collisions in our tokens. I considered creating my own mechanism, or even simply taking the first few characters of an MD5 hash or something like that, but why reinvent the wheel? I found a small library here: http://blog.kevburnsjr.com/php-unique-hash which does the trick quite nicely.
  • Create a file called utilities.php in your project webroot. (Also, you should have placed riak.php into your project webroot as well). Paste the following code from the above link:
<?php
class PseudoCrypt {
/* Next prime greater than 62 ^ n / 1.618033988749894848 */
private static $golden_primes = array(
1,41,2377,147299,9132313,566201239,35104476161,2176477521929
);

/* Ascii :                    0  9,         A  Z,         a  z     */
/* $chars = array_merge(range(48,57), range(65,90), range(97,122)) */
private static $chars = array(
0=>48,1=>49,2=>50,3=>51,4=>52,5=>53,6=>54,7=>55,8=>56,9=>57,10=>65,
11=>66,12=>67,13=>68,14=>69,15=>70,16=>71,17=>72,18=>73,19=>74,20=>75,
21=>76,22=>77,23=>78,24=>79,25=>80,26=>81,27=>82,28=>83,29=>84,30=>85,
31=>86,32=>87,33=>88,34=>89,35=>90,36=>97,37=>98,38=>99,39=>100,40=>101,
41=>102,42=>103,43=>104,44=>105,45=>106,46=>107,47=>108,48=>109,49=>110,
50=>111,51=>112,52=>113,53=>114,54=>115,55=>116,56=>117,57=>118,58=>119,
59=>120,60=>121,61=>122
);

public static function base62($int) {
$key = "";
while($int > 0) {
$mod = $int-(floor($int/62)*62);
$key .= chr(self::$chars[$mod]);
$int = floor($int/62);
}
return strrev($key);
}

public static function udihash($num, $len = 5) {
$ceil = pow(62, $len);
$prime = self::$golden_primes[$len];
$dec = ($num * $prime)-floor($num * $prime/$ceil)*$ceil;
$hash = self::base62($dec);
return str_pad($hash, $len, "0", STR_PAD_LEFT);
}
}

I won’t go into what exactly is going on in that class as I didn’t write it, but you may visit the link above if you want to know more about it.
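For a rough feel of why a five-character token is plenty, here is a minimal sketch of plain base-62 encoding, written independently of the PseudoCrypt class above. The function name base62_encode is made up for this illustration and it assumes a non-negative integer; it uses the same 0-9, A-Z, a-z alphabet.

```php
<?php
// Illustrative helper (not part of PseudoCrypt): encode a non-negative
// integer using the alphabet 0-9, A-Z, a-z.
function base62_encode(int $n): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    if ($n === 0) {
        return '0';
    }
    $out = '';
    while ($n > 0) {
        $out = $alphabet[$n % 62] . $out; // prepend the least significant digit
        $n = intdiv($n, 62);
    }
    return $out;
}

echo base62_encode(61), "\n"; // z
echo base62_encode(62), "\n"; // 10
echo 62 ** 5, "\n";           // 916132832 possible five-character tokens
```

With 62 symbols per position, five characters cover a little over 900 million values, which is where the "millions of possible values" comes from.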

  • Next we’ll create our index.php which is where all the magic will happen. The next few steps will be code followed by explanation.
<?php
require_once('riak.php');
require_once("utilities.php");

//Config
$riakHost = "127.0.0.1";
$riakPort = 8098;
$urlBucket = 'urls';
$baseUrl = "http://localhost/";

First we included the utilities.php that we created, and also the PHP client library (riak.php). Next are the config values needed to connect to Riak, as well as the baseUrl, which is specific to this project.

//Inputs
$newUrl = isset($_POST['new_url']) ? $_POST['new_url'] : false;
$key = isset($_GET['k']) ? $_GET['k'] : false;
$shortUrl = false;

Here we’re just grabbing some variables from the request. If the script is being posted to, it means we are receiving a new URL to store into our database. If we get a value named “k” we are receiving a key which maps to a previously stored URL, and we should redirect to that value after we grab it!

//Processing starts here
if($newUrl || $key) {
$client = new RiakClient($riakHost, $riakPort);
$bucket = $client->bucket($urlBucket);

Okay, first: if we receive either of the 2 parameters defined earlier, we need to process further; otherwise no further PHP is required, which is why everything is wrapped in that outer if. Next we connect to Riak using the host and port specified above. The default host and port should work if you are just running Riak locally as described above. Once our client is defined, we can specify a “bucket” to use; ours is called urls. A bucket is sort of like a table in relational databases, but you don’t need to stick to a specific schema when adding records to it (although for your own sanity, I suggest it).
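Under the hood the client library talks to Riak over HTTP. As a rough sketch, assuming the /buckets/<bucket>/keys/<key> path layout of Riak’s HTTP API (older releases used /riak/<bucket>/<key>), this is the kind of URL and JSON body involved. The helper name riak_object_url and the sample key are made up for illustration and are not part of riak.php.

```php
<?php
// Hypothetical helper: build the HTTP URL for one object in a bucket.
function riak_object_url(string $host, int $port, string $bucket, string $key): string
{
    return sprintf(
        'http://%s:%d/buckets/%s/keys/%s',
        $host,
        $port,
        rawurlencode($bucket),
        rawurlencode($key)
    );
}

echo riak_object_url('127.0.0.1', 8098, 'urls', 'aB3x9'), "\n";
// http://127.0.0.1:8098/buckets/urls/keys/aB3x9

// The value stored under that key is a small JSON document.
echo json_encode(array('url' => 'http://www.reddit.com')), "\n";
// {"url":"http:\/\/www.reddit.com"}
```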

//Save new url
if($newUrl) {
$hash = PseudoCrypt::udihash(str_replace('.','',microtime(true)));
$bucket->newObject($hash, array('url' => $newUrl))->store();
$shortUrl = $baseUrl . "?k=" . $hash;
}

Once we have Riak initialized, we can decide what to do next. If a new URL was passed into the script for storage, we first need to generate a hash for it. I’m using the PseudoCrypt class’s static method udihash, which takes an integer as its parameter. To make sure we’re turning out a random(ish) token, I’m passing microtime (with the decimal point stripped out) as the param.

Now we can save the URL to our Riak bucket using the newObject method followed by a chained store() call, which works because newObject returns the newly created object (not the bucket). The newObject method is passed the identifier (our generated hash) followed by the value, which is a simple key-value array containing the new URL we got from the form. Finally we generate the shortened URL to pass back to the client.
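Two requests can in principle end up with the same token, and storing under an existing key would overwrite the older URL. One way to guard against that is to check whether a token is already taken before using it. The following is only a sketch under stated assumptions: random_token and unused_token are made-up helper names, it swaps the microtime-based udihash call for random_int, and an in-memory array stands in for the Riak bucket.

```php
<?php
// Hypothetical helper: a random token drawn from 0-9, A-Z, a-z.
function random_token(int $len = 5): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $token = '';
    for ($i = 0; $i < $len; $i++) {
        $token .= $alphabet[random_int(0, 61)];
    }
    return $token;
}

// Hypothetical helper: keep generating until $exists($token) reports
// that the token is free, giving up after $maxTries attempts.
function unused_token(callable $exists, int $maxTries = 5): string
{
    for ($i = 0; $i < $maxTries; $i++) {
        $token = random_token();
        if (!$exists($token)) {
            return $token;
        }
    }
    throw new RuntimeException('Could not find a free token');
}

// In-memory stand-in for the bucket, just for this demo.
$stored = array('aaaaa' => 'http://example.com');
$token = unused_token(function ($t) use ($stored) {
    return isset($stored[$t]);
});
echo strlen($token), "\n"; // 5
```

Against a real bucket the callback would ask Riak instead, for example by fetching the key and checking whether the returned object exists (I am assuming the client exposes such a check). Note that check-then-store is not atomic, so this narrows the window for collisions rather than closing it.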

//Retrieve saved url
if($key) {
$obj = $bucket->get($key);
header('Location: ' . $obj->data['url']);
exit; //Stop here so the HTML form below is not sent along with the redirect
}
}
?>

Lastly, if the “k” parameter was specified on the query string, we need to look up the key in our Riak DB using the get method on the bucket object. Once we have the URL that we previously stored, we redirect the client using PHP’s header() function.
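The lookup above assumes the key always exists. As a hedged sketch of how an unknown key could be handled, the made-up helper redirect_headers below picks the header lines to send; $data stands for the stored array, or null when nothing was found (how you detect "not found" depends on the client library).

```php
<?php
// Hypothetical helper: decide which header lines to send for a lookup.
// $data is the stored array (for example array('url' => ...)) or null
// when nothing was found under the key.
function redirect_headers(?array $data): array
{
    if ($data === null || empty($data['url'])) {
        return array('HTTP/1.1 404 Not Found');
    }
    return array('Location: ' . $data['url']);
}

print_r(redirect_headers(array('url' => 'http://www.reddit.com')));
print_r(redirect_headers(null));
```

In the script itself you would pass each returned line to header() and then exit.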

The only thing left at this point is to create a simple HTML form and some logic to display the newly generated short url.

<html>
<head>
<title>Simple URL Tinier</title>
</head>
<body>
<form method="post">
URL to shorten: <input type="text" name="new_url" /> (e.g. http://www.reddit.com - please include http://)
</form>
<br />
<?php if($shortUrl): ?>
Stored new url at <a href="<?php echo $shortUrl; ?>"><?php echo $shortUrl; ?></a>
<?php endif; ?>
</body>
</html>
  • That concludes this extremely simple example of how to create a URL shortener using PHP and Riak! I hope this is obvious, but this implementation has several shortcomings that should keep it out of any sort of production environment as is. There is ZERO data cleansing, which means it would be very easy to put bad data into the DB; the user interface is poor (URLs must include http:// to work, for example); and I’m sure there are other problems. This is simply meant to illustrate how to use Riak in its simplest form.
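As a starting point for the data cleansing mentioned above, here is a minimal sketch that adds a missing http:// and rejects anything that is not an http or https URL. The function name normalize_url is made up for this example, and PHP’s built-in filter_var check is fairly permissive, so treat it as a first line of defense rather than full validation.

```php
<?php
// Hypothetical helper: return a cleaned-up http(s) URL, or null if the
// input cannot be accepted.
function normalize_url(string $input): ?string
{
    $url = trim($input);
    if ($url === '') {
        return null;
    }
    // No scheme given (e.g. "www.reddit.com"): assume http://
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // Only allow web URLs; reject ftp://, file:// and friends.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, array('http', 'https'), true) ? $url : null;
}

var_dump(normalize_url('www.reddit.com'));      // string(21) "http://www.reddit.com"
var_dump(normalize_url('ftp://example.com/x')); // NULL
```

In index.php this would run on $_POST['new_url'] before anything is stored, with a null result shown to the user as an error.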
“So you don’t know there are 2 in there?”

My wife Gina sums it up better than I can, so here is a blatant copy / paste of her FB post: We had our first ultrasound at 20 weeks yesterday morning and do we have big news. We walk into the ultrasound room, I lie down on the table and the technician goes to [...]

We’re Moving to Seattle!

It’s official! As some of you may know, Gina and I were asked by our current employer (Lockerz.com) to move out to Seattle, WA. We have decided to do it. We’re very excited to have this opportunity for a new adventure as well as a huge step in both of our careers. We’ll miss the [...]

Free Site Traffic Analytics

Ever wanted to see how popular your site is? Use Google Analytics. Ever wanted to get detailed information about a competitor’s site and nice traffic graphs for your site? Use http://www.websitetrafficspy.com/. The site will also give you a detailed breakdown of traffic by country, city, and it gives detailed information about the server / load [...]

Programmatically Generated HTML Layouts

I found this CSS system called 960 grid system. Its grid layout allows you to have 12, 16, or 24 columns on your page. You can arrange these columns so that some elements are fluidly spread across multiple columns. The implications of having such a predictable geometric layout based on rules like this are that [...]