F# Url ShortCode Library

22/12/2014

I’m sure everyone has seen URL short codes before, and it’s not exactly rocket science to generate them. Regardless, I decided to write my own library for it because I had a specific requirement that I had not seen covered by any other URL short code algorithm.

The two requirements for the URL short code algorithm were:

  1. Short codes (doh), and by short I mean fewer than 10 characters
  2. The date the code was generated must be embedded in the code

The first requirement is pretty obvious, but it does rule out GUID-based URL short codes, like those proposed by Mads Kristensen.

The second requirement is more unusual. I want my short codes to include the date because I want to use the date portion of the code as the partition key in Azure Table storage. This makes sense from a performance perspective, because I want partitions to be relatively small when looking up values. It also makes housekeeping easy, as we can do batch deletes of all rows from a specific date. (Batch operations only work within the same partition.)

I wrote a library called SJKP.ShortCode that you can download as a NuGet package or from GitHub. It’s written as a portable class library in F#.

Short codes generated by the library look like this: AlwJy_uNAs

Note that the short codes generated are case-sensitive.

The first three characters (Alw) are the Base64UrlSafe-encoded date. Base64UrlSafe encoding simply means that ‘+’ is replaced by ‘-’, ‘/’ is replaced by ‘_’, and any ‘=’ padding characters are removed.
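The substitution described above can be sketched in a few lines of F#. This is only an illustration, not the library's actual code, and the function name toBase64UrlSafe is made up for this example.

```fsharp
open System

/// Hypothetical helper (not the library's API): standard Base64,
/// followed by the URL-safe substitutions described in the text.
let toBase64UrlSafe (bytes: byte[]) : string =
    Convert.ToBase64String(bytes)
        .Replace('+', '-')   // '+' is not URL-safe
        .Replace('/', '_')   // '/' is not URL-safe
        .TrimEnd('=')        // drop the padding

// The bytes 0xFB 0xFF are "+/8=" in plain Base64.
printfn "%s" (toBase64UrlSafe [| 0xFBuy; 0xFFuy |])   // prints "-_8"
```

Dropping the padding is safe here because the decoder knows how many characters to expect and can add the ‘=’ back before decoding.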

A full date in MM-dd-yyyy format can’t be represented in just three characters, so the year is not stored in full: only years 0-99 are supported (the library assumes that 0 is 2000).

In order to represent the date with three Base64UrlSafe-encoded characters, the year, month and day portions of the date are packed into two bytes in the following way.

[--Year 7 bits--|--Month 4 bits--|--Day 5 bits--]

In this example, Alw represents the date 2001/2/28 (year 1, month 2, day 28).
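To make the bit layout concrete, here is a minimal sketch of the packing and unpacking. It assumes the high byte comes first, which is consistent with the Alw example but is not confirmed to be how the library does it. All function names (packDate, encodeDate, decodeDate) are hypothetical.

```fsharp
open System

/// Hypothetical helper: Base64 with '+' -> '-', '/' -> '_' and padding removed.
let toBase64UrlSafe (bytes: byte[]) : string =
    Convert.ToBase64String(bytes).Replace('+', '-').Replace('/', '_').TrimEnd('=')

/// Pack a date into 16 bits: [year 7 bits | month 4 bits | day 5 bits].
let packDate (d: DateTime) : byte[] =
    let year = d.Year - 2000   // year 0 means 2000, so 2000-2099 is supported
    if year < 0 || year > 99 then invalidArg "d" "year must be between 2000 and 2099"
    let packed = (year <<< 9) ||| (d.Month <<< 5) ||| d.Day
    // Assumption: high byte first (matches the "Alw" example).
    [| byte (packed >>> 8); byte (packed &&& 0xFF) |]

/// Date -> three-character prefix.
let encodeDate (d: DateTime) : string = packDate d |> toBase64UrlSafe

/// First three characters of a short code -> date.
let decodeDate (code: string) : DateTime =
    // Undo the URL-safe substitutions and restore the single '=' of padding.
    let b64 = code.Substring(0, 3).Replace('-', '+').Replace('_', '/') + "="
    let bytes = Convert.FromBase64String b64
    let packed = (int bytes.[0] <<< 8) ||| int bytes.[1]
    DateTime(2000 + (packed >>> 9), (packed >>> 5) &&& 0xF, packed &&& 0x1F)

printfn "%s" (encodeDate (DateTime(2001, 2, 28)))   // prints "Alw"
```

For 2001/2/28 the packed value is (1 <<< 9) + (2 <<< 5) + 28 = 604, i.e. the bytes 0x02 0x5C, and those two bytes encode to Alw.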

The characters following the three date characters encode random bytes generated with System.Random (not a great random number generator). These bytes are passed through the same Base64UrlSafe encoding and appended to the date portion of the short code. The length of the random part can be set with the NewShortCodeByDateAndLength(d : DateTime, len: int) method. With NewShortCode() it is set to five bytes, which encode to the seven characters after the date in the example above.
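A rough sketch of how the random part could be produced and appended is shown below. It assumes the length counts bytes (five bytes give the seven characters seen in AlwJy_uNAs) and uses the literal prefix Alw from the example. The names toBase64UrlSafe and randomPart are hypothetical, not the library's API.

```fsharp
open System

/// Hypothetical helper: Base64 with '+' -> '-', '/' -> '_' and padding removed.
let toBase64UrlSafe (bytes: byte[]) : string =
    Convert.ToBase64String(bytes).Replace('+', '-').Replace('/', '_').TrimEnd('=')

/// Hypothetical stand-in for the random part: `len` random bytes, URL-safe encoded.
let randomPart (rng: Random) (len: int) : string =
    let bytes : byte[] = Array.zeroCreate len
    rng.NextBytes(bytes)
    toBase64UrlSafe bytes

// Seed System.Random with the hash code of a fresh Guid, as the post describes.
let rng = Random(Guid.NewGuid().GetHashCode())

// "Alw" is the date prefix from the example; 5 bytes encode to 7 characters.
let code = "Alw" + randomPart rng 5
printfn "%s (%d characters)" code code.Length
```

The output differs on every run, but the total length stays at 10 characters as long as five random bytes are used.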

Use my algorithm as you see fit, and feel free to improve it. Note that because I use System.Random with the hash code of a random Guid as the seed, collisions in the random part of the code can happen. If you have any ideas on how to improve this part of the short code algorithm while keeping it PCL-compatible, feel free to post a comment or make a pull request.
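To get a feel for how likely a collision is, the usual birthday-bound approximation can be used. The sketch below assumes the random part is 40 uniformly random bits (five bytes) and that only codes sharing the same date prefix can collide. That is an optimistic assumption: if a fresh System.Random is seeded from a 32-bit hash code for every code, the real odds could be worse. The function name is made up for this example.

```fsharp
/// Birthday-bound approximation: probability that at least two of `n` codes
/// with the same date prefix share the same random part, given `bits` bits
/// of uniform randomness. This is an estimate, not an exact figure.
let collisionProbability (n: float) (bits: int) : float =
    let space = 2.0 ** float bits
    1.0 - exp (-(n * (n - 1.0)) / (2.0 * space))

// 100,000 codes generated on the same day, 5 random bytes = 40 bits.
printfn "%.4f" (collisionProbability 100000.0 40)
```

Under these assumptions, 100,000 codes on a single day give a collision chance of roughly 0.45%, so whatever stores the codes should still check for an existing key before inserting.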