31.12.08

Memoization

From Wikipedia:

Memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs.

This is not the same as caching; caching is a more general term. This specifically means caching the results of function calls.

I've built a small Memoization library. In this series of articles, we will explore its usage of a bit of code acrobatics, several ways to do each thing, and one really cool feature - the memoizer itself is memoized.

Today, I would like to introduce the interface design.
First, I wanted to keep this simple. There is only one public static overloaded method, named Memoizer.GetMemoizer().

This method is Generic, and overloaded by number of input parameters. Currently, there are up to three supported parameters for a function, but this can easily be increased.
The overloads are:

   1: public static class Memoizer
   2: {
   3:     public static Func<TResult> GetMemoizer<TResult>(Func<TResult> func)
   4:     public static Func<T, TResult> GetMemoizer<T, TResult>(Func<T, TResult> func)
   5:     public static Func<T1, T2, TResult> GetMemoizer<T1, T2, TResult>(Func<T1, T2, TResult> func)
   6:     public static Func<T1, T2, T3, TResult> GetMemoizer<T1, T2, T3, TResult>(Func<T1, T2, T3, TResult> func)
   7: }

Let's start with the first overload. The parameter is a function that accepts no arguments, and returns some TResult.

For example, say I need pi to a million decimal places. I'll keep it as a string to keep things simple.

   1: public static string GetPi()
   2: {
   3:     // This is probably the fastest way to calculate pi to a million places, as long as we have an Internet connection
   4:     byte[] data;
   5:     using (System.Net.WebClient client = new System.Net.WebClient())
   6:         // download the value from http://www.piday.org/
   7:         data = client.DownloadData("http://www.piday.org/includes/pi_to_1million_digits_v2.html");
   8:  
   9:     // convert the data to characters
  10:     char[] chars = System.Text.Encoding.UTF8.GetChars(data);
  11:  
  12:     // we skip all characters until the first '3' that marks the beginning of the number
  13:     // then, we filter only digits and the decimal period.
  14:     IEnumerable<char> filtered = chars.SkipWhile(c => c != '3').Where(c => (c >= '0' && c <= '9') || c == '.');
  15:  
  16:     // make a string
  17:     return new string(filtered.ToArray());
  18: }

This is a simple method that returns the value for pi to a million places. This is, in fact, a terrible function, but it will serve its purpose. Lucky it's not in production code!

When we run code that uses this function, we have a noticeable delay while the just-over-1MB file is downloaded from www.piday.org. Suppose we call it multiple times? That's why we memoize the result.

   1: public static readonly Func<string> GetPi = Memoizer.GetMemoizer(GetPiInternal);
   2:  
   3: private static string GetPiInternal()
   4: {
   5:     // This is probably the fastest way to calculate pi to a million places, as long as we have an Internet connection
   6:     byte[] data;
   7:     using (System.Net.WebClient client = new System.Net.WebClient())
   8:         // download the value from http://www.piday.org/
   9:         data = client.DownloadData("http://www.piday.org/includes/pi_to_1million_digits_v2.html");
  10:  
  11:     // convert the data to characters
  12:     char[] chars = System.Text.Encoding.UTF8.GetChars(data);
  13:  
  14:     // we skip all characters until the first '3' that marks the beginning of the number
  15:     // then, we filter only digits and the decimal period.
  16:     IEnumerable<char> filtered = chars.SkipWhile(c => c != '3').Where(c => (c >= '0' && c <= '9') || c == '.');
  17:  
  18:     // make a string
  19:     return new string(filtered.ToArray());
  20: }

Now, the first time we call GetPi, it will download the contents. The second time, it will simply return the same precalculated string. The upside is that calling the function again will be quick. The downside - This 1000002 character string is kept in memory as long as the Func<string> object is.

Next time, I'll quickly go through the other overloads, and introduce my first test cases.

2 comments:

  1. Looking forward to the rest of the articles in the series!

    ReplyDelete
  2. Coincidentally, I'll be blogging a bit about memoizers that are smarter about letting the GC clean up their state sometime in the next few weeks (I hope -- I need to find time to write the code!). Looking forward to seeing what you've got to say on that topic.

    ReplyDelete