A tip for the impatient: Simple caching with Python pickle and decorators

During testing and development, it is sometimes necessary to rerun tasks that take quite a long time. One option is to drink coffee in the meantime; the other is to use caching, i.e. to save once-calculated results to disk and load them from there again when necessary. The Python module pickle is perfect for caching, since it allows you to store and read whole Python objects with two simple functions. I already showed in another article that it’s very useful to store a fully trained POS tagger and load it again directly from disk without needing to retrain it, which saves a lot of time.
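As a minimal sketch of those two functions (the file name and the example object are just illustrations), storing and restoring a Python object looks like this:

```python
import pickle

data = {'tagger': 'a trained model', 'accuracy': 0.97}

# store the whole object to disk ("serialize" it)
with open('data.pickle', 'wb') as f:
    pickle.dump(data, f)

# load it again later ("deserialize" it)
with open('data.pickle', 'rb') as f:
    restored = pickle.load(f)

print(restored)   # the restored object equals the original
```

Note that pickle files must be opened in binary mode ('wb' / 'rb').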

Decorators

This approach can be generalized and used in combination with Python decorators. Decorators are perfect for extending existing functions by using a wrapper function:

def reverse_str(fn):
    def wrapper(*args, **kwargs):
        res = fn(*args, **kwargs)
        return res[::-1]
    return wrapper

@reverse_str
def hello_message(name):
    return 'hello %s!' % name

By decorating the function hello_message() with @reverse_str, this function is passed as fn to reverse_str, which in turn constructs a wrapper function that accepts arbitrary parameters (*args, **kwargs). This wrapper function is returned as a callable by reverse_str. Inside the wrapper, the decorated function fn is called and its return value is reversed:

>>> hello_message('world')
'!dlrow olleh'

For details on this topic, check out Abu Ashraf Masnun’s excellent introduction to decorators.

Very simple caching

For our purpose, the basic idea is that we have an original function fn for which we want to enable caching. First, we check whether a cache file exists for it. If so, the result object stored in the cache file is loaded and returned, so that fn doesn’t have to be called at all. If not, fn is called, but its result is additionally stored in the cache file so that it can be loaded from there the next time.

To do this, we need to define a function that accepts the name of the cache file as an argument and then constructs the actual decorator with this cache file argument and returns it. Think of this function as a “factory function” that produces individual decorators that use a specific cache file:

import os
import pickle

def cached(cachefile):
    """
    A function that creates a decorator which will use "cachefile" for caching the results of the decorated function "fn".
    """
    def decorator(fn):  # define a decorator for a function "fn"
        def wrapped(*args, **kwargs):   # define a wrapper that will finally call "fn" with all arguments
            # if cache exists -> load it and return its content
            if os.path.exists(cachefile):
                with open(cachefile, 'rb') as cachehandle:
                    print("using cached result from '%s'" % cachefile)
                    return pickle.load(cachehandle)

            # execute the function with all arguments passed
            res = fn(*args, **kwargs)

            # write to cache file
            with open(cachefile, 'wb') as cachehandle:
                print("saving result to cache '%s'" % cachefile)
                pickle.dump(res, cachehandle)

            return res

        return wrapped

    return decorator   # return this "customized" decorator that uses "cachefile"

To test our decorator, let’s define a small function that is made artificially slow by calling sleep() in order to delay the execution for 3 seconds:

from time import sleep

def my_slow_function():
    sleep(3)
    return 'some result'

print(my_slow_function())

Executing this always results in a delay of 3 seconds until the result is printed. Let’s decorate this function with @cached in order to benefit from our simple caching mechanism:

@cached('slow_function_cache.pickle')
def my_slow_function():
    sleep(3)
    return 'some result'

Now when we execute this function, it will again take 3 seconds (because there’s no cached result yet), but additionally we’ll see the output “saving result to cache 'slow_function_cache.pickle'”. When we call my_slow_function() again, the result appears almost immediately because it is loaded from the cache and returned directly (as the message “using cached result from 'slow_function_cache.pickle'” indicates).

Don’t trick yourself

We can now use this decorator for different functions by simply passing a different cache file as argument. But please keep in mind that this caching implementation is very simple and serves only for demonstration here. The problem with it is that it will always return the result from the first time the function was called, even when the function takes arguments and should therefore produce different results for different arguments. It’s easy to trick yourself with this kind of caching, and you end up wondering why you always get the same result from a function although you pass different arguments. A possible solution is a more sophisticated caching mechanism that caches a result depending on the passed arguments (the args and kwargs variables). This could be done by creating a unique hash for the argument values and storing the results in a dictionary that maps argument hashes to cached results.
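Such an argument-aware cache could be sketched as follows (the decorator name cached_with_args and the choice of pickling the arguments to build the key are my own assumptions, not part of the implementation above):

```python
import os
import pickle
from functools import wraps

def cached_with_args(cachefile):
    """Like "cached", but keys results by a hash of the passed arguments."""
    def decorator(fn):
        @wraps(fn)
        def wrapped(*args, **kwargs):
            # build a hashable key from the arguments; pickling them is one
            # simple way to get such a key for picklable argument values
            key = pickle.dumps((args, sorted(kwargs.items())))

            # load the existing argument-to-result dictionary, if any
            cache = {}
            if os.path.exists(cachefile):
                with open(cachefile, 'rb') as cachehandle:
                    cache = pickle.load(cachehandle)

            if key in cache:
                print("using cached result from '%s'" % cachefile)
                return cache[key]

            # no cached result for these arguments -> call fn and store it
            res = fn(*args, **kwargs)
            cache[key] = res
            with open(cachefile, 'wb') as cachehandle:
                print("saving result to cache '%s'" % cachefile)
                pickle.dump(cache, cachehandle)
            return res
        return wrapped
    return decorator
```

Applying @cached_with_args('some_cache.pickle') to a function then caches one result per distinct argument combination instead of a single result for the whole function.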
