The Factory Method Pattern¶

A “Creational Pattern” from the Gang of Four book

Verdict

The “Factory Method” pattern is a poor fit for Python. It was designed for underpowered programming languages where classes and functions can’t be passed as parameters or stored as attributes. In those languages, the Factory Method serves as an awkward but necessary escape route. But it’s not a good design for Python applications.

You will often create objects in Python that themselves need to turn around and create more objects. Imagine an HTTP connection pool, for example, that sometimes needs to create new connections to replace old connections as they time out and close.

Now comes the challenge. What if your program will run behind a special corporate proxy, and the HTTP connection pool will only be able to communicate if it creates and uses specially configured network connections instead of the generic sockets that it usually creates?

The Factory Method is one pattern by which the HTTP connection pool class could offer you a way to choose what kind of connection it creates. Before learning how this awkward pattern works, let’s review the Pythonic design patterns that solve the same problem.

Dodge: use Dependency Injection¶

First, we should stop for a moment and double-check something. Does the class you’re designing really need to go around creating other classes?

If you know up front all the objects that the class will need, you should consider providing them to the class yourself through Dependency Injection.

You will see excellent examples of this simpler pattern everywhere that Python libraries let you pass an already-open file object instead of making you supply a path and insisting on opening the file themselves. Take the Standard Library’s JSON parser as an example. Instead of asking for a path, its load() method lets you open the file:

import json
with open('input_data.json') as f:
    data = json.load(f)

By leaving you in charge of instantiating the file object, this maneuver accomplishes many goals at once.

Decoupling: the library doesn’t need to know all the parameters accepted by the open() method, and doesn’t need to accept every one of them as a parameter itself. If the file object were to grow more creation parameters in the future, json.load() won’t need to change.
Efficiency: If you already have the file open for other reasons, you can go ahead and provide the open file object. The library won’t insist on re-opening the file. This is crucial if the file is a read-once source of data like an anonymous pipe.
Flexibility: You can pass any file-like object you want. It can be a subclass of the standard file object, or be completely independent. You could pass a StringIO that operates directly on data in RAM instead of needing the JSON written to disk first. Or you could offer a wrapper around a socket that lets you parse JSON data as it arrives off the network without needing to be stored locally first.

In a simple case like this, the Dependency Injection pattern completely eliminates the need for an object like a JSON parser to have its design made more complicated by the details of how other objects get created.

So (a) if your class always needs a particular object built, and (b) if users will normally want to at least customize the parameters with which it is instantiated, you should strongly consider Dependency Injection. In that case, you might want to read Miško Hevery’s 2008 post about dependency injection for an important note about the dangers of applying Dependency Injection halfway!

Instead: use a Class Attribute Factory¶

The next alternative comes to us from the 1990s. The class that needs to be created can be attached as an attribute on the class that will be doing the creating.

You can see this pattern in some of the older modules in the Python Standard Library. For example, when an HTTPConnection is done sending a request, it expects to receive and parse a response. But what class should it use to parse and represent the response? What if you happen to be talking to a server that sends non-standard information in its response for which you require special handling?

What I will here call the Class Attribute Factory pattern takes advantage of the fact that classes are first-class objects in Python. It simply attaches the class as an attribute of the class that will be doing the creating. You’ll find the following code in the Standard Library:

class HTTPConnection:
    ...
    response_class = HTTPResponse
    ...
    def getresponse(self):
        ...
        response = self.response_class(self.sock, method=self._method)
        ...

This gives the code using the HTTPConnection complete control over what happens when it is time to build a response. All it has to do is create a subclass and use the subclass instead:

class SpecialHTTPConnection(HTTPConnection):
    response_class = SpecialHTTPResponse

That’s all there is to it! While subclassing generally means that a software design is running at least somewhat against the grain of the Python language, the approach will still work well. The subclass will behave exactly like a normal Standard Library HTTP connection except when the moment comes to instantiate a response. At that point the connection will access its response_class attribute, receive the alternative class you provided instead of the normal one, and the alternative response class will then be in control.

There is more flexibility here we should explore, but first let’s look at the more modern alternative to the Class Attribute Factory.

Instead: use an Instance Attribute Factory¶

Why should you have to subclass an object merely to customize its behavior?

It’s a serious question — very serious, for at one point in the history of programming some adherents of a doctrine called “object orientation” were suggesting that if you wanted a button that said “Submit” that you shouldn’t be able to simply instantiate a button with a parameter like label="Submit" but that you should instead have to subclass the button and override its default label() method to return something new.

Happily, an alternative to subclass-powered customization has swept the Python community: what I’ll call the Instance Attribute Factory. Among the many good examples of its practice is the json module in the Standard Library, which was added around 2008.

Here’s one example from the json module. Every time the JSON module encounters a number in its input, it has to instantiate some kind of Python object capable of representing the number. But which number class should it instantiate? An integer, if the fractional part of the number is zero? A float, the only numeric type in JavaScript? Or a Decimal that’s guaranteed to not lose any precision?

Watch how elegantly the json module handles the question:

class JSONDecoder(object):
    ...
    def __init__(self, ... parse_float=None, ...):
        ...
        self.parse_float = parse_float or float
        ...

Whenever it encounters a number in its input, it simply calls self.parse_float() with the string as input.

This is Python code that’s running on all pistons. If the developer does not intervene, each number is interpreted using a lighting-fast call to the float type itself. If instead the developer has provided their own callable for parsing numbers, then that callable is transparently used instead.

The beauty is that it all happens without a single additional class! Instead of forcing the programmer to create a new class each time they want to customize behavior, individual JSONDecoder instances can each be configured directly. You can create a custom decoder with a single line of code:

from decimal import Decimal
from json import JSONDecoder

my_decoder = JSONDecoder(parse_float=Decimal)

Besides the benefits of clarity and brevity, an advantage of customizing an object through its parameters is that parameters compose so beautifully in Python. If several pieces of code have parameters for the decoder that they need to combine, the task is no more difficult than building an empty dict and then using update() to fill it with each set of parameters, setting the parameters last that should be allowed to override the earlier ones.

Instance attributes override class attributes¶

I should admit that the previous two design patterns are not as completely different as I have tried to make it sound. At bottom, both classes — the HTTPConnection and the JSONDecoder — make the exact same move when they are ready to create a new object: they start with self and use . to access some specific attribute on it. The only difference in the two design patterns above is in how they choose to supply the attribute. The first pattern happens to use a class attribute, while the second uses an instance attribute.

But the two are not mutually exclusive. There is no rule that if you have a class attribute named .response_class that you can’t also have an instance attribute named .response_class — and the rule in the case where you have both is simple: the instance attribute wins.

Which means I should admit that, really, even though I made a big deal about claiming that the HTTPConnection forces you to subclass it, it’s not true. You can override the default but just setting an instance attribute instead, just like the JSONDecoder does! The only difference is that the HTTPConnection won’t give you any help — you’ll have to reach in and set the instance attribute yourself:

conn = HTTPConnection()
conn.response_class = SpecialHTTPResponse

So even when an old-fashioned class looks like it wants you to create a subclass that specifies a new value for one of its class attributes, you can often use the more modern Instance Attribute Factory instead!

There are tiny differences in semantics and performance between class attributes and instance attributes, but I’ll refer you to the Python documentation and to Stack Overflow if you think your code is wandering towards an edge case where you care about the difference.

In general you should choose between the above patterns based on readability. If you can imagine developers ever wanting to customize object creation, then go ahead and try making the object creation routine (the “factory”) a parameter in your __init__() method and store it as an instance attribute. If instead you think that customization will be extremely rare, then make it a class attribute, remembering that the developer can always reach in and override it in those rare cases where they need to.

Any callables accepted¶

In the examples above, we used actual classes like Decimal and the fictional SpecialHTTPResponse when setting an attribute like response_class or parse_float. But you’ll note that the only thing the callers cared about was that these classes were callables. There’s happily no new keyword in Python, so object instantiation looks exactly like a normal function or method call.

This means that you can substitute a function for any of these callbacks, and it will work just as well! For example, you could provide the JSON decoder with a function like this as its parse_float parameter:

def parse_number(string):
    if '.' in string:
        return Decimal(string)
    return int(string)

You can provide not only a function, but any other kind of callable as well — maybe a bound method, or a class method like an alternative constructor. You could even provide a callable that you’ve spun up dynamically using functional programming techniques like partial application:

from decimal import Context, ROUND_DOWN
from functools import partial

parse_number = partial(Decimal, context=Context(2, ROUND_DOWN))

Feel free to enjoy this Pythonic freedom of providing any kind of callable, and not limiting yourself to just providing classes, whether you are using the Class Attribute Factory or an Instance Attribute Factory.

Implementing¶

Having described the happy alternatives, I should finish by showing you the Factory Method itself. Imagine that you were using a language where:

Classes are not first-class objects. You are not allowed to leave a class sitting around as an attribute of either a class instance or of another class itself.
Functions are not first-class objects. You’re not allowed to save a function as an instance of a class or class instance.
No other kind of callable exists that can be dynamically specified and attached to an object at runtime.

Under such dire constraints, you would turn to subclassing as a natural way to attach verbs — new actions — to existing classes, and you might use method overriding as a basic means of customizing behavior. And if on one of your classes you designed a special method whose only purpose was to isolate the act of creating new objects, then you’d be using the Factory Method pattern.

The Factory Method pattern can often be observed anywhere that code from an underpowered but object-oriented language has been translated straight into Python. The logging module from the Standard Library comes immediately to mind. Here’s an excerpt:

class Handler(Filterer):
    ...
    def __init__(self, level=NOTSET):
        ...
        self.createLock()
    ...
    def createLock(self):
        """
        Acquire a thread lock for serializing access to the underlying I/O.
        """
        self.lock = threading.RLock()
    ...

What if you wanted to create a Handler that uses a special kind of lock? The intention here is that you subclass Hander and override createLock() to return your own favorite kind of lock instead. It’s a clunky approach, it takes several lines of code, and it won’t compose well if there are several ways you want to customize Handler objects in a variety of situations — you’ll wind up with classes all over the place.

But it will work.

It just won’t be very Pythonic.