What’s in a constructor?

Beginners in object-oriented programming are taught what constructors are, since using and writing them is unavoidable in practice. But even experienced programmers sometimes design bad constructors that lose sight of the concept’s fundamental purpose and make their classes difficult to use. In this post, I discuss some of these basic tenets and show what problems they solve. I use C# syntax for the examples, but the principles apply to C++ and Java as well.

The essential promise

Everyone knows a constructor runs when a new object is created. But what is a constructor actually supposed to do? Constructors make an object valid and ready for use. That is the basic contract that a constructor must fulfill. Now let’s explore some consequences of this.

When a constructor returns, the object is initialized

Here is an example of a bad class:

class Service {
    private ConnectionInfo ci;
	
    public Service() {
        ci = null;
    }

    public void SetConnectionInfo(ConnectionInfo ci) {
        this.ci = ci;
    }

    public void Start() {
        var connection = ci.Connect();
        // other code ...
    }

    // other members ...
}

This class is bad because its constructor does not leave the instance in a valid and ready-to-use state. Anyone who calls Start on an instance risks a crash (or worse) unless they know that someone has already called SetConnectionInfo. This problem doesn’t seem so bad if we consider a program that only constructs Services in one function and immediately calls SetConnectionInfo on the instance it just created. But for anything beyond that, ensuring a Service has been initialized before use becomes a difficult or impossible task. Consider a function that accepts one or more Service instances as input.

void StartServices(IList<Service> services) {
    foreach (var service in services) {
        // Risk of a NullReferenceException here (or undefined behavior in C++) if the service is uninitialized
        service.Start();
    }
}

In general, code that uses object references assumes those objects are in a valid state. An exception to this is code that has the responsibility of deciding whether an object is valid, but that can’t apply here because there is no way for a user of a Service object to check whether it has a valid ConnectionInfo.

The fix is to have the constructor demand the ConnectionInfo up front:

public Service(ConnectionInfo ci) {
    if (ci == null) {
        throw new ArgumentNullException(nameof(ci));
    }
    this.ci = ci;
}

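This is an acceptable constructor because it guarantees that every Service that exists was given a non-null ConnectionInfo. Callers such as StartServices no longer need to know an instance’s history, which makes the class simpler to use and much less error-prone.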
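To make the effect concrete, here is a minimal sketch of the whole class with that constructor in place. The ConnectionInfo type shown is a hypothetical stand-in (the real one is not shown in this post), SetConnectionInfo is dropped on the assumption that nothing needs to swap the connection later, and Start returns a string purely so the sketch has something observable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ConnectionInfo, only here so the sketch compiles.
class ConnectionInfo {
    public string Host { get; }
    public ConnectionInfo(string host) { Host = host; }
    public string Connect() => $"connected to {Host}";
}

class Service {
    // readonly: assigned exactly once, in the constructor, and never reset to null.
    private readonly ConnectionInfo ci;

    public Service(ConnectionInfo ci) {
        if (ci == null) {
            throw new ArgumentNullException(nameof(ci));
        }
        this.ci = ci;
    }

    // Returns the connection description only to make the result observable.
    public string Start() {
        return ci.Connect();
    }
}

static class ServiceDemo {
    public static List<string> StartServices(IList<Service> services) {
        var results = new List<string>();
        foreach (var service in services) {
            // Safe: a Service cannot exist without a ConnectionInfo.
            results.Add(service.Start());
        }
        return results;
    }
}
```

With this shape, the only way to get a Service into a list is to have already supplied its ConnectionInfo, so the check happens once at construction instead of at every call site.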

Aside: What if that calls for my constructor having a ton of parameters?

Consider a class whose constructor needs many parameters. This could be for any reason, though the simplest is that the constructor needs to assign values into many fields.

class Operation {
    private User user;
    private Reason reason;
    private Entity origin;
    private Entity destination;
    private string name;
    private string description;
    private bool retryOnFailure;
    private DateTime deadline;
    private Priority priority;
    private Action<Operation, Result> completionCallback;

    public Operation(User user, Reason reason, Entity origin, ...

So far, there’s no problem here. But calling this constructor can sometimes be awkward. Consider a function that computes values in local variables before passing them as arguments to the constructor call. That’s a very reasonable thing to do, but it results in that function’s scope being cluttered with many long-lived local variables. How could we clean that up?

Use the builder pattern

One way is to use the builder pattern. That is, instead of keeping a mess of temporary variables around until you’re finally ready to pass them all to the constructor, delegate the storage of those constructor parameters to an object made for the purpose.

class OperationBuilder {
    // These properties start out as null or as some default value, as applicable
    public User User { get; set; }
    public Reason Reason { get; set; }
    public Entity Origin { get; set; }
    public Entity Destination { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public bool RetryOnFailure { get; set; }
    public DateTime Deadline { get; set; }
    public Priority Priority { get; set; }
    public Action<Operation, Result> CompletionCallback { get; set; }

    public Operation Build() {
        return new Operation(User, Reason, Origin, ...);
    }
}

Builders can look different from this, but what matters here is the concept that the builder holds all the constructor arguments of the not-yet-constructed object. Using a builder simplifies functions that have to compute values to pass to the constructor. Instead of having to keep each argument in its own temporary or pass large expressions directly into the constructor call, we can now just declare a single builder and assign each argument to the appropriate field of the builder. We could even pass around the builder if we wanted, allowing different parts of the program to populate different parts of the builder, or perhaps deferring the actual constructor call until long after the arguments are prepared. Although it’s tedious to write the definition for a builder class, I find that it can pay off even if you only use it a little bit.
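As a rough illustration of what the calling side might look like, here is a sketch using a deliberately trimmed-down Operation with only three of the ten fields. The validation in the constructor, the BuilderDemo class, and the "nightly-backup" values are all made up for the example.

```csharp
using System;

// Trimmed-down, hypothetical Operation: three of the ten fields from above.
class Operation {
    public string Name { get; }
    public bool RetryOnFailure { get; }
    public DateTime Deadline { get; }

    public Operation(string name, bool retryOnFailure, DateTime deadline) {
        // The constructor still enforces validity; the builder does not weaken that.
        if (name == null) {
            throw new ArgumentNullException(nameof(name));
        }
        Name = name;
        RetryOnFailure = retryOnFailure;
        Deadline = deadline;
    }
}

class OperationBuilder {
    public string Name { get; set; }
    public bool RetryOnFailure { get; set; }
    public DateTime Deadline { get; set; }

    public Operation Build() {
        return new Operation(Name, RetryOnFailure, Deadline);
    }
}

static class BuilderDemo {
    // The builder is filled in piece by piece; no Operation exists until Build().
    public static Operation CreateNightlyBackup(DateTime now) {
        var builder = new OperationBuilder();
        builder.Name = "nightly-backup";
        builder.RetryOnFailure = true;
        builder.Deadline = now.AddHours(6);
        return builder.Build();
    }
}
```

Note that a half-filled builder is harmless: calling Build() on it simply fails in the Operation constructor, so an invalid Operation never comes into existence.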

Taking a step back, the builder pattern can be seen as a way to get the convenience of initializing objects piecewise without the bugs that usually come with doing so. This is because the partial initializability is sequestered in OperationBuilder, while Operation itself is always complete and never appears “under construction”…but see the next section!

Before a constructor returns, the object is not initialized

Another consequence of the basic idea that constructors make objects ready for use is that before a constructor returns, code must not assume the object is usable.

class User {
    public string Name { get; }
    public bool IsRestricted { get; }
    public User(Group group, string name, bool isRestricted) {
        group.AddUser(this);
        Name = name;
        IsRestricted = isRestricted;
    }
    // other members ...
}

In the above example, the constructor leaks the this pointer to outside code before the constructor has completed. Code inside or transitively called by Group.AddUser will assume the given User instance is valid and ready for use, and it will crash or do something crazy when this assumption is violated. For example, AddUser could look like this:

void AddUser(User user) {
    if (!users.TryAdd(key: user.Name, value: user))
        throw new ArgumentException("User's name must be unique");
    if (!user.IsRestricted)
        administrators.Add(user);
}

That function is reasonable, but since it is passed an object whose constructor hasn’t returned yet, the opportunity for bugs is endless. In this case, we will get a crash in TryAdd, since user.Name is still null at that point (a Dictionary’s TryAdd throws an ArgumentNullException for a null key).

Furthermore, if not for TryAdd throwing an exception, the next statement would add the user to the administrators list regardless of whether the user is supposed to be restricted.[1]

It may not even be possible to discover the full extent of the bugs that can be caused by using an uninitialized object in a given program. References to such objects may be passed to opaque observers or stored in random places on the heap, to be accessed at indeterminate times afterward and in ways that may or may not trigger erroneous behavior. Merely storing a pointer somewhere is usually fine, but until the constructor has returned you must avoid using the memory at the pointer in any way (reading or writing the object’s fields, calling its nonstatic functions, or passing the pointer to any other code that will do so).
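One possible way out, sketched below, is to keep the constructor limited to initializing the object and to move the registration into a static factory method that runs only after the constructor has returned. The CreateInGroup name, the private constructor, the null check, and the AdministratorCount property are assumptions made for this sketch, and Group is filled in just enough to compile.

```csharp
using System;
using System.Collections.Generic;

class Group {
    private readonly Dictionary<string, User> users = new Dictionary<string, User>();
    private readonly List<User> administrators = new List<User>();

    // Hypothetical helper, exposed only so the sketch has something to observe.
    public int AdministratorCount => administrators.Count;

    public void AddUser(User user) {
        if (!users.TryAdd(key: user.Name, value: user))
            throw new ArgumentException("User's name must be unique");
        if (!user.IsRestricted)
            administrators.Add(user);
    }
}

class User {
    public string Name { get; }
    public bool IsRestricted { get; }

    // Private: the constructor only initializes the object and never hands out `this`.
    private User(string name, bool isRestricted) {
        if (name == null) {
            throw new ArgumentNullException(nameof(name));
        }
        Name = name;
        IsRestricted = isRestricted;
    }

    // Hypothetical factory: AddUser sees the User only after construction is complete.
    public static User CreateInGroup(Group group, string name, bool isRestricted) {
        var user = new User(name, isRestricted);
        group.AddUser(user);
        return user;
    }
}
```

Simply moving the group.AddUser(this) call to the last line of the original constructor would also fix this particular case, but it stops being safe as soon as a derived class adds its own fields, which is the topic of the next section.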

A subtler version of the same problem

Another opportunity to leak the this pointer from a constructor occurs when calling a base class constructor. Consider this example:

class Species {
    protected string speciesName;
    protected Species(string name) {
        speciesName = name;
    }

    public override string ToString() {
        return speciesName;
    }
}

class Cat : Species {
    private string name;
    public Cat(string name) : base("cat") {
        this.name = name;
    }
    
    public override string ToString() {
        return $"a {base.ToString()} named {name}";
    }
}

If we use the above code like this:

var tony = new Cat("Tony");
Console.WriteLine($"Tony is {tony.ToString()}");

We get the expected output:

Tony is a cat named Tony

Now what if we add a little logging in the Species constructor? It doesn’t have to be logging, but it’s easy for this example.

protected Species(string name) {
    speciesName = name;
    Console.WriteLine($"Created a species \"{ToString()}\"");
}

If we run the program now, we get

Created a species "a cat named "
Tony is a cat named Tony

What happened is that the ToString() call in the Species constructor dispatched to the Cat override, which ran before the body of Cat’s own constructor had assigned the name field. This shows that the need to avoid using objects before their constructors are complete extends to base class constructors. That is, base constructors must avoid calling virtual functions on this. (In C++, a virtual call made during base construction resolves to the base class’s version instead, so this exact output would not occur, though the result can be just as surprising.) It is a good rule of thumb that unless you know your class will never be derived from (it is sealed/final), you should avoid calling virtual functions on this inside constructors.
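A minimal sketch of one way to repair the logging example: the base constructor reads the field it has just assigned instead of going through the virtual ToString(). Marking Cat as sealed is an extra assumption here, added to show the other half of the rule of thumb.

```csharp
using System;

class Species {
    protected string speciesName;

    protected Species(string name) {
        speciesName = name;
        // Reads the field directly: no virtual call, so no derived code runs early.
        Console.WriteLine($"Created a species \"{speciesName}\"");
    }

    public override string ToString() {
        return speciesName;
    }
}

// sealed: nothing can derive from Cat, so Cat's own constructor has no
// further-derived overrides to worry about.
sealed class Cat : Species {
    private string name;

    public Cat(string name) : base("cat") {
        this.name = name;
    }

    public override string ToString() {
        return $"a {base.ToString()} named {name}";
    }
}
```

With this version, constructing new Cat("Tony") logs Created a species "cat", and tony.ToString() still returns "a cat named Tony".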

Conclusion

This was a long post with quite a few rabbit holes, but we don’t have to commit every detail to memory. Quite the contrary, all we need to do is remember that constructors make an object valid and ready to use, and everything else in this post follows as a consequence of that guiding axiom.


[1] In C# and Java, it would always add the user, since bool fields are implicitly initialized to false. In C++, whether it would do so is undefined, since it would read uninitialized memory.

