Your data model is not your software architecture

One of the reasons I enjoy working at Audiokinetic is that the code base of typical game middleware has two important requirements:

  • Code structure must allow for long-term continuous maintenance. It’s not a simple question of “just make it good enough for shipping”.
  • There are real-time requirements to perform typical but complex tasks on a wide variety of platforms and CPU architectures. These use cases become more and more complex and demanding as time passes, and so the runtime performance must keep up with ever-increasing requirements.

This is interesting because these two requirements are often considered diametrically opposed. Long-term maintenance generally implies a good software design that focuses on reducing module coupling. In C++, this is generally achieved using the OOP principles of encapsulation and polymorphism. On the other hand, high-performance code is written with the underlying architecture in mind, and shortcuts like using globals to call directly into non-virtual functions of another module are commonplace. So where do you draw the line?

OOP design patterns have a bad rep in the C++ performance-oriented world because the dogmatic pursuit of object-orientation often leads to a software architecture that is difficult to optimize. Mike Acton demonstrates this very well in his seminal CppCon 2014 talk on data-oriented design (if you have not watched it yet, stop reading now and go watch it, it’s quite compelling). In fact, data-oriented design currently seems like the best way to go if you want to write high-performance code for modern platforms.

Is object-oriented design the antithesis of data-oriented design? I posit that the two are orthogonal and can be combined to great effect, but one must first avoid some classic pitfalls of object-orientation. The way OOP is taught in many universities, including the two I personally attended, is terribly misleading and encourages exactly the kind of design that should be avoided. After years of working at Audiokinetic, I still struggle to shake off these bad habits.

The key principle can be summarized as follows: your data model is not your software architecture. Repeat this mantra! Universities teach you to identify your data model, your problem space, and then model classes and objects from your model’s entities and relationships. This is bad! For example, if you are writing a DAW dealing with voices, busses, sources, and effect plug-ins, it does not mean your code should have a Voice, Bus, Source, or Effect class. In fact, it most likely should not!

What should your OOP design be modelled on, if not your data model? It should be modelled on your data transformations. Remember that OOP design has two primary advantages:

  • Encapsulation, which helps reduce source code coupling between modules
  • Polymorphism, which helps reduce runtime coupling between modules

Encapsulation is easy: if you have a data transformation that requires maintaining some sort of state over time, that is a good candidate for a class. In the DAW example, you may have a pipeline that goes like this: Sources generate audio -> Audio is resampled -> Effects are applied -> Sources are mixed. Take one step in your pipeline, say the sample-rate conversion. This could be modelled as a ResamplerStep class. The significant difference here is that the ResamplerStep class is designed to operate on a collection of sources instead of a single source, because sources themselves aren’t necessarily modelled as individual objects. ResamplerStep encapsulates the book-keeping data required to resample multiple sources over time.
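To make this concrete, here is a minimal sketch of what such a step might look like. Everything beyond the ResamplerStep name is an assumption made for illustration (the addSource and process methods, the per-source ratio and read position, the use of linear interpolation); it is not Audiokinetic's actual code. The point is only that the class owns the book-keeping for all sources and processes them in one call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: one pipeline step that resamples ALL sources per call.
// There is no Source class; a source is just an index into parallel arrays.
class ResamplerStep {
public:
    // Registers a source (ratio = input frames consumed per output frame)
    // and returns its index.
    std::size_t addSource(double ratio) {
        assert(ratio > 0.0);
        ratios_.push_back(ratio);
        positions_.push_back(0.0);
        return ratios_.size() - 1;
    }

    // Produces up to framesOut frames for every source using linear
    // interpolation. inputs[s] is the input stream of source s.
    void process(const std::vector<std::vector<float>>& inputs,
                 std::vector<std::vector<float>>& outputs,
                 std::size_t framesOut) {
        assert(inputs.size() == ratios_.size());
        outputs.resize(inputs.size());
        for (std::size_t s = 0; s < inputs.size(); ++s) {
            const std::vector<float>& in = inputs[s];
            std::vector<float>& out = outputs[s];
            out.clear();
            double pos = positions_[s];
            for (std::size_t n = 0; n < framesOut; ++n) {
                const std::size_t i = static_cast<std::size_t>(pos);
                if (i + 1 >= in.size()) break;  // not enough input left
                const float frac = static_cast<float>(pos - static_cast<double>(i));
                out.push_back(in[i] + frac * (in[i + 1] - in[i]));
                pos += ratios_[s];
            }
            positions_[s] = pos;  // state carried over to the next call
        }
    }

private:
    // Book-keeping the step maintains over time, one entry per source.
    std::vector<double> ratios_;
    std::vector<double> positions_;
};
```

Calling process repeatedly continues each source from where the previous call stopped, which is the "state over time" that justifies making this a class in the first place.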

The source data itself will be stored in whatever form allows it to be resampled most efficiently. This may mean storing source data as a structure of arrays rather than an array of structures.
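As a rough illustration of the difference between the two layouts, consider the sketch below. The field names (gain, pitchRatio, looping) and the applyPitchBend pass are hypothetical, chosen only to show the shape of the data; which fields a real engine stores, and how, will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: one object per source, fields interleaved in memory.
// A pass that reads only pitchRatio still drags gain and looping through
// the cache. Shown for contrast only.
struct SourceAoS {
    float gain;
    float pitchRatio;
    bool  looping;
};

// Structure of arrays: one contiguous array per field, indexed by source.
struct SourcesSoA {
    std::vector<float> gain;
    std::vector<float> pitchRatio;
    std::vector<unsigned char> looping;  // avoids the std::vector<bool> specialization

    std::size_t size() const { return gain.size(); }

    void add(float g, float ratio, bool loop) {
        gain.push_back(g);
        pitchRatio.push_back(ratio);
        looping.push_back(loop ? 1 : 0);
    }
};

// A pass that needs only pitch ratios walks a single dense float array,
// which is friendly to both the cache and auto-vectorization.
void applyPitchBend(SourcesSoA& sources, float bend) {
    assert(bend > 0.0f);
    for (float& ratio : sources.pitchRatio) {
        ratio *= bend;
    }
}
```

Whether the structure-of-arrays form actually wins depends on the access pattern of each pass, so it is worth measuring rather than assuming.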

What about polymorphism? Well, if your classes represent data transformations, then it’s entirely natural to have certain transformations represented as generic interfaces, which are then implemented differently for different purposes. Going back to the DAW example, the ResamplerStep class could be an interface, which can then be implemented using a high-quality cubic interpolation in one subclass, and using a simple but fast linear interpolation in another. As described before, both implementations would operate on a collection of sources, allowing for optimizations based on the total input size for each implementation. Also, you’re not paying the cost of a virtual function call for each source, but rather one for each “pass” in your frame.
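Here is one way the interface version might be sketched. The names IResamplerStep, SourceBatch, LinearResamplerStep and CubicResamplerStep are assumptions for illustration, and Catmull-Rom is just one possible choice of cubic interpolation. To keep the focus on dispatch, this version is stateless and always reads from the start of each input buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical batch of sources: parallel arrays, one entry per source.
struct SourceBatch {
    std::vector<std::vector<float>> input;  // input buffer per source
    std::vector<double> ratio;              // input frames per output frame
};

using Buffers = std::vector<std::vector<float>>;

class IResamplerStep {
public:
    virtual ~IResamplerStep() = default;
    // One virtual call per pass, covering every source in the batch.
    virtual void process(const SourceBatch& batch, std::size_t framesOut,
                         Buffers& out) = 0;
};

// Reads in[i] with the index clamped to the valid range.
inline float sampleAt(const std::vector<float>& in, long i) {
    if (in.empty()) return 0.0f;
    const long last = static_cast<long>(in.size()) - 1;
    return in[static_cast<std::size_t>(std::min(std::max(i, 0L), last))];
}

// Shared non-virtual loop; the kernel is inlined per implementation.
template <typename Kernel>
void resampleAll(const SourceBatch& batch, std::size_t framesOut,
                 Buffers& out, Kernel kernel) {
    assert(batch.input.size() == batch.ratio.size());
    out.assign(batch.input.size(), std::vector<float>());
    for (std::size_t s = 0; s < batch.input.size(); ++s) {
        out[s].reserve(framesOut);
        for (std::size_t n = 0; n < framesOut; ++n) {
            const double pos = static_cast<double>(n) * batch.ratio[s];
            const long i = static_cast<long>(std::floor(pos));
            const float t = static_cast<float>(pos - static_cast<double>(i));
            out[s].push_back(kernel(batch.input[s], i, t));
        }
    }
}

class LinearResamplerStep final : public IResamplerStep {
public:
    void process(const SourceBatch& batch, std::size_t framesOut,
                 Buffers& out) override {
        resampleAll(batch, framesOut, out,
            [](const std::vector<float>& in, long i, float t) -> float {
                const float a = sampleAt(in, i);
                const float b = sampleAt(in, i + 1);
                return a + t * (b - a);
            });
    }
};

class CubicResamplerStep final : public IResamplerStep {
public:
    void process(const SourceBatch& batch, std::size_t framesOut,
                 Buffers& out) override {
        resampleAll(batch, framesOut, out,
            [](const std::vector<float>& in, long i, float t) -> float {
                // Catmull-Rom spline through four neighbouring samples.
                const float p0 = sampleAt(in, i - 1), p1 = sampleAt(in, i);
                const float p2 = sampleAt(in, i + 1), p3 = sampleAt(in, i + 2);
                return 0.5f * (2.0f * p1 + (p2 - p0) * t
                    + (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3) * t * t
                    + (3.0f * p1 - p0 - 3.0f * p2 + p3) * t * t * t);
            });
    }
};
```

The caller holds an IResamplerStep reference or pointer and calls process once per pass; the inner per-sample loops contain no virtual calls, so each implementation stays free to specialize or vectorize them.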

So, the next time you write a class that represents a single entity in your data model, stop and ask yourself: how many instances of this class am I going to allocate at any one time during execution of the program? If the answer is “more than a dozen”, maybe it’s time to follow the mantra: your data model is not your software architecture!