Self-Registration Factory Pattern in C++
Usage Scenarios
Implementing a typical factory pattern in C++ is not complicated. Take the common Shape class as an example:
ShapeFactory.cpp
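For reference, a typical centralized factory might look like the sketch below. The classes Circle and Square, the createShape function, and the feature macro SHAPE_ENABLE_SQUARE are illustrative assumptions, not the original ShapeFactory.cpp. The macro is there only to show where option gating ends up in this design.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Common interface for every shape.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string name() const override { return "square"; }
};

// Centralized factory: every new shape adds another branch here,
// and every feature option adds preprocessor gating around its branch.
std::unique_ptr<Shape> createShape(const std::string& type) {
    if (type == "circle") {
        return std::make_unique<Circle>();
    }
#if defined(SHAPE_ENABLE_SQUARE)  // hypothetical feature macro
    if (type == "square") {
        return std::make_unique<Square>();
    }
#endif
    return nullptr;  // unknown or disabled shape
}
```

Built without SHAPE_ENABLE_SQUARE, createShape("square") returns nullptr even though the Square class still exists.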
This approach is straightforward and widely used, but it has two drawbacks:
- Every concrete class must be registered by hand in ShapeFactory.cpp. Over time this file grows longer and accumulates more and more if-else branches.
- Feature macros are hard to isolate: once platform macros are added, the file fills up with nested preprocessor directives and becomes hard to read.
The second problem can be solved by generating list files dynamically, as FFmpeg does with codec_list.c. As a cross-platform library, FFmpeg faces exactly this problem with its many feature options. Its solution is to generate such list files at configure time wherever possible: only the enabled codecs appear in the list, which keeps it relatively clean.
This article introduces another way to solve both problems: the self-registration factory pattern.
Implementation and Principles
Self-registration relies on the automatic initialization of global static variables and static class members: when such a variable is initialized (for example, in its constructor), it implicitly registers its class with the factory. In C, a similar effect can be achieved with __attribute__((constructor)), a GCC/Clang extension.
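The mechanism can be seen in isolation in the sketch below: a namespace-scope object whose constructor appends to a log, and a function marked with the GCC/Clang __attribute__((constructor)) extension that does the same. Both run before main on mainstream toolchains, but the relative order of the two should not be relied on. The names initLog, Registrar, and onLoad are illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function-local static, so the log exists no matter which initializer runs first.
std::vector<std::string>& initLog() {
    static std::vector<std::string> log;
    return log;
}

// C++ route: the constructor of a namespace-scope object runs during
// static initialization, i.e. before main() in practice.
struct Registrar {
    explicit Registrar(const char* what) { initLog().push_back(what); }
};
static Registrar g_registrar("static object");

// C/C++ route (GCC/Clang extension): this function is called automatically
// before main().
__attribute__((constructor)) static void onLoad() {
    initLog().push_back("constructor attribute");
}
```

By the time main starts, initLog() already holds both entries, without main having called anything.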
The Shape example above, for instance, can be rewritten as:
registry.h
template <typename ShapeType>
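A minimal sketch of a registry header in this spirit is shown below. It assumes a factory keyed by name and a registerShape&lt;ShapeType&gt;() function template whose bool result a class can store. ShapeFactory, creators, create, and registerShape are hypothetical names, not necessarily the author's.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

class ShapeFactory {
public:
    using Creator = std::function<std::unique_ptr<Shape>()>;

    // Creator table as a function-local static: it is constructed on first use,
    // so registering from other static initializers is safe.
    static std::map<std::string, Creator>& creators() {
        static std::map<std::string, Creator> table;
        return table;
    }

    static std::unique_ptr<Shape> create(const std::string& name) {
        auto it = creators().find(name);
        if (it == creators().end()) {
            return nullptr;
        }
        return it->second();
    }
};

// Registers ShapeType under `name`; returns true if the name was not taken yet.
template <typename ShapeType>
bool registerShape(const std::string& name) {
    return ShapeFactory::creators()
        .emplace(name, []() -> std::unique_ptr<Shape> { return std::make_unique<ShapeType>(); })
        .second;
}
```

Returning bool is what lets a class capture the call in a static data member, which is the role Circle::m_registered plays.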
circle.cpp
class Circle: public Shape {
Circle::m_registered only stores the return value of the registration function, which guarantees that the function is called. The call happens before main is entered (the standard technically allows this initialization to be deferred, but mainstream toolchains perform it before main), so inside main the factory class can be used directly to create instances.
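Putting the pieces together, the sketch below places a stand-in for registry.h and the contents of circle.cpp in one file so it can be compiled directly. The helpers shapeCreators, registerShape, and createShape are hypothetical; in the real layout the two parts live in separate files, and the lookup by name is what main.cpp would do.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// ---- stand-in for registry.h (hypothetical names) ----
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

using Creator = std::function<std::unique_ptr<Shape>()>;

std::map<std::string, Creator>& shapeCreators() {
    static std::map<std::string, Creator> table;  // built on first use
    return table;
}

template <typename ShapeType>
bool registerShape(const std::string& name) {
    return shapeCreators()
        .emplace(name, []() -> std::unique_ptr<Shape> { return std::make_unique<ShapeType>(); })
        .second;
}

std::unique_ptr<Shape> createShape(const std::string& name) {
    auto it = shapeCreators().find(name);
    if (it == shapeCreators().end()) {
        return nullptr;
    }
    return it->second();
}

// ---- circle.cpp ----
class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }

private:
    // Only holds the result of registerShape(); initializing it is what
    // runs the registration during static initialization.
    static bool m_registered;
};

bool Circle::m_registered = registerShape<Circle>("circle");
```

Nothing outside the circle.cpp part mentions Circle, yet createShape("circle") succeeds once main is running.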
main.cpp
int main() {
The advantage is obvious: the creator is registered in Circle's own source file. If we decide to disable the Circle class through a feature option, we simply exclude this file from the build; no code elsewhere needs to change, and no feature macros are needed to isolate code. We don't even need to declare Circle in a header file, since nothing else references the class directly!
This approach also suits plug-ins loaded dynamically at runtime. If the Circle class is packaged as a plug-in, the host program only needs to call dlopen to load Circle.so, and the class registers itself into the list without any additional mechanism for reading plug-in information.
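On the host side, loading such a plug-in can be as small as the following sketch, using POSIX dlopen (on glibc older than 2.34, link with -ldl). The loadPlugin name and the library path are hypothetical. The sketch assumes the plug-in can resolve the registration function, for example because the registry lives in a shared library that both sides link against, or because the host is linked with -rdynamic.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Loads a self-registering plug-in. Running the library's static initializers
// is all that is needed: no dlsym lookup and no "describe yourself" entry point.
// The handle is intentionally never closed, so registered creators stay valid.
bool loadPlugin(const char* path) {
    void* handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) {
        std::fprintf(stderr, "failed to load %s: %s\n", path, dlerror());
        return false;
    }
    return true;
}
```

After a successful loadPlugin("./Circle.so"), the factory can create "circle" just as if the class had been linked in statically.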
The sample code above can be found in different branches here:
However, this approach has some inherent problems that are hard to ignore. The following sections describe these shortcomings and some workarounds.
Problems in Practice
Symbol Stripping in Static Linking
This is the most direct problem with self-registration, and the price paid for making each implementation self-contained as above. When linking against a static library, the linker only pulls in the object files that the program actually references. The Circle.o object file above is therefore dropped silently, because nothing outside Circle itself references the Circle class!
When static linking is required and we do not want to give up self-registration, there are two options:
- Use a linker option to force the link target to depend on the self-registered symbol. In Clang and GCC this option is -u (see Clang command line argument reference - Clang 17.0.0git documentation). In the example above, add an INTERFACE link option to the shape target in CMake:
target_link_options(shape INTERFACE -u__ZN6Circle12m_registeredE)
After that, every target that links the libshape.a library depends on Circle::m_registered by default, which forces the linker to pull in the Circle.o object file. (The double leading underscore in the option above is the macOS spelling of the mangled name _ZN6Circle12m_registeredE; on Linux the option would be -u_ZN6Circle12m_registeredE.)
- Compile the self-registering source file directly into the upper-level target. Instead of archiving Circle.o into libshape.a, the object file is linked into the upper-level target as an extra companion of the library. In CMake this is:
target_sources(shape INTERFACE ${CMAKE_CURRENT_LIST_DIR}/shape/impl/Circle.cpp)
In this way, every target that links the shape library compiles Circle.cpp as one of its own source files, so the linker never drops an object file such as Circle.o.
Both methods depend heavily on the toolchain and on CMake. Portability across platforms calls for a more general method, which I have not found yet.
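For the -u method, the symbol has to be written in its mangled form. One way to double-check a mangled name is to feed it back through the Itanium C++ ABI demangler that ships with GCC and Clang (abi::__cxa_demangle in &lt;cxxabi.h&gt;), as in this sketch; the demangle helper is illustrative. On macOS, strip the extra leading underscore of the linker-level name first.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium-ABI symbol name; returns the input unchanged
// if it is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    char* buf = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string out = (status == 0 && buf != nullptr) ? buf : mangled;
    std::free(buf);  // __cxa_demangle allocates with malloc; free(nullptr) is a no-op
    return out;
}
```

demangle("_ZN6Circle12m_registeredE") yields "Circle::m_registered", confirming the symbol passed to -u is the static member that performs the registration.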
The Construction Order of Static Variables
Because the factory relies on global variables for registration, you must not depend on their initialization order, and the related operations should be performed after main has been entered wherever possible. For example:
- The map holding the creators in the factory must be a static variable at function scope rather than at global (namespace) scope;
- Do not create or use the factory while constructing other static variables.
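The first rule is the Construct On First Use idiom. The sketch below uses a hypothetical name-to-int registry to show why it matters: the registering initializer appears before the registry's definition in the same file, yet it works because the map is created inside registry() on the first call. Had the map been a namespace-scope global defined after g_circleRegistered, that initializer would insert into it before the map's constructor had run.

```cpp
#include <cassert>
#include <map>
#include <string>

// Declared up front so a static initializer earlier in the file can call it.
std::map<std::string, int>& registry();

// A registration that runs during static initialization and appears
// *before* the registry's definition in this translation unit.
[[maybe_unused]] static const bool g_circleRegistered =
    registry().emplace("circle", 1).second;

// Construct On First Use: the map is created the first time registry()
// is called, whichever static initializer gets there first.
std::map<std::string, int>& registry() {
    static std::map<std::string, int> table;
    return table;
}
```

The same reasoning covers registrations spread over several source files, where the initialization order between files is unspecified.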
Who Is Using This
In the LLVM project, Clang plugins and the various clang-tidy modules use this technique:
- Clang plugins (see Clang Plugins - Clang 17.0.0git documentation) are loaded dynamically with dlopen at runtime. Once a plugin is loaded, its actions are registered into the list automatically, and symbol stripping is not an issue.
- Clang-tidy module registration: a new module is registered by initializing a static variable. To avoid the stripping problem described above, clang-tidy declares an extern volatile int variable for each module in a public header file, while the variable's definition lives in the module's source file. Because the header references this int variable, the linker does not drop the corresponding module.
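The anchor trick can be sketched in a single file as below. In clang-tidy the declarations live in ClangTidyForceLinker.h and follow the pattern extern volatile int XModuleAnchorSource; static int XModuleAnchorDestination = XModuleAnchorSource;, with volatile int XModuleAnchorSource = 0; defined in the module's source file. Here the module, the registry, and all names are hypothetical, and file boundaries are only marked with comments, so the sketch shows the shape of the code rather than the linker behavior itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal module registry (clang-tidy uses its own registry type).
std::vector<std::string>& moduleRegistry() {
    static std::vector<std::string> modules;
    return modules;
}

// ---- CircleModule.cpp (module source file) ----
// Registration through a static initializer, as before.
[[maybe_unused]] static const bool g_circleModuleAdded =
    (moduleRegistry().push_back("circle-module"), true);
// The anchor's definition lives next to the registration, in the same object file.
volatile int CircleModuleAnchorSource = 0;

// ---- ForceLinker.h (public header included by the tool) ----
// Reading the anchor creates a reference to the module's object file, so a
// static-library link keeps it; volatile stops the read from being optimized away.
extern volatile int CircleModuleAnchorSource;
[[maybe_unused]] static int CircleModuleAnchorDestination = CircleModuleAnchorSource;
```

Because the destination variable is defined in a header, every translation unit that includes it pulls in the module, which is why a single shared header is enough to keep all modules linked.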