C++ Compilation Model and Linkage
- Description: A note on how C++ programs are built — the compilation pipeline, translation units, the One Definition Rule, linkage, headers, the preprocessor, and static vs dynamic linking
- My Notion Note ID: K2A-B1-2
- Created: 2018-09-15
- Updated: 2026-02-28
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. The Compilation Pipeline
- 2. Translation Units
- 3. The One Definition Rule (ODR)
- 4. Linkage: Internal vs External
- 5. Header Files and Header Guards
- 6. The Preprocessor
- 7. Static vs Dynamic Linking
- 8. Name Mangling
1. The Compilation Pipeline
A C++ source file goes through four phases on its way to an executable:
- Preprocessing: #include, #define, conditional compilation. Output: a single text "translation unit" with macros expanded and headers inlined.
- Compilation: translation unit → object file (.o/.obj). Each TU is compiled independently; the compiler does not see other TUs.
- Assembly: assembly text → machine code (usually integrated with the compilation step).
- Linking: multiple object files + libraries → a single executable or shared library. Resolves cross-TU symbol references.
foo.cpp ──[preprocess]──► foo.i ──[compile]──► foo.o ──┐
bar.cpp ──[preprocess]──► bar.i ──[compile]──► bar.o ──┼─[link]─► program
                                            stdlib.a ──┘
Each step in the pipeline matters: macro hygiene affects the preprocessor; inline and templates affect compilation; static/extern and library order affect linking.
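As a rough illustration of which phase touches which line, the sketch below annotates a tiny single-file program. The names LIMIT, sum_up_to, and pipeline_demo are made up for this note, and sum_up_to is defined in the same file only to keep the example self-contained; in a real project it would typically live in another .cpp file.

```cpp
#include <cassert>  // Preprocessing: the header's text is pasted in at this point.

#define LIMIT 3     // Preprocessing: every later LIMIT token is replaced by 3.

// Compilation: a declaration is all the compiler needs to type-check a call.
int sum_up_to(int n);

// Compilation + assembly: this body becomes machine code in the object file.
// If sum_up_to lived in another TU, the call would stay an unresolved symbol
// reference until the link step.
int pipeline_demo() {
    return sum_up_to(LIMIT);
}

// Linking: the linker needs exactly one definition of sum_up_to somewhere in
// the program. It is defined here only to keep the sketch in one file.
int sum_up_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}

// Usage check: 1 + 2 + 3 == 6.
void check_pipeline_demo() {
    assert(pipeline_demo() == 6);
}
```

With GCC or Clang, the intermediate results can usually be inspected by stopping after a phase (for example, preprocess-only or compile-only modes), which is a handy way to see what the compiler proper actually receives.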
2. Translation Units
A translation unit (TU) is the output of preprocessing one source file — the source plus all its #included headers, with macros expanded. It's the unit the compiler sees.
Key consequences:
- Each TU is compiled independently. The compiler doesn't know what's in other TUs.
- Headers included in many TUs are reparsed in each one. This is a major reason large C++ builds are slow (and a cost that C++20 modules are designed to reduce).
- static at namespace scope means "internal to this TU": the name is invisible to other TUs.
- Templates instantiated in a TU live in that TU. That's why template definitions usually go in headers, so every TU that uses the template can instantiate it.
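A minimal sketch of the template point, assuming a hypothetical header twice.h whose contents are shown inline here. The function names twice, twice_int, and twice_text are illustrative only.

```cpp
#include <cassert>
#include <string>

// Would normally sit in a header (say, a hypothetical twice.h) so that every
// TU including it sees the full definition, not just a declaration.
template <typename T>
T twice(const T& value) {
    return value + value;
}

// Using the template implicitly instantiates twice<int> in *this* TU.
int twice_int(int v) { return twice(v); }

// Same template, different argument type: a separate instantiation.
std::string twice_text(const std::string& s) { return twice(s); }

// Explicit instantiation definition: forces twice<double> to be emitted here,
// which is one way to keep a template's definition out of a header.
template double twice<double>(const double&);

void check_twice() {
    assert(twice_int(21) == 42);
    assert(twice_text("ab") == "abab");
}
```

If only a declaration of twice were visible, the compiler could type-check the calls but would have nothing to instantiate, and the link step would likely fail with an undefined symbol unless some other TU provided an explicit instantiation.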
3. The One Definition Rule (ODR)
The ODR is the most important rule in C++ linking:
Every variable, function, class type, enumeration, and template must have exactly one definition across the entire program.
There are nuances:
- Declarations are unrestricted. You can declare a function in 100 headers; you can only define it once.
- inline functions, inline variables (C++17), class definitions, and templates may be defined in multiple TUs, but every definition must be token-for-token identical. The linker picks one.
- Anonymous namespaces and static definitions are TU-local, so they are not subject to the cross-TU "exactly one" rule.
Common ODR violations:
// foo.h
int x = 42; // BUG: definition in header
inline int safe_x = 42; // OK: 'inline' allows multi-TU definition
void f() { /* ... */ } // BUG: non-inline definition in header
inline void g() { /* ... */ } // OK
When the linker sees two non-inline definitions of f(), you get a "multiple definition" error. When it sees two different definitions of an inline function (e.g., one TU compiled with a different macro), the program has undefined behavior — the linker doesn't have to detect the divergence.
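One common way to avoid both bugs is to declare in the header and define in exactly one .cpp file, or to mark the header definition inline. The sketch below simulates that split in a single file; settings.h, settings.cpp, and all the g_ names are hypothetical, and the inline variable assumes a C++17 compiler.

```cpp
#include <cassert>

// ---- settings.h (hypothetical header, included by many TUs) ----
extern int g_request_count;          // declaration only: safe to repeat
inline int g_max_requests = 42;      // C++17 inline variable: one shared object
inline int next_request_id() {       // inline function: definition may repeat
    static int id = 0;               // a single 'id' shared by all TUs
    return ++id;
}

// ---- settings.cpp (exactly one TU provides the definition) ----
int g_request_count = 0;

// ---- any other TU ----
void record_request() {
    ++g_request_count;
}

void check_settings() {
    record_request();
    assert(g_request_count == 1);
    assert(g_max_requests == 42);
    assert(next_request_id() == 1);
    assert(next_request_id() == 2);
}
```

The extern form keeps the definition in one place, which suits mutable state; the inline form keeps everything in the header, which suits constants and small helpers.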
4. Linkage: Internal vs External
A name has linkage if it can refer to the same entity from a different scope. Three kinds:
| Linkage | Visibility | How to declare |
|---|---|---|
| No linkage | Block scope only (locals) | Default for local variables |
| Internal | One TU only | static at namespace scope, anonymous namespace, const namespace-scope variables (without extern) |
| External | All TUs in the program | Default for non-static namespace-scope names; extern is implicit; functions are external by default |
// translation unit 1
static int counter = 0; // internal — invisible to TU 2
namespace { void helper(); } // also internal (preferred modern style)
int g_count = 0; // external — TU 2 can extern-declare it
void g_init(); // external: functions have external linkage by default
The modern idiomatic way to give a name internal linkage is the unnamed namespace, not file-scope static:
namespace {
int helper_count = 0;
void helper() { /* ... */ }
}
Unnamed namespaces also work for types — static is a storage-class specifier that only applies to objects and functions, never to types.
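A small sketch of a TU-local type, assuming made-up names (Tally, g_tally, kStep, record_and_report). Only record_and_report has external linkage; everything inside the unnamed namespace is private to this file.

```cpp
#include <cassert>

namespace {

// A type with internal linkage: another TU can define its own, different
// 'Tally' without violating the ODR.
struct Tally {
    int hits = 0;
    void record() { ++hits; }
};

Tally g_tally;           // internal-linkage object
const int kStep = 2;     // const at namespace scope would be internal anyway

int scaled_hits() { return g_tally.hits * kStep; }

}  // namespace

// External linkage: the one entry point other TUs could declare and call.
int record_and_report() {
    g_tally.record();
    return scaled_hits();
}

void check_tally() {
    assert(record_and_report() == 2);
    assert(record_and_report() == 4);
}
```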
5. Header Files and Header Guards
Headers contain declarations (and inline / template / class definitions) shared across TUs. Without protection, including the same header twice in one TU (often indirectly, through other headers) causes redefinition errors.
Two equivalent ways to guard a header:
// header.h — using #pragma once (non-standard but widely supported)
#pragma once
void foo();
// header.h — using include guards (portable, standard)
#ifndef MYPROJECT_HEADER_H
#define MYPROJECT_HEADER_H
void foo();
#endif // MYPROJECT_HEADER_H
#pragma once is shorter and less error-prone (no risk of macro name collision). All major compilers support it. Use it unless you're targeting a really exotic toolchain.
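To see what a guard actually does, the sketch below pastes the contents of a hypothetical point.h twice into one file, the way two overlapping #include chains would. The names Point, MYPROJECT_POINT_H, and manhattan_length are illustrative only.

```cpp
#include <cassert>
#include <cstdlib>

// ---- first inclusion of a hypothetical point.h ----
#ifndef MYPROJECT_POINT_H
#define MYPROJECT_POINT_H
struct Point { int x; int y; };
#endif  // MYPROJECT_POINT_H

// ---- second inclusion (e.g. pulled in indirectly by another header) ----
#ifndef MYPROJECT_POINT_H   // already defined, so everything up to #endif is skipped
#define MYPROJECT_POINT_H
struct Point { int x; int y; };   // without the guard: a redefinition error
#endif  // MYPROJECT_POINT_H

int manhattan_length(Point p) {
    return std::abs(p.x) + std::abs(p.y);
}

void check_point() {
    assert(manhattan_length(Point{3, -4}) == 7);
}
```

Note that guards only protect against repeated inclusion within one TU. They do nothing about the same non-inline definition appearing in two different TUs, which is an ODR problem.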
What goes in headers
| Goes in headers | Stays in .cpp |
|---|---|
| Function declarations | Function definitions (non-inline) |
| Class definitions | Implementation details |
| inline functions | Static globals (file-scope state) |
| Templates | Anonymous namespace contents |
| inline variables (C++17) | Mutable globals |
| constexpr functions and variables | |
| Type aliases (using, typedef) | |
6. The Preprocessor
The preprocessor runs before the compiler proper and performs token-level text substitution. It has no concept of types or scope.
#include <header> // include another file
#include "local.h"
#define MAX 100 // macro: text substitution
#define SQUARE(x) ((x) * (x)) // function-like macro (note the parens!)
#ifdef DEBUG // conditional compilation
log("debug");
#elif defined(RELEASE)
log("release");
#else
log("unknown");
#endif
#if __cplusplus >= 202002L
// C++20 and later
#endif
#error "unsupported config" // compile-time error
#warning "deprecated" // compile-time warning (standardized in C++23; widely supported as an extension before)
#pragma once
Predefined macros
| Macro | Meaning |
|---|---|
| __cplusplus | C++ standard version (e.g. 202002L for C++20) |
| __FILE__, __LINE__ | Current file path and line |
| __func__ | Current function name (C99/C++11); strictly a predefined identifier, not a macro |
| __DATE__, __TIME__ | Build timestamp |
| _WIN32, __linux__, __APPLE__ | Platform |
| __GNUC__, _MSC_VER, __clang__ | Compiler |
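A short sketch of how these are typically used, with hypothetical helper names (platform_name, here_tag, language_version). The exact text of __func__ is implementation-defined, though mainstream compilers produce the plain function name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: names the platform using the macros from the table.
const char* platform_name() {
#if defined(_WIN32)
    return "windows";
#elif defined(__APPLE__)
    return "apple";
#elif defined(__linux__)
    return "linux";
#else
    return "other";
#endif
}

// Builds a "file:line (function)" tag. Note that it reports the location of
// this helper itself, not of its caller, which is one reason logging
// facilities are often written as macros.
std::string here_tag() {
    return std::string(__FILE__) + ":" + std::to_string(__LINE__) +
           " (" + __func__ + ")";
}

// __cplusplus expands to a long literal such as 201703L.
long language_version() { return __cplusplus; }

void check_predefined() {
    assert(language_version() >= 199711L);
    assert(here_tag().find(':') != std::string::npos);
    assert(!std::string(platform_name()).empty());
}
```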
Macros are dangerous
- No type safety. SQUARE(x++) evaluates x++ twice.
- No scoping. A #define MIN in a header pollutes every TU that includes it.
- Hard to debug. The debugger sees post-expansion text, not the macro name.
Modern C++ avoids macros for constants (constexpr), inline functions (inline), and generic functions (templates). Reserve macros for include guards, conditional compilation, and platform abstraction.
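The double-evaluation problem can be made visible by counting how often the argument runs. The sketch below reuses the SQUARE macro from the listing above and compares it with a constexpr function; read_sensor and g_reads are made up for the demonstration. It deliberately passes a function call rather than x++, because modifying x twice in one expression without sequencing is undefined behavior and would make the result meaningless.

```cpp
#include <cassert>

#define SQUARE(x) ((x) * (x))                   // macro from the listing above
constexpr int square(int x) { return x * x; }   // typed, scoped replacement

int g_reads = 0;                  // counts how often the argument is evaluated
int read_sensor() {               // hypothetical function with a side effect
    ++g_reads;
    return 3;
}

int reads_with_macro() {
    g_reads = 0;
    int result = SQUARE(read_sensor());   // expands to two read_sensor() calls
    assert(result == 9);
    return g_reads;
}

int reads_with_function() {
    g_reads = 0;
    int result = square(read_sensor());   // argument evaluated exactly once
    assert(result == 9);
    return g_reads;
}

static_assert(square(4) == 16, "constexpr functions also work at compile time");
```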
7. Static vs Dynamic Linking
When the linker bundles object files into an executable, libraries can be resolved two ways:
| Aspect | Static linking | Dynamic linking |
|---|---|---|
| File extension | .a (Unix), .lib (Windows) | .so (Linux), .dylib (macOS), .dll (Windows) |
| What gets into your binary | Library code is copied in | Just a reference; OS loads the shared library at runtime |
| Binary size | Larger | Smaller |
| Startup time | Faster (no DSO load) | Slower (linker resolves at startup) |
| Updates | Need to relink to upgrade | Drop-in replacement of .so |
| Symbol conflicts | Can hide internal symbols | Whole-library symbol table exposed |
| Distribution | Self-contained | Need to ship/install the .so |
Static linking is often preferred for command-line tools and small, self-contained binaries; dynamic linking suits OS-shipped libraries (libc, OpenSSL, etc.) and plugin systems.
8. Name Mangling
C++ allows function overloading and namespaces, but the linker identifies symbols only by name. To make symbols unique, the compiler mangles them: it encodes the enclosing scope and the parameter types into the symbol name.
namespace ns {
int add(int, int);
double add(double, double);
}
// gcc/clang mangled names:
// _ZN2ns3addEii ns::add(int, int)
// _ZN2ns3addEdd ns::add(double, double)
Tools to demangle:
echo "_ZN2ns3addEii" | c++filt # → ns::add(int, int)
nm --demangle obj.o # list demangled symbols
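Demangling can also be done from inside a program. The sketch below assumes a GCC or Clang toolchain, whose C++ runtime provides abi::__cxa_demangle in <cxxabi.h>; it is not available with MSVC, which uses a different mangling scheme. The wrapper name demangle is made up for this note.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <cxxabi.h>   // GCC/Clang-specific; not available with MSVC

// Returns the demangled form of an Itanium-ABI symbol, or the input unchanged
// if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With null buffer arguments, __cxa_demangle allocates the result with malloc.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);   // the caller owns the malloc'd buffer
    return result;
}

void check_demangle() {
    assert(demangle("_ZN2ns3addEii") == "ns::add(int, int)");
    assert(demangle("_ZN2ns3addEdd") == "ns::add(double, double)");
}
```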
Name mangling is why C++ libraries are not directly callable from C. To expose a function with a stable, un-mangled name, use extern "C":
extern "C" {
void my_api(int); // mangled as: my_api (unchanged)
}
The cost: no overloading, and namespaces no longer disambiguate the symbol, so each extern "C" name must be unique across the whole program. extern "C" is the standard way to provide a C-callable interface to a C++ library.
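A common pattern, sketched here under assumed names (my_api.h, my_api_add, detail::add), is a header that both C and C++ compilers can read, backed by an implementation that uses ordinary C++ internally.

```cpp
#include <cassert>

// ---- my_api.h (hypothetical header, usable from both C and C++) ----
#ifdef __cplusplus
extern "C" {        // only C++ compilers see the linkage specification
#endif

int my_api_add(int a, int b);   // exported under the plain symbol name my_api_add

#ifdef __cplusplus
}
#endif

// ---- my_api.cpp (the implementation is free to use any C++ feature) ----
namespace detail {
template <typename T>
T add(T a, T b) { return a + b; }
}  // namespace detail

extern "C" int my_api_add(int a, int b) {
    return detail::add(a, b);
}

void check_my_api() {
    assert(my_api_add(2, 3) == 5);
}
```

The prefix in my_api_add stands in for the namespace the symbol no longer has, which is the usual convention for keeping C-visible names unique.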