C++ Compilation Model and Linkage


  • Description: A note on how C++ programs are built — the compilation pipeline, translation units, the One Definition Rule, linkage, headers, the preprocessor, and static vs dynamic linking
  • My Notion Note ID: K2A-B1-2
  • Created: 2018-09-15
  • Updated: 2026-02-28
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io


1. The Compilation Pipeline

A C++ source file goes through four phases on its way to an executable:

  1. Preprocessing: handles #include, #define, and conditional compilation. Output: a single text "translation unit" with macros expanded and headers inlined.
  2. Compilation: translation unit → assembly text (.s). Each TU is compiled independently; the compiler does not see other TUs.
  3. Assembly: assembly text → object file (.o / .obj) containing machine code. Most compilers run this as part of the compilation step.
  4. Linking: multiple object files + libraries → a single executable or shared library. Resolves cross-TU symbol references.
foo.cpp ──[preprocess]──► foo.i ──[compile]──► foo.o ──┐
bar.cpp ──[preprocess]──► bar.i ──[compile]──► bar.o ──┼─[link]─► program
                                            stdlib.a ──┘

Each step in the pipeline matters: macro hygiene affects the preprocessor; inline and templates affect compilation; static/extern and library order affect linking.
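
As a rough illustration of the four phases, the sketch below annotates a tiny source file with the GCC/Clang driver flags that stop after each step. The file name foo.cpp matches the diagram above; ANSWER, answer(), check_answer(), and the assumption that bar.cpp defines main() are hypothetical choices made for the example.

```cpp
// foo.cpp: a hypothetical file used to watch each phase in isolation.
//
//   g++ -E foo.cpp -o foo.i     // 1. preprocess only: headers pasted in, ANSWER replaced by 42
//   g++ -S foo.i   -o foo.s     // 2. compile the translation unit to assembly text
//   g++ -c foo.s   -o foo.o     // 3. assemble into an object file
//   g++ foo.o bar.o -o program  // 4. link with the TU that defines main() (assumed: bar.cpp)

#include <cassert>  // step 1 pastes this header's text into the TU

#define ANSWER 42   // gone after step 1; the compiler proper never sees the name

// After step 3 this function is an external symbol in foo.o.
int answer() {
    return ANSWER;
}

// Small self-check that another TU's main() could call.
void check_answer() {
    assert(answer() == 42);
}
```

Opening foo.i after step 1 is a quick way to see how much text a single #include adds to the translation unit.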


2. Translation Units

A translation unit (TU) is the output of preprocessing one source file — the source plus all its #included headers, with macros expanded. It's the unit the compiler sees.

Key consequences:

  1. Each TU is compiled independently. The compiler doesn't know what's in other TUs.
  2. Headers included in many TUs are reparsed in each one. This is what makes large C++ builds slow (and what modules fix).
  3. static at namespace scope means "internal to this TU" — invisible to other TUs.
  4. Templates instantiated in a TU live in that TU. That's why template definitions usually go in headers — so every TU that uses the template can instantiate it.
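
A minimal sketch of consequences 3 and 4, written as a single file for brevity. In a real project the template would sit in a header; twice, call_count, and the other function names are hypothetical.

```cpp
#include <cassert>

// Would normally live in a header: the full definition must be visible
// in every TU that instantiates it.
template <typename T>
T twice(T value) {
    return value + value;
}

// Internal linkage: each TU that contained this line would get its own copy.
static int call_count = 0;

int twice_int(int value) {
    ++call_count;
    return twice(value);   // implicitly instantiates twice<int> in this TU
}

double twice_double(double value) {
    ++call_count;
    return twice(value);   // implicitly instantiates twice<double> in this TU
}

int calls_so_far() { return call_count; }

void demo() {
    int before = calls_so_far();
    assert(twice_int(21) == 42);
    assert(twice_double(1.5) == 3.0);
    assert(calls_so_far() == before + 2);
}
```

If the body of twice were moved into a separate .cpp file, this TU would see only a declaration and could not instantiate twice<int> itself, which typically surfaces as an unresolved symbol at link time.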

3. The One Definition Rule (ODR)

The ODR is the most important rule in C++ linking:

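Every variable, function, class type, enumeration, and template may be defined at most once per translation unit, and every non-inline function or variable that is actually used must have exactly one definition across the entire program.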

There are nuances:

  1. Declarations are unrestricted. You can declare a function in 100 headers; you can only define it once.
  2. inline functions, inline variables (C++17), class definitions, and templates may be defined in multiple TUs (at most once in each), but every definition must be token-for-token identical and mean the same thing. The linker keeps one copy.
  3. Anonymous namespaces and static definitions are TU-local, so they are not subject to the cross-TU "exactly one" rule.

Common ODR violations:

// foo.h
int x = 42;                            // BUG: definition in header
inline int safe_x = 42;                // OK: 'inline' allows multi-TU definition

void f() { /* ... */ }                 // BUG: non-inline definition in header
inline void g() { /* ... */ }          // OK

When the linker sees two non-inline definitions of f(), you get a "multiple definition" error. When it sees two different definitions of an inline function (e.g., one TU compiled with a different macro), the program has undefined behavior — the linker doesn't have to detect the divergence.
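
One way to picture the header-safe forms is the sketch below. It assumes C++17 and imagines the first half living in a shared header; the names ids.h, id_counter, and next_id are hypothetical. A single file cannot show the multi-TU merge itself, so the comments describe what the linker would do.

```cpp
#include <cassert>

// --- imagine this part in ids.h, included by several .cpp files ---

// C++17 inline variable: every TU may define it, the program has one object.
inline int id_counter = 0;

// inline function: identical definitions in many TUs are merged by the linker.
inline int next_id() {
    return ++id_counter;
}

// --- imagine this part in one .cpp file ---

void demo() {
    int first = next_id();
    int second = next_id();
    assert(second == first + 1);   // both calls share the single id_counter
}
```

Dropping the inline keyword from either definition and including the header from two .cpp files would be expected to produce the "multiple definition" link error described above.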


4. Linkage: Internal vs External

A name has linkage if it can refer to the same entity from a different scope. Three kinds:

| Linkage | Visibility | How to declare |
|---|---|---|
| No linkage | Block scope only (locals) | Default for local variables |
| Internal | One TU only | static at namespace scope, anonymous namespace, const namespace-scope variables (without extern) |
| External | All TUs in the program | Default for non-static namespace-scope names; extern is implicit; functions are external by default |

// translation unit 1
static int counter = 0;          // internal — invisible to TU 2
namespace { void helper(); }     // also internal (preferred modern style)

int g_count = 0;                 // external — TU 2 can extern-declare it
void g_init();                   // external (functions have external linkage by default)

The modern idiomatic way to give a name internal linkage is the unnamed namespace, not file-scope static:

namespace {
    int helper_count = 0;
    void helper() { /* ... */ }
}

Unnamed namespaces also work for types — static is a storage-class specifier that only applies to objects and functions, never to types.
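
The point about types can be sketched as follows. Accumulator, g_acc, and add_and_total are hypothetical names; the example assumes the file is one TU among several in a larger program.

```cpp
#include <cassert>

namespace {
    // A TU-local type: another TU may define its own, different Accumulator
    // without violating the ODR.
    struct Accumulator {
        int total = 0;
        void add(int value) { total += value; }
    };

    Accumulator g_acc;   // internal linkage, like 'static' would give a variable
}

// External linkage: callable from other TUs that declare it.
int add_and_total(int value) {
    g_acc.add(value);
    return g_acc.total;
}

void demo() {
    int before = add_and_total(0);
    assert(add_and_total(5) == before + 5);
}
```

Writing "static struct Accumulator { ... };" instead would not compile as a way to hide the type, which is why the unnamed namespace is the only option here.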


5. Header Files and Header Guards

Headers contain declarations (and inline / template / class definitions) shared across TUs. Without protection, including the same header twice in one TU (often indirectly, through other headers) causes redefinition errors.

Two equivalent ways to guard a header:

// header.h — using #pragma once (non-standard but widely supported)
#pragma once

void foo();

// header.h — using include guards (portable, standard)
#ifndef MYPROJECT_HEADER_H
#define MYPROJECT_HEADER_H

void foo();

#endif  // MYPROJECT_HEADER_H

#pragma once is shorter and less error-prone (no risk of macro name collision). All major compilers support it. Use it unless you're targeting a really exotic toolchain.
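
To see what a guard actually does, the sketch below simulates including the same header twice by pasting a guarded body into one file two times. The guard macro DEMO_POINT_H, the Point struct, and sum() are hypothetical.

```cpp
#include <cassert>

// First "inclusion" of a hypothetical point.h.
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct Point {
    int x;
    int y;
};
#endif  // DEMO_POINT_H

// Second "inclusion": DEMO_POINT_H is already defined, so the preprocessor
// skips the body and the compiler never sees a second 'struct Point'.
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct Point {
    int x;
    int y;
};
#endif  // DEMO_POINT_H

int sum(Point p) { return p.x + p.y; }

void demo() {
    assert(sum(Point{1, 2}) == 3);
}
```

Deleting the #ifndef / #define / #endif lines from both copies turns this into a redefinition error, which is the failure the guard prevents.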

What goes in headers

| Goes in headers | Stays in .cpp |
|---|---|
| Function declarations | Function definitions (non-inline) |
| Class definitions | Implementation details |
| inline functions | Static globals (file-scope state) |
| Templates | Anonymous namespace contents |
| inline variables (C++17) | Mutable globals |
| constexpr functions and variables | |
| Type aliases (using, typedef) | |

6. The Preprocessor

The preprocessor runs before the compiler proper and performs textual, token-level substitution. It has no concept of types or scope.

#include <header>     // include another file
#include "local.h"

#define MAX 100       // macro: text substitution
#define SQUARE(x) ((x) * (x))   // function-like macro (note the parens!)

#ifdef DEBUG          // conditional compilation
    log("debug");
#elif defined(RELEASE)
    log("release");
#else
    log("unknown");
#endif

#if __cplusplus >= 202002L
    // C++20 and later
#endif

#error "unsupported config"   // compile-time error
#warning "deprecated"          // compile-time warning (standardized in C++23; widely supported as an extension before)
#pragma once

Predefined macros

| Macro | Meaning |
|---|---|
| __cplusplus | C++ standard version (e.g. 202002L for C++20) |
| __FILE__, __LINE__ | Current file path and line |
| __func__ | Current function name (C99/C++11); strictly a predefined identifier rather than a macro |
| __DATE__, __TIME__ | Build timestamp |
| _WIN32, __linux__, __APPLE__ | Platform |
| __GNUC__, _MSC_VER, __clang__ | Compiler |
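
A small sketch of how these are typically used. The function names are hypothetical, and the version labels are just readable strings chosen for the example.

```cpp
#include <cassert>
#include <cstring>

// Maps __cplusplus to a readable label at compile time.
// Note: MSVC reports 199711L unless /Zc:__cplusplus is passed.
const char* standard_name() {
#if __cplusplus >= 202002L
    return "C++20 or later";
#elif __cplusplus >= 201703L
    return "C++17";
#elif __cplusplus >= 201402L
    return "C++14";
#elif __cplusplus >= 201103L
    return "C++11";
#else
    return "pre-C++11";
#endif
}

// __LINE__ expands to the number of the line it is written on.
int line_of_this_function() { return __LINE__; }

// __func__ only exists inside a function body; its exact text is
// implementation-defined in C++.
const char* name_of_this_function() { return __func__; }

void demo() {
    assert(std::strlen(standard_name()) > 0);
    assert(line_of_this_function() > 0);
    assert(name_of_this_function() != nullptr);
}
```

Because the #if chain is resolved by the preprocessor, only one return statement survives into the translation unit.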

Macros are dangerous

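  1. No type safety and no single-evaluation guarantee. SQUARE(x++) expands to ((x++) * (x++)), which modifies x twice in one expression (undefined behavior).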
  2. No scoping. A #define MIN in a header pollutes every TU that includes it.
  3. Hard to debug. The debugger sees post-expansion text, not the macro name.

Modern C++ avoids macros for constants (constexpr), inline functions (inline), and generic functions (templates). Reserve macros for include guards, conditional compilation, and platform abstraction.
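
The double-evaluation problem and its constexpr replacement can be sketched like this. The counter and next_value() are hypothetical helpers; a function call with a visible side effect is used instead of x++ so that the macro case stays well-defined.

```cpp
#include <cassert>

#define SQUARE(x) ((x) * (x))                   // pastes its argument twice

constexpr int square(int x) { return x * x; }   // evaluates its argument once

static int calls = 0;

// A function with a visible side effect.
int next_value() {
    ++calls;
    return 3;
}

void demo() {
    calls = 0;
    int via_macro = SQUARE(next_value());   // expands to ((next_value()) * (next_value()))
    assert(via_macro == 9);
    assert(calls == 2);                     // argument evaluated twice

    calls = 0;
    int via_function = square(next_value());
    assert(via_function == 9);
    assert(calls == 1);                     // argument evaluated once

    static_assert(square(4) == 16, "usable in constant expressions");
}
```

The constexpr function also respects scope and types, so it can live in a namespace and will reject arguments that do not convert to int.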


7. Static vs Dynamic Linking

When the linker bundles object files into an executable, libraries can be resolved two ways:

| Aspect | Static linking | Dynamic linking |
|---|---|---|
| File extension | .a (Unix), .lib (Windows) | .so (Linux), .dylib (macOS), .dll (Windows) |
| What gets into your binary | Library code is copied in | Just a reference; the OS loads the shared library at runtime |
| Binary size | Larger | Smaller |
| Startup time | Faster (no shared-library load) | Slower (the dynamic loader resolves symbols at startup) |
| Updates | Need to relink to upgrade | Drop-in replacement of the .so, provided the ABI is unchanged |
| Symbol conflicts | Can hide internal symbols | Exported symbol table is exposed to other modules |
| Distribution | Self-contained | Need to ship/install the .so |

Static linking is preferred for command-line tools and small binaries; dynamic linking for OS-shipped libraries (libc, OpenSSL, etc.) and plugin systems.
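
For concreteness, here is a sketch of building the same code both ways. The commands in the comments are the usual GCC/Clang invocations on Linux; the file and library names (mathlib.cpp, main.cpp, libmathlib.a, libmathlib.so) and the function lib_add are hypothetical.

```cpp
// mathlib.cpp: a hypothetical one-function library.
//
// Static library:
//   g++ -c mathlib.cpp -o mathlib.o
//   ar rcs libmathlib.a mathlib.o
//   g++ main.cpp libmathlib.a -o app_static     // code copied into app_static
//
// Shared library:
//   g++ -fPIC -shared mathlib.cpp -o libmathlib.so
//   g++ main.cpp -L. -lmathlib -o app_dynamic   // only a reference is recorded
//   LD_LIBRARY_PATH=. ./app_dynamic             // loader finds the .so at startup

#include <cassert>

int lib_add(int a, int b) {
    return a + b;
}

// What a caller in main.cpp would do after declaring: int lib_add(int, int);
void demo() {
    assert(lib_add(2, 3) == 5);
}
```

Running ldd on the two executables is one way to confirm the difference: only app_dynamic should list libmathlib.so as a dependency.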


8. Name Mangling

C++ allows function overloading and namespaces, but the linker identifies symbols only by name. To make symbols unique, the compiler mangles them: it encodes the namespace and the function signature into the symbol name.

namespace ns {
    int add(int, int);
    double add(double, double);
}

// gcc/clang mangled names:
//   _ZN2ns3addEii        ns::add(int, int)
//   _ZN2ns3addEdd        ns::add(double, double)

Tools to demangle:

echo "_ZN2ns3addEii" | c++filt   # → ns::add(int, int)
nm --demangle obj.o              # list demangled symbols

Name mangling is why C++ libraries are not directly callable from C. To expose a function with a stable, un-mangled name, use extern "C":

extern "C" {
    void my_api(int);             // mangled as: my_api  (unchanged)
}

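The cost: extern "C" functions cannot be overloaded, and the enclosing namespace is not part of the symbol name, so every exported name must be unique across the whole program. extern "C" is the standard way to provide a C-callable interface to a C++ library.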
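
A common pattern, sketched here under the assumption that the C++ side keeps its overloads, is to export one extern "C" wrapper per overload. The wrapper names ns_add_int and ns_add_double are hypothetical.

```cpp
#include <cassert>

namespace ns {
    // Overloads: each gets a distinct mangled symbol.
    int add(int a, int b) { return a + b; }
    double add(double a, double b) { return a + b; }
}

// C-callable wrappers: one unmangled name per overload, since C cannot
// distinguish functions by parameter types.
extern "C" {
    int ns_add_int(int a, int b) { return ns::add(a, b); }
    double ns_add_double(double a, double b) { return ns::add(a, b); }
}

void demo() {
    assert(ns_add_int(2, 3) == 5);
    assert(ns_add_double(0.5, 0.25) == 0.75);
}
```

Running nm on the resulting object file should show ns_add_int and ns_add_double as plain names next to the mangled ns::add symbols.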