C++ Compilation and Linkage


  • Description: A note on how C++ programs are built — the compilation pipeline, translation units, the One Definition Rule, linkage, headers, the preprocessor, and static vs dynamic linking
  • My Notion Note ID: K2A-B1-2
  • Created: 2018-09-15
  • Updated: 2026-02-28
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

1. The Compilation Pipeline

  • 4 phases from .cpp to executable:
  1. Preprocessing: #include, #define, conditional compilation. Output: a single translation unit (headers pasted in, macros expanded).
  2. Compilation: TU → assembly. Each TU is compiled independently; the compiler can't see other TUs.
  3. Assembly: asm text → machine code in an object file (.o / .obj). Usually integrated into the compile step.
  4. Linking: object files + libraries → executable / shared lib. Resolves cross-TU symbol references.
foo.cpp ──[preprocess]──► foo.i ──[compile]──► foo.o ──┐
bar.cpp ──[preprocess]──► bar.i ──[compile]──► bar.o ──┼─[link]─► program
                                            stdlib.a ──┘
  • Each step matters — macro hygiene (preprocessor), inline/templates (compile), static/extern/lib order (link).

2. Translation Units

  • TU = preprocessed source — source + all #included headers, macros expanded. The unit the compiler sees.

Key consequences:

  1. Each TU compiled independently. Compiler doesn't know what's in other TUs.
  2. Headers #included in many TUs are reparsed each time → slow C++ builds (C++20 modules aim to fix this).
  3. static at namespace scope = internal to this TU. Invisible to other TUs.
  4. A template is instantiated in each TU that uses it, so its full definition must be visible there → template definitions go in headers.
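A minimal sketch of point 4, assuming a hypothetical header "clamp_to.h" whose contents are pasted inline here, exactly as #include would do. Because the full definition is visible, this TU can instantiate clamp_to<int> and clamp_to<double> on its own. With only a declaration visible, the compiler would emit a call to an instantiation nobody defines, and the link would typically fail with an undefined reference.

```cpp
#include <cassert>

// --- contents of a hypothetical header "clamp_to.h" ---
// The full definition (not just a declaration) lives in the header, so every
// TU that includes it can instantiate clamp_to<T> for whatever T it needs.
template <typename T>
T clamp_to(T value, T lo, T hi) {
    if (value < lo) return lo;
    if (hi < value) return hi;
    return value;
}
// --- end of header ---

// This TU instantiates clamp_to<int> and clamp_to<double> by itself.
int clamp_percent(int p) { return clamp_to(p, 0, 100); }
double clamp_unit(double x) { return clamp_to(x, 0.0, 1.0); }
```

Each TU that uses clamp_to<int> gets its own instantiation; the linker then treats these identical copies the same way it treats inline functions.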

3. The One Definition Rule (ODR)

  • Most important linking rule:

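No TU may contain more than one definition of any variable, function, class type, enumeration, or template, and every non-inline function or variable that is odr-used must have exactly one definition across the entire program.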

Nuances:

  1. Declarations unrestricted — declare in 100 headers, define once.
  2. inline functions/variables (C++17), classes, enumerations, templates: may be defined in several TUs (at most once per TU), but every definition must be token-for-token identical. For inline functions/variables the linker keeps one copy.
  3. Anonymous namespaces and static — TU-local. Not subject to cross-TU "exactly one" rule.

Common violations:

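// foo.h (included by two or more .cpp files)
int x = 42;                            // BUG: definition in header → one definition per including TU
extern int y;                          // OK: declaration only; define 'int y = 42;' in exactly one .cpp
inline int safe_x = 42;                // OK (C++17): 'inline' allows identical multi-TU definitions

void f() { /* ... */ }                 // BUG: non-inline definition in header
inline void g() { /* ... */ }          // OK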
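  • 2 TUs including foo.h → 2 non-inline definitions of x and f() → linker "multiple definition" error.
  • 2 different inline definitions of the same name (e.g., compiled with different macros) → ill-formed, no diagnostic required (UB in practice); the linker usually won't detect it.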
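A sketch of how the foo.h problems above are usually fixed, assuming a hypothetical header "ids.h" and a hypothetical .cpp file concatenated here so the example compiles as one TU. Mutable state shared across TUs either gets an extern declaration in the header plus one definition in a .cpp, or becomes a C++17 inline variable.

```cpp
#include <cassert>

// --- contents of a hypothetical header "ids.h" (safe to include from many TUs) ---
extern int g_created;              // declaration only: defined in exactly one .cpp

inline int id_counter = 0;         // C++17 inline variable: one shared object program-wide

inline int next_id() {             // inline function: identical definition in every TU is OK
    return ++id_counter;
}
// --- end of header ---

// --- in exactly one hypothetical .cpp file ---
int g_created = 0;                 // the single definition the header's extern refers to

int make_object() {
    ++g_created;
    return next_id();
}
```

Every TU that includes ids.h sees the same id_counter and next_id, while g_created still has exactly one definition, so the program obeys the ODR.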

4. Linkage: Internal vs External

  • Linkage = whether a name declared in one scope (possibly in another TU) refers to the same entity as the same name declared elsewhere. 3 kinds (C++20 adds a fourth, module linkage, for names in named modules):
Linkage | Visibility | How to declare
--- | --- | ---
No linkage | Block scope only (locals) | Default for local variables
Internal | One TU only | static at namespace scope, anything in an anonymous namespace, non-inline const (incl. constexpr) namespace-scope variables not declared extern
External | All TUs in the program | Default for non-static namespace-scope names; functions are external by default, and extern makes it explicit
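// translation unit 1
static int counter = 0;          // internal: invisible to TU 2
namespace { void helper() {} }   // also internal (preferred modern style)

int g_count = 0;                 // external: the one definition, usable from TU 2
void g_init() { /* ... */ }      // external: functions have external linkage by default

// translation unit 2
extern int g_count;              // declaration referring to TU 1's g_count
void g_init();                   // declaration; the linker resolves it to TU 1's definition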
  • Modern idiom for internal linkage: unnamed namespace, not file-scope static:
namespace {
    int helper_count = 0;
    void helper() { /* ... */ }
}
  • Unnamed namespaces also work for types — static is a storage-class specifier (objects/functions only, not types).
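A minimal sketch of the point above, assuming hypothetical names Options, parse_level, clamp_level, and configured_level. The unnamed namespace gives internal linkage to both a type and a function; static can do the same for the function but cannot be applied to the struct.

```cpp
#include <cassert>
#include <string>

namespace {
// Internal linkage: another TU can define its own 'Options' or 'parse_level'
// without a clash, because these names never leave this TU.
struct Options {
    int level = 1;
};

int parse_level(const std::string& s) {
    return s == "high" ? 3 : (s == "low" ? 0 : 1);
}
}  // namespace

// 'static' works for a function, but could not be written on 'struct Options'.
static int clamp_level(int level) { return level > 3 ? 3 : level; }

// External linkage: the only name this TU exports.
int configured_level(const std::string& s) {
    Options opts;
    opts.level = clamp_level(parse_level(s));
    return opts.level;
}
```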

5. Header Files and Header Guards

  • Headers hold declarations + inline/template/class definitions shared across TUs.
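  • Without guards, including the same header twice in one TU (often indirectly, via another header) repeats its class and inline definitions → redefinition errors.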

Two equivalent guards:

// header.h — using #pragma once (non-standard but widely supported)
#pragma once

void foo();

// header.h — using include guards (portable, standard)
#ifndef MYPROJECT_HEADER_H
#define MYPROJECT_HEADER_H

void foo();

#endif  // MYPROJECT_HEADER_H
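  • #pragma once: shorter, no macro-name collision risk, supported by all major compilers. Known weakness: it can misfire when the same file is reachable through different paths (e.g. symlinks or network mounts). Use it unless targeting an exotic toolchain.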
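To see what an include guard actually does, the sketch below pastes the body of a hypothetical "widget.h" into one file twice, as would happen if it were #included directly and again via another header. The second copy is discarded by the preprocessor, so struct Widget is defined only once; without the #ifndef/#define/#endif lines this file would fail with a redefinition error.

```cpp
#include <cassert>

// First inclusion: MYPROJECT_WIDGET_H is not yet defined, so the body is kept.
#ifndef MYPROJECT_WIDGET_H
#define MYPROJECT_WIDGET_H
struct Widget {
    int id;
};
inline int widget_id(const Widget& w) { return w.id; }
#endif  // MYPROJECT_WIDGET_H

// Second inclusion: the macro is already defined, so the preprocessor drops
// everything up to #endif and 'struct Widget' is not redefined.
#ifndef MYPROJECT_WIDGET_H
#define MYPROJECT_WIDGET_H
struct Widget {
    int id;
};
inline int widget_id(const Widget& w) { return w.id; }
#endif  // MYPROJECT_WIDGET_H
```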

What goes in headers

Goes in headers | Stays in .cpp
--- | ---
Function declarations | Function definitions (non-inline)
Class definitions | Implementation details
inline functions | Static globals (file-scope state)
Templates | Anonymous namespace contents
inline variables (C++17) | Definitions of mutable globals (the header gets only an extern declaration)
constexpr functions and variables |
Type aliases (using, typedef) |
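A sketch of the split in the table, assuming a hypothetical "counter.h" / "counter.cpp" pair that is concatenated here so it compiles as a single TU. The header part holds only things every includer may safely see; the source part holds the one definition of the mutable global, the non-inline function body, and TU-local helpers.

```cpp
#include <cassert>
#include <string>

// ===== hypothetical "counter.h": what other TUs see =====
using HitCount = int;                      // type alias
constexpr HitCount kHotThreshold = 3;      // constexpr variable (const → internal linkage)

struct HitCounter {                        // class definition
    std::string name;
    HitCount hits = 0;
};

void record_hit(HitCounter& c);            // function declaration only
extern HitCount g_total_hits;              // declaration of a mutable global

inline bool is_hot(const HitCounter& c) {  // small inline helper
    return c.hits >= kHotThreshold;
}

// ===== hypothetical "counter.cpp": definitions and private state =====
HitCount g_total_hits = 0;                 // the one definition of the global

namespace {                                // TU-local implementation detail
void bump(HitCount& n) { ++n; }
}  // namespace

void record_hit(HitCounter& c) {           // non-inline function definition
    bump(c.hits);
    bump(g_total_hits);
}
```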

6. The Preprocessor

  • Runs before the compiler. Pure text substitution. No concept of types/scope.
#include <header>     // include another file
#include "local.h"

#define MAX 100       // macro: text substitution
#define SQUARE(x) ((x) * (x))   // function-like macro (note the parens!)

#ifdef DEBUG          // conditional compilation
    log("debug");
#elif defined(RELEASE)
    log("release");
#else
    log("unknown");
#endif

#if __cplusplus >= 202002L
    // C++20 and later
#endif

#error "unsupported config"   // compile-time error
#warning "deprecated"          // compile-time warning (standardized in C++23; widely supported as an extension before)
#pragma once
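A small sketch of the conditional-compilation pattern from the listing above, using a hypothetical size_class function: only one of the two bodies survives preprocessing, so the C++17-only if constexpr is never seen by an older compiler. Note that MSVC reports 199711L for __cplusplus unless compiled with /Zc:__cplusplus, so there this check would take the fallback branch.

```cpp
#include <cassert>

// Returns 0 for "small" types (at most 4 bytes) and 1 otherwise.
// The preprocessor picks the body based on the language level.
template <typename T>
int size_class(const T&) {
#if __cplusplus >= 201703L
    if constexpr (sizeof(T) <= 4) {
        return 0;   // small
    } else {
        return 1;   // large
    }
#else
    return sizeof(T) <= 4 ? 0 : 1;
#endif
}
```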

Predefined macros

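Macro | Meaning
--- | ---
__cplusplus | C++ standard version (e.g. 202002L for C++20; MSVC reports 199711L unless /Zc:__cplusplus is set)
__FILE__, __LINE__ | Current file path and line
__func__ | Current function name (C99/C++11); strictly a predefined function-local variable, not a macro
__DATE__, __TIME__ | Build date and time
_WIN32, __linux__, __APPLE__ | Platform
__GNUC__, _MSC_VER, __clang__ | Compiler (Clang also defines __GNUC__, so test __clang__ first)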
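A sketch of typical uses of these predefined names, with hypothetical helpers compiler_name, platform_name, where, and a HERE() macro. The ordering of the compiler checks matters because Clang also defines __GNUC__. __FILE__ and __LINE__ must be expanded at the call site, which is one of the few jobs a function-like macro still does better than a function.

```cpp
#include <cassert>
#include <string>

// Check __clang__ first: Clang also defines __GNUC__ for GCC compatibility.
const char* compiler_name() {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "unknown";
#endif
}

const char* platform_name() {
#if defined(_WIN32)
    return "windows";
#elif defined(__APPLE__)
    return "apple";
#elif defined(__linux__)
    return "linux";
#else
    return "other";
#endif
}

std::string where(const char* file, int line, const char* func) {
    return std::string(file) + ":" + std::to_string(line) + " in " + func;
}

// Expands at the call site, so __FILE__, __LINE__ and __func__ describe the
// caller rather than the 'where' function itself.
#define HERE() where(__FILE__, __LINE__, __func__)
```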

Macros are dangerous

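  1. No type safety, and arguments may be evaluated more than once: SQUARE(x++) expands to ((x++) * (x++)), modifying x twice without sequencing (undefined behavior).
  2. No scoping: #define MIN in a header pollutes every TU that includes it.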
  3. Hard to debug — debugger sees post-expansion text, not the macro name.
  • Modern C++ replaces macros with constexpr (constants), inline (functions), templates (generics). Reserve macros for include guards, conditional compilation, platform abstraction.
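A minimal sketch of the replacements named above, assuming hypothetical names square and kMax. The constexpr function evaluates its argument exactly once and is still usable in constant expressions; the constexpr variable replaces #define MAX 100 with a typed, scoped constant.

```cpp
#include <cassert>

// Macro version: the argument text is pasted twice.
#define SQUARE(x) ((x) * (x))
// SQUARE(i++) would expand to ((i++) * (i++)): two unsequenced
// modifications of i, i.e. undefined behavior. Never call it that way.

// Function replacement: type-checked, scoped, argument evaluated once,
// and still usable at compile time.
constexpr int square(int x) { return x * x; }

// Constant replacement for '#define MAX 100': has a type and obeys scope.
constexpr int kMax = 100;

static_assert(square(4) == 16, "evaluated at compile time");
static_assert(SQUARE(4) == 16, "fine when the argument has no side effects");
```

With the function, square(i++) is well defined: i is incremented once and the old value is squared.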

7. Static vs Dynamic Linking

Aspect | Static linking | Dynamic linking
--- | --- | ---
File extension | .a (Unix), .lib (Windows) | .so (Linux), .dylib (macOS), .dll (Windows)
What gets into your binary | Library code is copied in | Just a reference; the OS loads the shared library at runtime
Binary size | Larger | Smaller
Startup time | Faster (no shared libraries to load) | Slower (the dynamic loader maps libraries and resolves symbols at load time or on first call)
Updates | Relink to pick up a new library version | Drop-in replacement of the .so, as long as the ABI is unchanged
Symbol exposure | Internal symbols can stay hidden | Whole exported symbol table visible (on ELF, every non-hidden symbol by default)
Distribution | Self-contained | Need to ship/install the .so
  • Static: CLI tools and other programs you want to ship as one self-contained file.
  • Dynamic: OS-shipped libs (libc, OpenSSL), plugin systems.

8. Name Mangling

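  • The linker matches symbols by name only, so the compiler mangles C++ names: it encodes the namespace and parameter types into the symbol name, which keeps overloads and same-named functions in different namespaces distinct.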
namespace ns {
    int add(int, int);
    double add(double, double);
}

// gcc/clang mangled names (Itanium C++ ABI; MSVC uses a different scheme):
//   _ZN2ns3addEii        ns::add(int, int)
//   _ZN2ns3addEdd        ns::add(double, double)

Demangling:

echo "_ZN2ns3addEii" | c++filt   # → ns::add(int, int)
nm --demangle obj.o              # list demangled symbols
  • Mangling = why C++ libs aren't directly C-callable.
  • extern "C" → stable un-mangled name:
extern "C" {
    void my_api(int);             // symbol name: my_api (not mangled; some platforms add a prefix, e.g. _my_api on macOS)
}
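  • Cost: no overloading, and namespaces don't contribute to the symbol name, so two extern "C" functions with the same name collide even in different namespaces. Still the standard way to provide a C-callable interface to a C++ library.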
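A sketch of that C-callable interface pattern, reusing the ns::add overloads from the mangling example and adding hypothetical wrapper names ns_add_int and ns_add_double. Because C has no overloading, each overload gets its own distinct unmangled name, and the wrappers simply forward to the mangled C++ functions.

```cpp
#include <cassert>

// C++ side: overloaded functions in a namespace. Their symbols are mangled
// (e.g. _ZN2ns3addEii under the Itanium ABI), so C code cannot name them.
namespace ns {
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }
}  // namespace ns

// C-callable facade: one distinct, unmangled name per overload.
extern "C" {
int ns_add_int(int a, int b) { return ns::add(a, b); }
double ns_add_double(double a, double b) { return ns::add(a, b); }
}
```

A header shared by C and C++ callers commonly wraps such declarations in #ifdef __cplusplus / extern "C" { ... } / #endif, so the C compiler never sees the extern "C" syntax it does not understand.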