C++ Strings and Text


  • Description: A note on C-style strings (char*), std::string, std::string_view (C++17), string conversions, and modern formatting (std::format, std::print)
  • My Notion Note ID: K2A-B1-4
  • Created: 2020-04-10
  • Updated: 2026-02-28
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io


1. C-Style Strings (char*)

  • C-style string = contiguous chars terminated by '\0'. The null terminator is what distinguishes it from an arbitrary char*: it tells C string functions where the string ends.
#include <cstring>

const char* s = "hello";  // 6 bytes: h e l l o \0
char buf[10] = "hi";      // h i \0 \0 \0 \0 \0 \0 \0 \0  (remaining elements zero-filled)

std::strlen(s);             // 5  (does NOT include the null terminator)
std::strcmp(s, "hi");       // negative ('e' < 'i')
std::strcpy(buf, "world");  // unsafe: no bounds check (fits here only because buf has 10 bytes)
  • char* = pointer to a char, possibly the first of several. Whether it points to "a string" depends on null termination:
  1. nullptr: no string, just a null pointer.
  2. char* pointing to bytes that contain a '\0': a C string. Length = index of the first '\0'.
  3. char* pointing to bytes with no '\0': not a string. strlen on it reads past the end = UB.
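Cases 1–3 can't be told apart from the pointer alone. You also need to know how many bytes the buffer holds. Below is a minimal sketch, assuming the buffer capacity is known. The helper name `find_c_string_length` is hypothetical. It uses `std::memchr` to search only the first `cap` bytes, so unlike `strlen` it never reads past the buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <optional>

// Hypothetical helper: returns the C-string length if a '\0' appears within
// the first `cap` bytes of `p`. Returns std::nullopt for a null pointer
// (case 1) or for bytes that are not terminated inside the buffer (case 3).
std::optional<std::size_t> find_c_string_length(const char* p, std::size_t cap) {
    if (p == nullptr) return std::nullopt;        // case 1: no string at all
    const void* nul = std::memchr(p, '\0', cap);  // scans at most cap bytes
    if (nul == nullptr) return std::nullopt;      // case 3: not a C string
    // case 2: length = index of the first '\0'
    return static_cast<std::size_t>(static_cast<const char*>(nul) - p);
}
```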

Issues with C-style strings

  1. No size info — every op scans to null. strlen is O(n) every time.
  2. No bounds checking: strcpy, strcat, and gets are infamous for buffer overflows (gets was removed in C11 and C++14).
  3. No ownership — who owns the buffer? Manual tracking.
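  4. String literals are read-only, so modifying one is UB. In C++11 and later, char* p = "hello"; is ill-formed (some compilers only warn). C accepts it, but p[0] = 'H'; is still UB. Use const char*.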
  5. Dynamic strings need manual malloc/free.
  • Still essential for C API interop. Use std::string::c_str() to get null-terminated const char* from std::string.
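When a C API wants a fixed-size char buffer, `std::snprintf` is a bounded alternative to `strcpy`. It never writes more than the given size, null-terminates whenever the size is nonzero, and returns the length it wanted to write, so you can detect truncation. The following is a minimal sketch, and the name `copy_to_buffer` is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: copies `src` into `dst` (capacity `cap` bytes) as a
// null-terminated C string. Returns true if it fit, false if it was truncated.
bool copy_to_buffer(char* dst, std::size_t cap, const std::string& src) {
    if (cap == 0) return false;  // with size 0, snprintf writes nothing, not even '\0'
    // c_str() provides the null-terminated const char* that C functions expect.
    int needed = std::snprintf(dst, cap, "%s", src.c_str());
    return needed >= 0 && static_cast<std::size_t>(needed) < cap;
}
```

On truncation the buffer still holds a valid, shortened C string. That is the main safety gain over `strcpy`.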

2. std::string

  • C++ way to handle text. Owns memory, knows size, grows dynamically.
#include <iostream>
#include <string>

std::string s = "hello";
s += " world";              // concatenation
s.size();                   // 11
s.length();                 // 11 (alias for size)
s.empty();                  // false

// Substrings, find, replace
s.substr(6, 5);             // "world"
s.find("world");            // 6 (or std::string::npos if missing)
s.replace(6, 5, "C++");     // "hello C++"

// Iteration (it is a range)
for (char c : s) std::cout << c;

// Conversion to/from char*
const char* cstr = s.c_str();   // null-terminated; valid until s is modified or destroyed
std::string s2 = cstr;          // construct from C string

// Comparison (works as expected)
if (s == "hello C++") { /* ... */ }
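`find`, `npos`, and `substr` combine naturally into a splitter. Below is a minimal sketch with a hypothetical `split` helper. It keeps empty fields, which is one possible design choice among several.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on `delim` using find + substr.
// Empty fields are kept, so "a,,b" yields {"a", "", "b"}.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = s.find(delim, start);
        if (pos == std::string::npos) {        // no more delimiters
            parts.push_back(s.substr(start));  // last field: rest of the string
            return parts;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;                       // continue after the delimiter
    }
}
```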

Small String Optimization (SSO)

  • Most impls store short strings inline within the string object, with no heap allocation (up to 15 chars in libstdc++ and MSVC, 22 in libc++ on 64-bit targets).
  • Passing small string by value is cheap.
  • For very short strings → prefer std::string over char*. SSO makes it nearly as efficient + you get safety, ownership, length tracking.
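The standard doesn't require SSO, so nothing portable exposes it. One way to observe it is to check whether `data()` points inside the `std::string` object itself. The sketch below does this with a hypothetical `stored_inline` helper. It relies on implementation details and only reflects what mainstream implementations happen to do.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: true if s's characters live inside the std::string
// object itself (SSO buffer) rather than in a separate heap allocation.
// Implementation-specific: the standard does not mandate SSO.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    const char* chars = s.data();
    std::less<const char*> before;  // total order, even for unrelated pointers
    return !before(chars, obj_begin) && before(chars, obj_end);
}
```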

3. std::string_view (C++17)

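  • std::string_view = non-owning view of a string: conceptually a (const char*, size_t) pair. Not guaranteed to be null-terminated, so don't pass sv.data() to C functions that expect a C string.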
  • Modern way to write "takes any kind of string" param without copying.
#include <iostream>
#include <string_view>

void print(std::string_view sv) {
    std::cout << sv;
}

print("literal");                  // const char* -> string_view (no copy)
print(std::string("dynamic"));     // string -> string_view (no copy)

char buf[] = "buffer";
print(buf);                        // char[] -> string_view

// Substring without copying:
std::string_view sv = "abcdef";
print(sv.substr(1, 3));            // "bcd" — no allocation, just a slice

Lifetime trap

  • string_view doesn't own underlying chars. Storing one beyond source's lifetime = use-after-free:
std::string_view make_view() {
    std::string s = "temporary";
    return s;        // BUG: s is destroyed on return; the view dangles (use-after-free)
}

Rules of thumb:

  1. string_view for parameters.
  2. Don't store as class member / return from fn unless lifetime obvious + documented.
  3. std::string for owned text storage.
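A typical use that follows these rules is a `trim` that takes a `string_view` parameter and returns a slice of it. Below is a minimal sketch, and the name `trim` is hypothetical. Returning a view is acceptable here only because the result clearly aliases the caller's argument, and that relationship is documented in the comment.

```cpp
#include <string_view>

// Hypothetical helper: strips leading/trailing spaces and tabs.
// The result is a slice of the caller's characters: no copy, but it is
// only valid while the original string is alive.
std::string_view trim(std::string_view sv) {
    const auto first = sv.find_first_not_of(" \t");
    if (first == std::string_view::npos) return {};  // empty or all whitespace
    const auto last = sv.find_last_not_of(" \t");
    return sv.substr(first, last - first + 1);
}
```

To keep the result beyond the source's lifetime, copy it into owned storage: `std::string owned{trim(input)};`.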

4. String Conversions

#include <charconv>      // C++17 from_chars / to_chars
#include <cstring>       // std::strlen
#include <string>

// Number → string
std::to_string(42);          // "42"
std::to_string(3.14);        // "3.140000" (%f-style, 6 decimals; C++26 changes this to "3.14")

// String → number (throws std::invalid_argument / std::out_of_range on failure;
// takes const std::string&, so a char* argument builds a temporary string)
int n = std::stoi("42");
double d = std::stod("3.14");
size_t pos;
int x = std::stoi("42abc", &pos);   // x = 42, pos = 2

// Best for performance: from_chars / to_chars (C++17)
// locale-independent, no allocation, no exceptions, round-trip correct for floating point
const char* str = "42";
int value;
auto [ptr, ec] = std::from_chars(str, str + std::strlen(str), value);
if (ec == std::errc{}) {
    // value == 42
}

char buf[16];
auto [end, ec2] = std::to_chars(buf, buf + sizeof buf - 1, 42);  // reserve 1 byte for '\0'
if (ec2 == std::errc{}) *end = '\0';  // to_chars does not null-terminate
  • from_chars / to_chars — fastest string ↔ number in stdlib. Use for hot paths + anywhere you'd reach for sprintf/atoi.
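`from_chars` reports how far it parsed, so a strict "whole string must be a number" parser is short. Below is a minimal sketch with a hypothetical `parse_int` helper. It returns `std::nullopt` instead of throwing.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses all of `text` as a base-10 int.
// Returns std::nullopt on empty input, trailing junk, or out-of-range values.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;  // error or leftover chars
    return value;
}
```

Unlike `stoi`, `from_chars` doesn't skip leading whitespace or accept a leading `+`, so `" 42"` and `"+42"` are both rejected here.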

5. std::format and std::print

  • std::format (C++20, <format>) — type-safe, Python-like formatter.
  • std::print (C++23, <print>): writes formatted output directly to stdout (or to a FILE* passed as the first argument); std::println adds a newline.
#include <format>
#include <print>     // C++23
#include <string>

std::string name = "world";
int n = 3;

std::string s = std::format("Hello, {}!", name);
std::string t = std::format("{:>10}", 42);            // right-align width 10
std::string u = std::format("{:.3f}", 3.14159);       // "3.142"
std::string v = std::format("{0} and {0}", "twice");  // "twice and twice"

std::print("Hello, {}!\n", name);                     // C++23: write to stdout
std::println("count = {}", n);                        // adds a newline
  • Format specifiers loosely follow Python:
Spec                       Meaning
{}                         Default formatting
{:>10} / {:<10} / {:^10}   Right / left / center align in width 10
{:0>5}                     Zero-pad to width 5
{:.3f}                     3 decimal places
{:#x}                      Hex with 0x prefix
{:b}                       Binary
{0}, {1}                   Positional arguments
  • Custom types — specialize std::formatter<T>:
struct Point { int x, y; };

template <>
struct std::formatter<Point> : std::formatter<std::string> {
    // Inherits parse() from the string formatter, so specs like {:>10} apply to "(x, y)".
    auto format(Point p, std::format_context& ctx) const {
        return std::formatter<std::string>::format(
            std::format("({}, {})", p.x, p.y), ctx);
    }
};

std::print("p = {}\n", Point{1, 2});   // "p = (1, 2)"
  • vs printf: type-safe (no format-string mismatches), extensible (custom formatters).
  • vs <iostream>: faster, less verbose, supports positional args.

6. Wide and Unicode Strings

  • Several string types exist for different encodings. Avoid the wide/UTF-16/UTF-32 ones in modern code unless doing platform-specific work (e.g., Windows API interop); store text as UTF-8 in std::string.
Type                    Underlying char                                Typical use
std::string             char (8-bit)                                   UTF-8 (recommended), or platform default
std::wstring            wchar_t (16-bit on Windows, 32-bit on Linux)   Windows API interop
std::u8string (C++20)   char8_t                                        Explicit UTF-8
std::u16string          char16_t                                       UTF-16
std::u32string          char32_t                                       UTF-32 (each element is one Unicode code point)
  • stdlib historically weak on Unicode-aware text handling (case folding, normalization, segmentation, collation). Use ICU or dedicated lib for real Unicode ops.
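One consequence of storing UTF-8 in std::string is that `size()` counts bytes, not characters. Below is a minimal sketch with a hypothetical `count_code_points` helper, which assumes the input is valid UTF-8. It counts code points by skipping continuation bytes. Code points still aren't user-perceived characters (grapheme clusters), which is one reason real text processing needs ICU or a similar library.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts Unicode code points in a UTF-8 string,
// assuming valid UTF-8 input (no validation is performed).
std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes have the form 10xxxxxx; every other byte starts a code point.
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```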