C++ Strings and Text
- Description: A note on C-style strings (char*), std::string, std::string_view (C++17), string conversions, and modern formatting (std::format, std::print)
- My Notion Note ID: K2A-B1-4
- Created: 2020-04-10
- Updated: 2026-02-28
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. C-Style Strings (char*)
- 2. std::string
- 3. std::string_view (C++17)
- 4. String Conversions
- 5. std::format and std::print
- 6. Wide and Unicode Strings
1. C-Style Strings (char*)
A C-style string is a contiguous sequence of chars terminated by a null byte ('\0'). The null terminator is what distinguishes it from a generic char* — it tells every C string function where the string ends.
const char* s = "hello"; // 6 bytes: h e l l o \0
char buf[10] = "hi"; // h i \0 \0 \0 \0 \0 \0 \0 \0 (zero-padded)
#include <cstring>
strlen(s); // 5 (does NOT include the null terminator)
strcmp(s, "hi"); // negative
strcpy(buf, "world"); // unsafe — no bounds check
A char* is just a pointer to one or more chars. Whether it represents "a string" depends entirely on whether the bytes are null-terminated:
- nullptr: no string at all, just a null pointer.
- char* pointing to a buffer of bytes with '\0' somewhere inside: a C string. Length is the position of the '\0'.
- char* pointing to bytes with no '\0': not a string; passing it to strlen is undefined behavior.
Issues with C-style strings
- No size info. Every operation has to scan to the null. strlen is O(n), every time.
- No bounds checking. strcpy, strcat, and gets are infamous buffer-overflow sources.
- No ownership semantics. Who owns the buffer? Tracked manually.
- String literals are read-only. Writing through a pointer to one, as in char* p = "hello"; p[0] = 'H';, is undefined behavior, and since C++11 the char* initialization itself is ill-formed (compilers that still accept it do so as an extension). Use const char* for literals.
- Heap allocation requires manual malloc/free for dynamic strings.
C-style strings are still essential when interoperating with C APIs. Use std::string::c_str() to get a null-terminated const char* from a std::string.
2. std::string
std::string is the C++ way to handle text. It owns its memory, knows its size, and grows dynamically.
#include <iostream> // for std::cout below
#include <string>
std::string s = "hello";
s += " world"; // concatenation
s.size(); // 11
s.length(); // 11 (alias for size)
s.empty(); // false
// Substrings, find, replace
s.substr(6, 5); // "world"
s.find("world"); // 6 (or std::string::npos if missing)
s.replace(6, 5, "C++"); // "hello C++"
// Iteration (it is a range)
for (char c : s) std::cout << c;
// Conversion to/from char*
const char* cstr = s.c_str(); // null-terminated, valid until s changes
std::string s2 = cstr; // construct from C string
// Comparison (works as expected)
if (s == "hello C++") { /* ... */ }
Small String Optimization (SSO)
Most implementations of std::string store short strings (typically up to 15 chars in libstdc++ and MSVC, and 22 in libc++ on 64-bit targets) inline within the string object itself, avoiding a heap allocation. Creating, copying, or passing a short string by value therefore costs no allocation.
For very short strings, prefer std::string over char* — SSO makes it nearly as efficient and you get safety, ownership, and length tracking for free.
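One way to observe SSO on a given implementation is to check whether data() points inside the string object itself. This is only a sketch: is_stored_inline is a hypothetical name, and the result for short strings depends on the implementation.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: true if the character buffer lives inside the
// std::string object itself, i.e. no separate heap buffer is in use.
bool is_stored_inline(const std::string& s) {
    auto obj   = reinterpret_cast<std::uintptr_t>(&s);
    auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars >= obj && chars < obj + sizeof(std::string);
}
```

On mainstream implementations a two-character string is expected to report true, while a 100-character string cannot fit inside the object and reports false.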
3. std::string_view (C++17)
std::string_view is a non-owning view of a string — a (const char*, size_t) pair. It is the modern way to write a "takes any kind of string" parameter without copying.
#include <iostream> // for std::cout below
#include <string>
#include <string_view>
void print(std::string_view sv) {
std::cout << sv;
}
print("literal"); // const char* -> string_view (no copy)
print(std::string("dynamic")); // string -> string_view (no copy)
char buf[] = "buffer";
print(buf); // char[] -> string_view
// Substring without copying:
std::string_view sv = "abcdef";
print(sv.substr(1, 3)); // "bcd" — no allocation, just a slice
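Because substr on a view is only pointer arithmetic, a tokenizer can hand back slices of its input without copying any characters. A minimal sketch, with split as a hypothetical helper name; the returned views are valid only while the original buffer is alive.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: split `text` on `delim`, returning views into `text`.
// No characters are copied; only the vector of views allocates.
std::vector<std::string_view> split(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last field
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

A slice is generally not null-terminated, so its data() should not be passed to functions that expect a C string.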
Lifetime trap
A string_view does not own the underlying characters. Storing one beyond the lifetime of the source is a use-after-free bug:
std::string_view make_view() {
std::string s = "temporary";
return s; // BUG: s is destroyed; returned view points to freed memory
}
Rules of thumb:
- Use string_view for parameters.
- Don't store string_view as a class member or return it from a function unless the lifetime is obvious and documented.
- Use std::string for owned text storage.
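Two common ways to stay within these rules are sketched below: return an owning std::string when the function creates the text, and return a view only when it slices an argument the caller already owns. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Safe: the function creates the text, so it returns an owning string.
std::string make_greeting(std::string_view name) {
    std::string s = "hello, ";
    s += name;  // std::string accepts a string_view in += since C++17
    return s;
}

// Safe as long as the caller keeps the argument's buffer alive:
// the result is a slice of the input, not of a local variable.
std::string_view first_word(std::string_view text) {
    std::size_t end = text.find(' ');
    return text.substr(0, end);  // end == npos yields the whole text
}
```

Even with first_word, storing the result of a call on a temporary std::string leaves a dangling view, because the temporary is destroyed at the end of the full expression.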
4. String Conversions
#include <string>
#include <charconv> // C++17 from_chars / to_chars
// Number → string
std::to_string(42); // "42"
std::to_string(3.14); // "3.140000" (fixed precision)
// String → number (throws on failure, allocates)
int n = std::stoi("42");
double d = std::stod("3.14");
size_t pos;
int x = std::stoi("42abc", &pos); // x = 42, pos = 2
// Best for performance: from_chars / to_chars (C++17)
// — locale-independent, no allocation, round-trip correct
const char* str = "42";
int value;
auto [ptr, ec] = std::from_chars(str, str + 2, value);
if (ec == std::errc{}) {
// value == 42
}
char buf[16];
auto [end, ec2] = std::to_chars(buf, buf + sizeof buf, 42);
*end = '\0';
from_chars and to_chars are the fastest string ↔ number conversions in the standard library. Use them for hot paths and any time you'd otherwise reach for sprintf/atoi.
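A common wrapper around from_chars, sketched here under the assumption that the entire input must be a valid number, returns std::optional<int> so that callers check a single value. parse_int is a hypothetical name.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse a base-10 int, rejecting empty input,
// trailing garbage such as "42abc", and out-of-range values.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last  = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    // ec reports invalid or out-of-range input; ptr != last means
    // some characters were left unparsed.
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```

Unlike std::stoi, from_chars does not skip leading whitespace or accept a leading '+', so " 42" is rejected by this sketch.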
5. std::format and std::print
std::format (C++20, <format>) is a type-safe, Python-like string formatter that returns a std::string. std::print (C++23, <print>) formats and writes directly to stdout, or to a FILE* or std::ostream passed as the first argument.
#include <format>
#include <print> // C++23
#include <string>
std::string name = "world"; // sample inputs for the calls below
int n = 3;
std::string s = std::format("Hello, {}!", name);
std::string t = std::format("{:>10}", 42); // right-align width 10
std::string u = std::format("{:.3f}", 3.14159); // "3.142"
std::string v = std::format("{0} and {0}", "twice"); // "twice and twice"
std::print("Hello, {}!\n", name); // C++23: write to stdout
std::println("count = {}", n); // adds a newline
Format specifiers loosely follow Python:
| Spec | Meaning |
|---|---|
| {} | Default formatting |
| {:>10} / {:<10} / {:^10} | Right / left / center align in width 10 |
| {:0>5} | Zero-pad to width 5 |
| {:.3f} | 3 decimal places |
| {:#x} | Hex with 0x prefix |
| {:b} | Binary |
| {0}, {1} | Positional arguments |
For custom types, specialize std::formatter<T>:
struct Point { int x, y; };
template <>
struct std::formatter<Point> : std::formatter<std::string> {
auto format(Point p, std::format_context& ctx) const {
return std::formatter<std::string>::format(
std::format("({}, {})", p.x, p.y), ctx);
}
};
std::print("p = {}\n", Point{1, 2}); // "p = (1, 2)"
Compared to printf, std::format is type-safe (a mismatch between the format string and its arguments is a compile-time error rather than undefined behavior) and extensible (custom formatters). Compared to <iostream>, it is less verbose, supports positional arguments, and is typically faster.
6. Wide and Unicode Strings
C++ has several string types for different encodings. Avoid the wide and UTF-16/UTF-32 types in modern code unless you're doing platform-specific work; use std::string holding UTF-8 wherever possible.
| Type | Underlying char | Typical use |
|---|---|---|
| std::string | char (8-bit) | UTF-8 (recommended), or platform default |
| std::wstring | wchar_t (16-bit on Windows, 32-bit on Linux) | Windows API interop |
| std::u8string (C++20) | char8_t | Explicit UTF-8 |
| std::u16string | char16_t | UTF-16 |
| std::u32string | char32_t | UTF-32 (each element is one Unicode code point) |
The C++ standard library has historically been weak on Unicode-aware text handling (case folding, normalization, segmentation, collation). Use ICU or a dedicated library when you need real Unicode operations.
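One consequence of storing UTF-8 in std::string is that size() counts bytes, not code points. The sketch below counts code points by skipping UTF-8 continuation bytes (those of the form 10xxxxxx). It assumes the input is valid UTF-8 and does not handle grapheme clusters or normalization, for which a dedicated library is still needed. count_code_points is a hypothetical name.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of Unicode code points in valid UTF-8 text.
std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes look like 10xxxxxx; any other byte
        // starts a new code point.
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the five-byte string "caf\xC3\xA9" (the word café encoded as UTF-8), size() is 5 while this function returns 4.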