C++ String Streams and Regex


  • Description: A note on std::stringstream, std::ostringstream, std::istringstream for in-memory I/O, and the <regex> library for pattern matching
  • My Notion Note ID: K2A-B1-21
  • Created: 2018-12-30
  • Updated: 2026-02-28
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io


1. String Streams Overview

The <sstream> header provides three stream types backed by std::string:

Type                  Direction         Use for
std::ostringstream    Output (write)    Building a string from heterogeneous values
std::istringstream    Input (read)      Parsing a string into typed values
std::stringstream     Both              Bidirectional in-memory buffer

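They use the same stream operators as std::cout / std::cin (ostringstream supports <<, istringstream supports >>, stringstream supports both), plus .str() to read the underlying string and .str(s) to replace it.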


2. std::ostringstream — Building Strings

#include <sstream>
#include <iomanip>
#include <string>

std::ostringstream oss;
oss << "x=" << 42 << ", y=" << 3.14;
std::string s = oss.str();           // "x=42, y=3.14"

// With manipulators
std::ostringstream oss2;
oss2 << std::hex << std::uppercase << 255 << " "
     << std::fixed << std::setprecision(2) << 3.14159;
// "FF 3.14"

// Reset for reuse
oss.str("");                         // clear contents
oss.clear();                         // clear error flags
oss << "fresh";

In modern C++, std::format (C++20, see K2A-B1-4 § 5) is usually a cleaner choice for building strings:

#include <format>

auto s = std::format("x={}, y={}", 42, 3.14);   // shorter, type-safe, usually faster

ostringstream remains useful when:

  1. The composition is conditional (write some pieces only if a condition holds).
  2. You need fine-grained stream-state control (locale, manipulators).
  3. You're working in a pre-C++20 codebase.
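To make case 1 concrete, a minimal sketch: the hypothetical describe() helper writes the optional fields only when they are present, so the string grows piece by piece under ordinary if statements. The Record type and its fields are invented for the example.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical record with optional fields, for illustration only
struct Record {
    std::string name;
    std::optional<int> age;
    std::optional<std::string> email;
};

std::string describe(const Record& r) {
    std::ostringstream oss;
    oss << "name=" << r.name;
    if (r.age) {                      // written only when present
        oss << ", age=" << *r.age;
    }
    if (r.email) {                    // written only when present
        oss << ", email=" << *r.email;
    }
    return oss.str();
}
```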

3. std::istringstream — Parsing Strings

#include <iostream>
#include <sstream>
#include <string>

std::istringstream iss{"42 3.14 hello"};

int    n;
double d;
std::string word;

iss >> n >> d >> word;       // n=42, d=3.14, word="hello"

// Read all words
std::istringstream lines{"alpha beta gamma"};
std::string token;
while (lines >> token) {
    std::cout << token << "\n";
}

// Line-by-line parsing
std::istringstream multi{"line one\nline two\n"};
std::string line;
while (std::getline(multi, line)) {
    // process line
}

// Detect parse failure
std::istringstream bad{"not a number"};
int v;
if (!(bad >> v)) {
    std::cerr << "parse failed\n";
}
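Note that `bad >> v` only fails when no number can be read at the start; on input such as "42abc" the extraction succeeds with 42 and leaves "abc" unread. A minimal sketch of a stricter check, using a hypothetical parse_int_strict helper, also requires that nothing but trailing whitespace remains:

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: accept the text only if it is a single int,
// optionally surrounded by whitespace
std::optional<int> parse_int_strict(const std::string& text) {
    std::istringstream iss{text};
    int value = 0;
    if (!(iss >> value)) {
        return std::nullopt;      // no number at the start, or out of range
    }
    iss >> std::ws;               // skip trailing whitespace
    if (!iss.eof()) {
        return std::nullopt;      // leftover characters such as "abc"
    }
    return value;
}
```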

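For high-performance number parsing, std::from_chars (C++17, <charconv>) is faster and locale-independent (see K2A-B1-4 § 4). istringstream is more flexible (it skips whitespace and reads mixed types in one expression) but heavier: constructing it copies the input string, and every extraction goes through the stream's locale machinery.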
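A minimal sketch of the std::from_chars route: it reports failure through std::errc and returns a pointer to the first unparsed character, so "the whole string must be a number" becomes a pointer comparison. The helper name parse_int is hypothetical. Unlike >>, from_chars does not skip leading whitespace and does not accept a leading '+'.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse all of `text` as a base-10 int
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last  = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;   // not a number, out of range, or trailing junk
    }
    return value;
}
```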


4. std::stringstream — Bidirectional

#include <sstream>

std::stringstream ss;

ss << 42 << " " << 3.14;        // write

int    n;
double d;
ss >> n >> d;                    // read

ss.str();                        // current contents

stringstream is rarely the right choice. Bidirectional buffering is awkward: the read and write positions are separate, but they share one set of state flags, so a condition raised while reading affects writing and vice versa. Pick ostringstream or istringstream for clarity.
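For example, once a read reaches the end of the buffer, eofbit is set, and subsequent writes fail without writing anything until clear() is called. A small sketch of that sequence, assuming a freshly default-constructed std::stringstream; the function stringstream_eof_demo and the EofDemo struct are hypothetical names that simply return the intermediate results for inspection:

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for the intermediate results of the demo
struct EofDemo {
    std::string after_failed_write;
    std::string after_clear_write;
    int read_back;
};

EofDemo stringstream_eof_demo() {
    std::stringstream ss;
    ss << 42 << " " << 3.14;        // buffer: "42 3.14"

    int n = 0;
    double d = 0.0;
    ss >> n >> d;                   // reading 3.14 hits the end: eofbit is set

    ss << " 7";                     // stream is not good(): nothing is written
    std::string first = ss.str();   // still "42 3.14"

    ss.clear();                     // reset the state flags
    ss << " 7";                     // appended at the write position
    std::string second = ss.str();  // "42 3.14 7"

    int k = 0;
    ss >> k;                        // the read position resumes after "3.14"
    return {first, second, k};
}
```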


5. <regex> Basics

The <regex> library (C++11) provides pattern matching with an ECMAScript-like dialect by default.

#include <regex>
#include <string>
#include <iostream>

std::string s = "user42@example.com";
std::regex pattern{R"((\w+)@(\w+\.\w+))"};   // raw string for backslashes

// Test if any match exists
if (std::regex_search(s, pattern)) {
    std::cout << "found\n";
}

// Extract submatches
std::smatch m;
if (std::regex_search(s, m, pattern)) {
    std::cout << "full: "   << m[0] << "\n";   // user42@example.com
    std::cout << "user: "   << m[1] << "\n";   // user42
    std::cout << "domain: " << m[2] << "\n";   // example.com
}

// Whole-string match (not just contains)
bool whole = std::regex_match(s, m, pattern);   // true: the entire string matches

// Replace
std::string masked = std::regex_replace(s, pattern, "[REDACTED]");
// "[REDACTED]"

// Iterate all matches
auto begin = std::sregex_iterator{s.begin(), s.end(), pattern};
auto end   = std::sregex_iterator{};
for (auto it = begin; it != end; ++it) {
    std::cout << it->str() << "\n";
}

Functions and iterators

Facility                      Purpose
std::regex_search             Find the first match anywhere in the string
std::regex_match              Match the entire string
std::regex_replace            Substitute matches with a replacement
std::sregex_iterator          Iterate over all matches
std::sregex_token_iterator    Tokenize (split by a pattern, or pick out capture groups)
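std::sregex_token_iterator is the one entry above without an example. A minimal sketch with hypothetical helper names split and capture_all: the last constructor argument selects what each step yields, -1 for the text between matches (splitting) or N for capture group N of each match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split `text` on every match of `sep` (submatch -1 = the pieces between matches)
std::vector<std::string> split(const std::string& text, const std::regex& sep) {
    std::sregex_token_iterator it{text.begin(), text.end(), sep, -1};
    std::sregex_token_iterator end{};
    return std::vector<std::string>(it, end);
}

// Collect capture group `group` from every match of `re`
std::vector<std::string> capture_all(const std::string& text, const std::regex& re, int group) {
    std::sregex_token_iterator it{text.begin(), text.end(), re, group};
    std::sregex_token_iterator end{};
    return std::vector<std::string>(it, end);
}
```

For instance, splitting "alpha, beta,gamma" on R"(\s*,\s*)" gives "alpha", "beta", "gamma", and capturing group 1 of R"((\w+)=(\w+))" over "k1=v1;k2=v2" gives "k1", "k2".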

Match types

Type                           Holds
std::smatch                    Match results over a std::string
std::cmatch                    Match results over a C string (const char*)
std::wsmatch / std::wcmatch    Wide-string variants
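Match results also report where a match sits. A sketch using position(), prefix(), and suffix(); the DigitsInfo struct and the find_digits / first_digits_c helpers are hypothetical. An smatch holds iterators into the searched string, so that string must outlive it; this is why the regex_search overload taking a temporary std::string together with a match_results is deleted (since C++14).

```cpp
#include <regex>
#include <string>

// Hypothetical summary of the first run of digits in a string
struct DigitsInfo {
    std::string digits;
    long position;        // offset of the match from the start, or -1
    std::string before;   // text before the match
    std::string after;    // text after the match
};

DigitsInfo find_digits(const std::string& s) {
    static const std::regex re{R"(\d+)"};
    std::smatch m;                          // m refers into s, so s must outlive m
    if (!std::regex_search(s, m, re)) {
        return {"", -1, s, ""};
    }
    return {m.str(0),
            static_cast<long>(m.position(0)),
            m.prefix().str(),
            m.suffix().str()};
}

// Same idea over a C string, using std::cmatch
std::string first_digits_c(const char* text) {
    static const std::regex re{R"(\d+)"};
    std::cmatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string{};
}
```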

6. Regex Patterns

The default ECMAScript dialect supports the usual constructs:

Pattern          Matches
.                Any character except a line terminator such as \n or \r
\d  \D           Digit / non-digit
\w  \W           Word character ([A-Za-z0-9_]) / non-word character
\s  \S           Whitespace / non-whitespace
[abc]            Any of a, b, c
[^abc]           Any character except a, b, c
[a-z]            Any character in the range a to z
*  +  ?          0 or more / 1 or more / 0 or 1 of the preceding item
{n} {n,} {n,m}   Exactly n / at least n / n to m of the preceding item
*? +? ??         Lazy (non-greedy) variants of * + ?
^  $             Start / end of string (or of each line with std::regex::multiline, C++17)
\b               Word boundary
(...)            Capture group
(?:...)          Non-capturing group
|                Alternation
\1 \2            Backreference to capture group 1, 2, ...
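Two rows of the table are easier to see in action: backreferences and lazy quantifiers. A minimal sketch with hypothetical helpers; repeated_word() looks for a word that is immediately repeated, and first_match() lets you run a greedy and a lazy pattern over the same input.

```cpp
#include <regex>
#include <string>

// First word that is immediately repeated ("is is"), or "" if none.
// \1 must repeat capture group 1 exactly; \b at both ends rejects
// partial words, so "this island" does not count.
std::string repeated_word(const std::string& text) {
    static const std::regex re{R"(\b(\w+)\s+\1\b)"};
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(1) : std::string{};
}

// Text of the first match of `re` in `text`, or "" if none
std::string first_match(const std::string& text, const std::regex& re) {
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string{};
}
```

With the input "<a><b>", first_match() returns "<a><b>" for the greedy pattern "<.+>" but only "<a>" for the lazy "<.+?>".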

Use raw string literals

Always wrap patterns in R"(...)" so backslashes don't need to be doubled:

std::regex bad{"(\\d+)\\.(\\d+)"};    // hard to read
std::regex good{R"((\d+)\.(\d+))"};   // much better

Other dialects and flags

std::regex p1{"a.b", std::regex::extended};       // POSIX extended
std::regex p2{"a.b", std::regex::basic};          // POSIX basic
std::regex p3{"a.b", std::regex::ECMAScript};     // default
std::regex p4{"a.b", std::regex::ECMAScript | std::regex::icase};   // icase is a modifier flag, not a dialect
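A sketch of combining a grammar with a modifier flag, and of a POSIX grammar's own syntax: POSIX grammars do not define \d, so a digit class is spelled [[:digit:]]. The helper names is_hello and all_digits_posix are hypothetical.

```cpp
#include <regex>
#include <string>

// ECMAScript grammar plus the icase modifier
bool is_hello(const std::string& s) {
    static const std::regex re{"hello", std::regex::ECMAScript | std::regex::icase};
    return std::regex_match(s, re);
}

// POSIX extended grammar with a bracket character class
bool all_digits_posix(const std::string& s) {
    static const std::regex re{"[[:digit:]]+", std::regex::extended};
    return std::regex_match(s, re);
}
```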

7. Regex Performance and When to Avoid

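Standard <regex> has a reputation for being slow. The major implementations compile patterns into an NFA and run it with a general-purpose, mostly backtracking matcher: correct, but often much slower than dedicated engines such as RE2 or PCRE2, and ABI-stability constraints make it hard for vendors to change.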

Don't use regex when:

  1. The pattern is fixed and simple. std::string::find, or starts_with / ends_with (C++20), is much faster.
  2. You're scanning a large file. Use a faster engine: RE2 (Google), PCRE2, Boost.Regex, or CTRE (Compile Time Regular Expressions, which builds the matcher at compile time).
  3. Performance is critical. A handwritten state machine or std::ranges filter often beats regex.
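For case 1, a sketch of the trade-off on a hypothetical file-extension check: the regex and plain-string versions agree, but the second does no pattern matching at all. The function names are made up, and the string version uses std::string_view::compare so it also builds as C++17.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Regex version: correct, but runs the regex engine on every call
bool has_log_extension_regex(const std::string& name) {
    static const std::regex re{R"(.*\.log)"};
    return std::regex_match(name, re);
}

// Plain string version: one length check and one comparison.
// In C++20 the body could simply be: return name.ends_with(".log");
bool has_log_extension(std::string_view name) {
    constexpr std::string_view suffix = ".log";
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```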

Do use regex when:

  1. The pattern is genuinely complex (alternations, groups, anchors).
  2. The pattern needs to be configurable at runtime (read from config / user input).
  3. You need a quick prototype and the throughput isn't a concern.
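For case 2, keep in mind that a pattern read at runtime can be invalid: the std::regex constructor throws std::regex_error on bad syntax. A minimal sketch that turns that into an empty std::optional; compile_user_pattern is a hypothetical name.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: compile a pattern that came from config or user input.
// On invalid syntax, optionally report the library's message and return nullopt.
std::optional<std::regex> compile_user_pattern(const std::string& pattern,
                                               std::string* error = nullptr) {
    try {
        return std::regex{pattern};
    } catch (const std::regex_error& e) {
        if (error) {
            *error = e.what();   // message text is implementation-specific
        }
        return std::nullopt;
    }
}
```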

Common gotcha: pattern construction cost

Compiling a std::regex is expensive. Cache it; don't construct it inside a loop.

#include <regex>
#include <string>

// BAD: recompiles the regex on every call
bool is_email_slow(const std::string& s) {
    return std::regex_match(s, std::regex{R"(\w+@\w+\.\w+)"});
}

// GOOD: compile once; the function-local static is initialized
// exactly once, thread-safely (C++11), on first call
bool is_email(const std::string& s) {
    static const std::regex pattern{R"(\w+@\w+\.\w+)"};
    return std::regex_match(s, pattern);
}
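The function-local static only works when the pattern is known at compile time. When it comes from configuration, one option, sketched here with a hypothetical LineFilter class, is to compile it once when the owning object is constructed and reuse it for every call:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical filter whose pattern is only known at runtime (e.g. from a config file)
class LineFilter {
public:
    explicit LineFilter(const std::string& pattern) : re_{pattern} {}

    bool matches(const std::string& line) const {
        return std::regex_search(line, re_);
    }

    std::vector<std::string> filter(const std::vector<std::string>& lines) const {
        std::vector<std::string> kept;
        for (const auto& line : lines) {
            if (matches(line)) {
                kept.push_back(line);
            }
        }
        return kept;
    }

private:
    std::regex re_;   // compiled once in the constructor, reused for every line
};
```

Construction throws std::regex_error for an invalid pattern, so a bad config value fails once, up front, rather than on every line.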