C++ String Streams and Regex
- Description: A note on std::stringstream, std::ostringstream, and std::istringstream for in-memory I/O, and the <regex> library for pattern matching
- My Notion Note ID: K2A-B1-21
- Created: 2018-12-30
- Updated: 2026-02-28
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. String Streams Overview
- 2. std::ostringstream — Building Strings
- 3. std::istringstream — Parsing Strings
- 4. std::stringstream — Bidirectional
- 5. <regex> Basics
- 6. Regex Patterns
- 7. Regex Performance and When to Avoid
1. String Streams Overview
The <sstream> header provides three stream types backed by std::string:
| Type | Direction | Use for |
|---|---|---|
| std::ostringstream | Output (write) | Building a string from heterogeneous values |
| std::istringstream | Input (read) | Parsing a string into typed values |
| std::stringstream | Both | Bidirectional in-memory buffer |
They have the same << / >> interface as std::cout / std::cin, plus .str() to access the underlying string.
2. std::ostringstream — Building Strings
#include <sstream>
#include <iomanip>
#include <string>
std::ostringstream oss;
oss << "x=" << 42 << ", y=" << 3.14;
std::string s = oss.str(); // "x=42, y=3.14"
// With manipulators
std::ostringstream oss2;
oss2 << std::hex << std::uppercase << 255 << " "
<< std::fixed << std::setprecision(2) << 3.14159;
// "FF 3.14"
// Reset for reuse
oss.str(""); // clear contents
oss.clear(); // clear error flags
oss << "fresh";
In modern C++, std::format (C++20, see K2A-B1-4 § 5) is usually a cleaner choice for building strings:
auto s = std::format("x={}, y={}", 42, 3.14); // shorter, type-safe, faster
ostringstream remains useful when:
- The composition is conditional (write some pieces only if a condition holds).
- You need fine-grained stream-state control (locale, manipulators).
- Pre-C++20 codebases.
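The conditional case in the list above might look like the following minimal sketch. The function name describe_endpoint and its parameters are hypothetical, introduced only for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds "host[:port][ (tls)]", appending the optional
// pieces only when they apply.
std::string describe_endpoint(const std::string& host, int port, bool tls) {
    std::ostringstream oss;
    oss << host;
    if (port != 0) {
        oss << ':' << port;  // written only when a port is given
    }
    if (tls) {
        oss << " (tls)";     // written only for TLS endpoints
    }
    return oss.str();
}
```

Each piece is appended in its own branch, which is awkward to express as a single format string.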
3. std::istringstream — Parsing Strings
#include <iostream>
#include <sstream>
#include <string>
std::istringstream iss{"42 3.14 hello"};
int n;
double d;
std::string word;
iss >> n >> d >> word; // n=42, d=3.14, word="hello"
// Read all words
std::istringstream words{"alpha beta gamma"};
std::string token;
while (words >> token) {
std::cout << token << "\n";
}
// Line-by-line parsing
std::istringstream multi{"line one\nline two\n"};
std::string line;
while (std::getline(multi, line)) {
// process line
}
// Detect parse failure
std::istringstream bad{"not a number"};
int v;
if (!(bad >> v)) {
std::cerr << "parse failed\n";
}
For high-performance number parsing, std::from_chars (C++17) is faster and locale-independent (see K2A-B1-4 § 4). istringstream is more flexible but heavier.
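As a rough illustration of the std::from_chars alternative, here is a minimal sketch of strict integer parsing. The helper name parse_int and the choice to reject trailing characters are assumptions made for this example, not something the note prescribes.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the whole of `text` as a base-10 int.
// Returns std::nullopt on invalid input, out-of-range values, or trailing characters.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // parse error, overflow, or leftover input
    }
    return value;
}
```

Unlike operator>>, std::from_chars does not skip leading whitespace and does not accept a leading '+', so parse_int(" 42") yields std::nullopt while parse_int("42") yields 42.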
4. std::stringstream — Bidirectional
#include <sstream>
std::stringstream ss;
ss << 42 << " " << 3.14; // write
int n;
double d;
ss >> n >> d; // read
ss.str(); // "42 3.14": the whole buffer, regardless of the read position
stringstream is rarely the right choice — bidirectional buffering is awkward, and the read/write positions interact in subtle ways. Pick ostringstream or istringstream for clarity.
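One such interaction, sketched below: once a read reaches the end of the buffer, eofbit is set, and later writes fail silently until clear() is called. The function name write_after_eof_succeeds is hypothetical and exists only to demonstrate the behavior.

```cpp
#include <sstream>

// Returns true if a write issued right after reading to end-of-input succeeds.
// `clear_first` chooses whether the stream state is reset before that write.
bool write_after_eof_succeeds(bool clear_first) {
    std::stringstream ss;
    ss << 1 << ' ' << 2;
    int a = 0, b = 0;
    ss >> a >> b;        // reading the last token reaches end-of-input: eofbit is set
    if (clear_first) {
        ss.clear();      // reset the state flags so the stream is usable again
    }
    ss << ' ' << 3;      // fails (and sets failbit) if eofbit is still set
    int c = 0;
    return static_cast<bool>(ss >> c) && c == 3;
}
```

Without the clear() call the function returns false; with it, the appended value is read back and the function returns true.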
5. <regex> Basics
The <regex> library (C++11) provides pattern matching with an ECMAScript-like dialect by default.
#include <regex>
#include <string>
#include <iostream>
std::string s = "user42@example.com";
std::regex pattern{R"((\w+)@(\w+\.\w+))"}; // raw string for backslashes
// Test if any match exists
if (std::regex_search(s, pattern)) {
std::cout << "found\n";
}
// Extract submatches
std::smatch m;
if (std::regex_search(s, m, pattern)) {
std::cout << "full: " << m[0] << "\n"; // user42@example.com
std::cout << "user: " << m[1] << "\n"; // user42
std::cout << "domain: " << m[2] << "\n"; // example.com
}
// Whole-string match (not just contains)
std::regex_match(s, m, pattern);
// Replace
std::string masked = std::regex_replace(s, pattern, "[REDACTED]");
// "[REDACTED]"
// Iterate all matches
auto begin = std::sregex_iterator{s.begin(), s.end(), pattern};
auto end = std::sregex_iterator{};
for (auto it = begin; it != end; ++it) {
std::cout << it->str() << "\n";
}
Functions
| Function | Purpose |
|---|---|
| std::regex_search | Find first match anywhere in the string |
| std::regex_match | Match the entire string |
| std::regex_replace | Substitute matches with a replacement |
| std::sregex_iterator | Iterate all matches |
| std::sregex_token_iterator | Tokenize (split by pattern or capture) |
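std::sregex_token_iterator is the one entry in the table not shown in the snippet above, so here is a minimal sketch of using it to split a string. The helper name split_csv and the separator pattern are assumptions chosen for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on commas with optional surrounding whitespace.
std::vector<std::string> split_csv(const std::string& text) {
    static const std::regex separator{R"(\s*,\s*)"};
    // Submatch index -1 selects the text between matches rather than the matches.
    std::sregex_token_iterator first{text.begin(), text.end(), separator, -1};
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

Passing a capture-group index (0, 1, ...) instead of -1 yields the matched text or that group for each match, which is the "capture" mode the table mentions.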
Match types
| Type | Holds |
|---|---|
| std::smatch | Match results over a std::string |
| std::cmatch | Match results over a C-string |
| std::wsmatch / std::wcmatch | Wide-string variants |
6. Regex Patterns
The default ECMAScript dialect supports the usual constructs:
| Pattern | Matches |
|---|---|
| . | Any character (except newline by default) |
| \d \D | Digit / non-digit |
| \w \W | Word char ([A-Za-z0-9_]) / non-word |
| \s \S | Whitespace / non-whitespace |
| [abc] | Any of a, b, c |
| [^abc] | Anything except a, b, c |
| [a-z] | Range |
| * + ? | 0+, 1+, 0-or-1 of preceding |
| {n} {n,} {n,m} | Exactly n, n+, n to m |
| *? +? ?? | Lazy (non-greedy) variants |
| ^ $ | Start / end of string (or line in multiline mode) |
| \b | Word boundary |
| (...) | Capture group |
| (?:...) | Non-capturing group |
| \| | Alternation |
| \1 \2 | Backreference to capture group N |
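A few of these constructs are easier to see in action. The sketch below uses a hypothetical helper, first_match, that compiles the pattern on every call purely for brevity.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first match of `pattern` in `text`,
// or an empty string if there is none. Compiles the regex per call,
// which is fine for a demo but wasteful in real code.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re{pattern};
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string{};
}
```

With this helper, first_match("<b>bold</b>", R"(<.+>)") returns the whole input because .+ is greedy, while the lazy R"(<.+?>)" stops at "<b>". The backreference pattern R"(\b(\w+) \1\b)" applied to "it is is fine" returns "is is".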
Use raw string literals
Wrap patterns in a raw string literal, R"(...)", so backslashes don't need to be doubled. If the pattern itself contains )", add a custom delimiter, as in R"re(...)re":
std::regex bad{"(\\d+)\\.(\\d+)"}; // hard to read
std::regex good{R"((\d+)\.(\d+))"}; // much better
Other dialects
std::regex p1{"a.b", std::regex::extended}; // POSIX extended
std::regex p2{"a.b", std::regex::basic}; // POSIX basic
std::regex p3{"a.b", std::regex::ECMAScript}; // default
std::regex p4{"a.b", std::regex::icase}; // case-insensitive
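The flags are bitmask values, so a grammar and an option such as icase can be combined with |. A minimal sketch, with a hypothetical function name and pattern:

```cpp
#include <regex>
#include <string>

// Hypothetical check: does the line contain the word "error" in any letter case?
bool has_error_tag(const std::string& line) {
    // Grammar flag and option flag combined; compiled once via the static local.
    static const std::regex tag{R"(\berror\b)",
                                std::regex::ECMAScript | std::regex::icase};
    return std::regex_search(line, tag);
}
```

When no grammar flag is given, as in p4 above, ECMAScript is assumed, so spelling it out here is only for clarity.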
7. Regex Performance and When to Avoid
Standard <regex> has a reputation for being slow. Major implementations compile patterns into NFA-based matchers, which are correct but several times slower than re2 or PCRE2.
Don't use regex when:
- The pattern is fixed and simple. A raw find / starts_with / ends_with is much faster.
- You're scanning a large file. Use a faster engine: re2 (Google), boost::regex, or ctre (compile-time-compiled regex).
- Performance is critical. A handwritten state machine or std::ranges filter often beats regex.
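For the fixed-pattern case in the first bullet, plain string operations are enough. The helpers below are a hypothetical sketch written against std::string_view so that they also compile as C++17.

```cpp
#include <string_view>

// Hypothetical helpers for fixed-text checks that need no regex.
// With C++20, s.starts_with(prefix) and s.ends_with(suffix) do the same job.
bool has_prefix(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

bool has_suffix(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

bool has_substring(std::string_view s, std::string_view needle) {
    return s.find(needle) != std::string_view::npos;
}
```

For example, has_prefix(url, "https://") && has_suffix(url, ".png") covers what a pattern like ^https://.*\.png$ would check on a single-line input, without compiling anything.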
Do use regex when:
- The pattern is genuinely complex (alternations, groups, anchors).
- The pattern needs to be configurable at runtime (read from config / user input).
- You need a quick prototype and the throughput isn't a concern.
Common gotcha: pattern construction cost
Compiling a std::regex is expensive. Cache it; don't construct it inside a loop.
// BAD: recompiles regex on every call
bool is_email_slow(const std::string& s) {
return std::regex_match(s, std::regex{R"(\w+@\w+\.\w+)"});
}
// GOOD: compile once
bool is_email(const std::string& s) {
static const std::regex pattern{R"(\w+@\w+\.\w+)"};
return std::regex_match(s, pattern);
}