- Description: A note on std::stringstream, std::ostringstream, std::istringstream for in-memory I/O, and the <regex> library for pattern matching
- My Notion Note ID: K2A-B1-21
- Created: 2018-12-30
- Updated: 2026-02-28
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
1. String Streams Overview
<sstream> — 3 stream types backed by std::string:
| Type | Direction | Use for |
| --- | --- | --- |
| std::ostringstream | Output (write) | Building a string from heterogeneous values |
| std::istringstream | Input (read) | Parsing a string into typed values |
| std::stringstream | Both | Bidirectional in-memory buffer |
- Same << / >> interface as cout / cin, plus .str() to get the underlying string.
2. std::ostringstream — Building Strings
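#include <sstream>
#include <iomanip>
#include <string>
// Mix types freely; each << appends formatted text to the buffer.
std::ostringstream oss;
oss << "x=" << 42 << ", y=" << 3.14;
std::string s = oss.str();  // "x=42, y=3.14"
// Manipulators work as they do with std::cout.
std::ostringstream oss2;
oss2 << std::hex << std::uppercase << 255 << " "
     << std::fixed << std::setprecision(2) << 3.14159;  // "FF 3.14"
// Reuse a stream: empty the buffer, then clear any error flags.
oss.str("");
oss.clear();
oss << "fresh";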
- Modern C++: std::format (C++20, see K2A-B1-4 § 5) is usually cleaner:
auto s = std::format("x={}, y={}", 42, 3.14);  // needs #include <format>
ostringstream still useful when:
- Conditional composition (write pieces only if condition holds).
- Fine-grained stream-state control (locale, manipulators).
- Pre-C++20 codebases.
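A minimal sketch of the conditional-composition case. The helper name `describe` and its fields are hypothetical, chosen only for illustration; optional pieces are appended only when present:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds "name", optionally followed by " (id=N)" and " [admin]".
std::string describe(const std::string& name, int id, bool is_admin) {
    std::ostringstream oss;
    oss << name;
    if (id >= 0) {  // assumption for this sketch: a negative id means "no id"
        oss << " (id=" << id << ")";
    }
    if (is_admin) {
        oss << " [admin]";
    }
    return oss.str();
}
```

With std::format the same logic would need either several format calls or manual concatenation, which is why a stream can still read better here.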
3. std::istringstream — Parsing Strings
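#include <iostream>
#include <sstream>
#include <string>
// Extract typed values separated by whitespace.
std::istringstream iss{"42 3.14 hello"};
int n;
double d;
std::string word;
iss >> n >> d >> word;  // n == 42, d == 3.14, word == "hello"
// Tokenize on whitespace.
std::istringstream lines{"alpha beta gamma"};
std::string token;
while (lines >> token) {
    std::cout << token << "\n";
}
// Read line by line.
std::istringstream multi{"line one\nline two\n"};
std::string line;
while (std::getline(multi, line)) {
    // line holds "line one", then "line two"
}
// Detect a failed extraction: the stream converts to false.
std::istringstream bad{"not a number"};
int v;
if (!(bad >> v)) {
    std::cerr << "parse failed\n";
}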
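std::getline also accepts a custom delimiter, which gives a simple field splitter. A sketch, assuming a hypothetical helper name `split_fields` and no quoting or escaping rules:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `record` on a single-character delimiter.
std::vector<std::string> split_fields(const std::string& record, char delim) {
    std::vector<std::string> fields;
    std::istringstream in{record};
    std::string field;
    // The three-argument getline reads up to (and consumes) the delimiter.
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Empty fields between consecutive delimiters are kept, but a trailing delimiter does not produce a final empty field, so this is not a full CSV parser.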
- For high-performance number parsing, prefer std::from_chars (C++17): faster and locale-independent. See K2A-B1-4 § 4. istringstream is more flexible but heavier.
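For comparison, a sketch of strict integer parsing with std::from_chars. The wrapper name `parse_int` and the "whole input must be consumed" policy are assumptions made for this example:

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical wrapper: returns the parsed int, or nullopt on any error.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    // Require success and that every character was consumed.
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

Unlike operator>>, from_chars does not skip leading whitespace and does not accept a leading '+'. Only the integer overload is used here; floating-point overloads arrived later in some standard libraries.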
4. std::stringstream — Bidirectional
#include <sstream>
std::stringstream ss;
ss << 42 << " " << 3.14;  // write
int n;
double d;
ss >> n >> d;             // read back: n == 42, d == 3.14
ss.str();                 // returns the whole buffer, regardless of read position
- Rarely the right choice. Bidirectional buffering is awkward; read/write positions interact subtly. Pick ostringstream or istringstream.
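One concrete instance of the subtle interaction, as a sketch (the function name `write_after_eof` is hypothetical): once a read runs into the end of the buffer, eofbit is set and later writes are silently dropped until clear() is called.

```cpp
#include <sstream>
#include <string>

// Hypothetical demo: returns the buffer after trying to write once a read has hit EOF.
std::string write_after_eof(bool clear_first) {
    std::stringstream ss;
    ss << "1 2";
    int a = 0, b = 0;
    ss >> a >> b;      // parsing "2" reaches the end of the buffer: eofbit is set
    if (clear_first) {
        ss.clear();    // reset the state flags so the stream is usable again
    }
    ss << " 3";        // ignored while eofbit is still set
    return ss.str();
}
```

Without clear() the buffer stays "1 2"; with it, the write is appended. Separate ostringstream and istringstream objects avoid this state sharing entirely.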
5. <regex> Basics
<regex> (C++11) — pattern matching with ECMAScript-like dialect by default.
#include <regex>
#include <string>
#include <iostream>
std::string s = "user@example.com";  // placeholder address
std::regex pattern{R"((\w+)@(\w+\.\w+))"};
// Test only: is there a match anywhere in s?
if (std::regex_search(s, pattern)) {
    std::cout << "found\n";
}
// Capture groups: m[0] is the whole match, m[1], m[2], ... are the groups.
std::smatch m;
if (std::regex_search(s, m, pattern)) {
    std::cout << "full: " << m[0] << "\n";
    std::cout << "user: " << m[1] << "\n";
    std::cout << "domain: " << m[2] << "\n";
}
// Whole-string match: returns true only if the pattern covers all of s.
std::regex_match(s, m, pattern);
// Replace every match.
std::string masked = std::regex_replace(s, pattern, "[REDACTED]");
// Iterate over all matches.
auto begin = std::sregex_iterator{s.begin(), s.end(), pattern};
auto end = std::sregex_iterator{};
for (auto it = begin; it != end; ++it) {
    std::cout << it->str() << "\n";
}
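The replacement string passed to std::regex_replace can also refer to capture groups: `$1`, `$2`, ... insert a group and `$&` inserts the whole match. A small sketch (the function name `iso_to_dmy` and the date format are assumptions for illustration):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites every YYYY-MM-DD in `text` as DD/MM/YYYY.
std::string iso_to_dmy(const std::string& text) {
    static const std::regex iso{R"((\d{4})-(\d{2})-(\d{2}))"};
    // $1 = year, $2 = month, $3 = day
    return std::regex_replace(text, iso, "$3/$2/$1");
}
```

Text that does not match is copied through unchanged.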
Functions
| Function | Purpose |
| --- | --- |
| std::regex_search | Find first match anywhere in the string |
| std::regex_match | Match the entire string |
| std::regex_replace | Substitute matches with a replacement |
| std::sregex_iterator | Iterate all matches |
| std::sregex_token_iterator | Tokenize (split by pattern or capture) |
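std::sregex_token_iterator is the one entry without an example so far. A sketch of both uses, with hypothetical helper names `split` and `capture_all`: submatch index -1 yields the text between matches, while a non-negative index yields that capture group from each match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the pieces of `text` that lie between matches of `sep`.
std::vector<std::string> split(const std::string& text, const std::regex& sep) {
    std::sregex_token_iterator first{text.begin(), text.end(), sep, -1};
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}

// Hypothetical helper: capture group `group` from every match of `re`.
std::vector<std::string> capture_all(const std::string& text, const std::regex& re, int group) {
    std::sregex_token_iterator first{text.begin(), text.end(), re, group};
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

The iterator refers to the regex rather than copying it, so the regex object has to outlive the iteration; taking it by const reference, as here, keeps that obvious.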
Match types
| Type | Holds |
| --- | --- |
| std::smatch | Match results over a std::string |
| std::cmatch | Match results over a C-string |
| std::wsmatch / std::wcmatch | Wide-string variants |
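A short sketch of std::cmatch, for input that arrives as a C-string (the function name `first_number` is hypothetical):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the first run of digits in a NUL-terminated string, or "".
std::string first_number(const char* text) {
    static const std::regex digits{R"(\d+)"};
    std::cmatch m;  // sub-matches are pairs of const char* pointing into `text`
    if (std::regex_search(text, m, digits)) {
        return m[0].str();  // copy out before `text` can go away
    }
    return {};
}
```

Match results hold iterators or pointers into the searched text, not copies, so that text must outlive them. For std::smatch, searching a temporary std::string is rejected at compile time since C++14 for this reason.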
6. Regex Patterns
- Default ECMAScript dialect — usual constructs:
| Pattern | Matches |
| --- | --- |
| . | Any character (except newline by default) |
| \d \D | Digit / non-digit |
| \w \W | Word char ([A-Za-z0-9_]) / non-word |
| \s \S | Whitespace / non-whitespace |
| [abc] | Any of a, b, c |
| [^abc] | Anything except a, b, c |
| [a-z] | Range |
| * + ? | 0+, 1+, 0-or-1 of preceding |
| {n} {n,} {n,m} | Exactly n, n+, n to m |
| *? +? ?? | Lazy (non-greedy) variants |
| ^ $ | Start / end of string (or line in multiline mode) |
| \b | Word boundary |
| (...) | Capture group |
| (?:...) | Non-capturing group |
| \| | Alternation |
| \1 \2 | Backreference to capture group N |
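A few of these constructs in action, as a sketch. The helper `first_match` is hypothetical and compiles the pattern on every call purely to keep the example short:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: text of the first match of `pattern` in `text`, or "".
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re{pattern};  // recompiled per call: fine for a demo only
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string{};
}
```

Usage, comparing greedy and lazy quantifiers and a backreference:

- `first_match("<a><b>", R"(<.+>)")` gives `<a><b>` (greedy: as much as possible).
- `first_match("<a><b>", R"(<.+?>)")` gives `<a>` (lazy: as little as possible).
- `first_match("it is is fine", R"(\b(\w+) \1\b)")` gives `is is` (`\1` repeats whatever group 1 captured).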
Use raw string literals
- Always wrap in R"(...)" → no doubled backslashes:
std::regex bad{"(\\d+)\\.(\\d+)"};   // ordinary literal: every backslash doubled
std::regex good{R"((\d+)\.(\d+))"};  // raw literal: same pattern, written as-is
Other dialects
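std::regex p1{"a.b", std::regex::extended};    // POSIX extended
std::regex p2{"a.b", std::regex::basic};       // POSIX basic
std::regex p3{"a.b", std::regex::ECMAScript};  // the default
std::regex p4{"a.b", std::regex::icase};       // a flag, not a dialect: case-insensitive ECMAScript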
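Flags are bitmask values and combine with `|`, so a dialect and an option can be given together. A sketch with a hypothetical helper `equals_ignoring_case`:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string, case-insensitive match of `text` against `word`.
// Note: `word` is treated as a pattern, so regex metacharacters in it are NOT literal.
bool equals_ignoring_case(const std::string& text, const std::string& word) {
    const std::regex re{word, std::regex::ECMAScript | std::regex::icase};
    return std::regex_match(text, re);
}
```

Spelling out the dialect next to icase is optional, but it makes the intent explicit when other flags are added later.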
- Standard <regex> is slow. Major implementations compile to NFA-based matchers: correct, but typically several times slower than re2 or PCRE2.
Don't use regex when:
- Pattern fixed + simple: raw find / starts_with / ends_with are much faster.
- Scanning large files: use a faster engine (re2, boost::regex, ctre).
- Performance critical: a hand-written state machine or std::ranges filter often beats regex.
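For the fixed-and-simple case, a sketch of what replacing a pattern such as `^log_.*\.txt$` with plain string operations can look like. The helper names and the file-name rule are assumptions; with C++20, `starts_with` / `ends_with` replace the two small helpers directly:

```cpp
#include <string_view>

// Hypothetical C++17 stand-ins for C++20 starts_with / ends_with.
bool has_prefix(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

bool has_suffix(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

// Hypothetical rule: a log file is named "log_<anything>.txt".
bool is_log_file(std::string_view name) {
    return has_prefix(name, "log_") && has_suffix(name, ".txt");
}
```

No pattern is compiled and no match state is allocated, so this kind of check is cheap enough to run per line or per file.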
Do use regex when:
- Pattern genuinely complex (alternations, groups, anchors).
- Pattern configurable at runtime (config / user input).
- Quick prototype, throughput not a concern.
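When the pattern comes from config or user input, construction can fail: std::regex throws std::regex_error on a malformed pattern. A sketch of guarding against that, using a hypothetical helper `compile_pattern`:

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: compiles `source`, or returns nullopt if it is not a valid pattern.
std::optional<std::regex> compile_pattern(const std::string& source) {
    try {
        return std::regex{source};
    } catch (const std::regex_error&) {
        return std::nullopt;  // malformed pattern, e.g. an unbalanced '('
    }
}
```

Compiling once at load time and keeping the resulting std::regex also keeps the construction cost out of the matching path.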
Common gotcha: pattern construction cost
- Compiling std::regex is expensive. Cache; don't construct in a loop.
// Bad: recompiles the pattern on every call.
bool is_email_slow(const std::string& s) {
    return std::regex_match(s, std::regex{R"(\w+@\w+\.\w+)"});
}
// Good: compiled once, on first call (static local initialization is thread-safe since C++11).
bool is_email(const std::string& s) {
    static const std::regex pattern{R"(\w+@\w+\.\w+)"};
    return std::regex_match(s, pattern);
}