- Description: A comparison of common data structures and language constructs in Java, C++, and Python
- My Notion Note ID: K2A-A1
- Created: 2020-10-01
- Updated: 2026-05-06
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
1. Basic Data Structures
1.1 Basic Data Types
| Concept |
Java |
C++ |
Python |
| Integer types |
byte (8), short (16), int (32), long (64) — all signed, fixed sizes |
char, short, int, long, long long — sizes are minimums (e.g. int ≥ 16 bits, typically 32). Use <cstdint> (int32_t, int64_t, …) for fixed sizes |
int — single arbitrary-precision type, no fixed-width variants in the language. Use numpy.int32 / int64 etc. for fixed-width |
| Unsigned integers |
None |
All integer types (except bool) have an unsigned form |
None (use numpy.uint* if a fixed-width unsigned type is needed) |
| Floating point |
float (32), double (64) |
float, double, long double (sizes implementation-defined; usually IEEE 754) |
float is always 64-bit IEEE 754. numpy.float32 / float16 for narrower |
| Character |
char is 16-bit (UTF-16 code unit) |
char is 1 byte; wchar_t, char16_t, char32_t for wider encodings |
No separate char type — a single character is just a length-1 str. bytes for raw 8-bit data |
| Boolean |
boolean (true / false), not implicitly convertible to int |
bool; implicitly convertible to/from int. Casting bool to int: false is 0, true is 1. Casting int to bool: 0 is false, non-zero is true |
bool is a subclass of int: True == 1, False == 0. Arithmetic on bools works |
| Null literal |
null (reference only) |
nullptr (pointer only); legacy NULL is just 0 |
None (singleton; idiomatically tested with is None) |
1.2 Classes
In lower-level C++ code, strings are often represented as C-style char* arrays terminated by the null character '\0' (the "null terminator"). This is distinct from nullptr, which represents a null pointer (a char* that points to no string at all). Higher-level C++ code typically uses std::string, which manages length and storage internally. Java String and Python str are immutable Unicode sequences and do not expose a null terminator at the language level.
| Language |
Substring |
| Java |
str.substring(start, end); |
| C++ |
str.substr(start, length); |
| Python |
str[start:end] (slice); str[start:end:step] for stride |
1.3 Enum
| Concept |
Java |
C++ |
Python |
| Declaration |
enum Name { A, B } — members are full objects with methods and fields |
enum class Name { A, B }; (scoped, C++11+); legacy unscoped enum Name { A, B }; |
from enum import Enum
class Name(Enum): A = 1; B = 2 |
| Inheritance |
Cannot extends another class (implicitly extends java.lang.Enum); can implements interfaces |
Cannot derive from another type; underlying integer type can be set with : int, : uint8_t, etc. |
An Enum cannot be subclassed once it has members. Mix-ins like IntEnum, StrEnum (3.11+), IntFlag integrate enums with built-in types |
1.4 Inheritance
| Concept |
Java |
C++ |
Python |
| Class inheritance |
Single class inheritance: class Derived extends Base |
Multiple inheritance: class Derived : public Base1, public Base2 { ... } |
Multiple inheritance: class Derived(Base1, Base2): .... Method resolution by C3 linearization (Derived.__mro__) |
| Interface |
class C implements I1, I2; classes can implement many interfaces |
No interface keyword — use abstract classes (all methods pure virtual) |
No interface keyword. Use abc.ABC with @abstractmethod for nominal interfaces, or typing.Protocol for structural ("duck") typing (PEP 544) |
| Virtual dispatch |
Methods are virtual by default (unless final, static, or private) |
Methods are non-virtual by default; opt in with virtual. Use override on the derived method |
All methods are dynamically dispatched (no virtual keyword) |
| Abstract method |
abstract void m(); in an abstract class |
Pure virtual: virtual void m() = 0; |
@abstractmethod decorator inside a class derived from abc.ABC |
| Calling base method |
super.m(); |
Base::m(); |
super().m() (Python 3) — follows the MRO |
| Constructor chaining |
First statement: super(args); |
Member-initializer list: Derived(...) : Base(args) { ... } |
super().__init__(args) inside __init__ |
| Access modifier on inheritance |
Always public-style (no syntactic choice) |
public, protected, private inheritance changes how base members are exposed in the derived class |
N/A — no inheritance access modifiers. Conventions only: _name (protected by convention), __name (name-mangled to _ClassName__name) |
1.5 Generic Types
| Concept |
Java |
C++ |
Python |
| Generic class |
class C<T1, T2> { T1 a; T2 b; } |
template <typename T1, typename T2> class C { T1 a; T2 b; }; |
from typing import Generic, TypeVar
T = TypeVar('T'); class C(Generic[T]): ... or PEP 695: class C[T]: ... (3.12+) |
| Generic function |
<T> void m(T t); called as obj.<String>m(x) |
template <typename T> void f(T t);; called as f<int>(42) |
def f[T](x: T) -> T: ... (3.12+) or T = TypeVar('T'); def f(x: T) -> T: .... No call-site type-argument syntax |
| Bounded type parameter |
<T extends Base> — upper-bounded |
C++20 concepts: template <std::derived_from<Base> T> or requires clause; pre-C++20 SFINAE / std::enable_if |
T = TypeVar('T', bound=Base) |
| Variance / wildcards |
<? extends Base> (covariant; consumer / read), <? super Base> (contravariant; producer / write) |
N/A — templates are not variant; each instantiation is a distinct, unrelated type |
TypeVar('T', covariant=True) / contravariant=True (PEP 484) |
| Runtime behavior |
Type erasure — type arguments erased at compile time; one bytecode method serves all instantiations |
Monomorphization — the compiler emits a separate copy per instantiation |
Erasure-like — type hints are not enforced at runtime by default. Tools like pydantic / beartype add enforcement opt-in |
| Call-site type-argument position |
<> before method name: obj.<String>m(...) |
<> after function name: m<int>(...) |
No call-site type arguments — types are inferred or annotated only |
2. Containers
2.1 Sequential Containers
Conventions: T is the element type, O an object/reference type (Java), K/V for keys/values.
2.1.1 Dynamic array (random access)
| Operation |
Java ArrayList<O> |
C++ std::vector<T> |
Python list |
| Declare / Initialize |
List<O> l = new ArrayList<>();
List<O> l = new ArrayList<>(List.of(a, b)); |
vector<T> v;
vector<T> v{a, b, c};
vector<T> v(n, init); |
l = []
l = [a, b, c]
l = list(iterable) |
| Add |
l.add(obj); l.add(i, obj); |
v.push_back(t); v.emplace_back(...); v.insert(it, t); |
l.append(x); l.insert(i, x); l += [x, y] |
| Remove |
l.remove(obj); (first match)
l.remove(i); |
v.pop_back(); v.erase(it); |
l.pop() / l.pop(i); l.remove(x) (first match); del l[i] |
| Get |
l.get(i); |
v[i]; v.at(i) (bounds-checked) |
l[i]; negative indexing supported: l[-1] |
| Set |
l.set(i, obj); |
v[i] = t; |
l[i] = x |
| Size |
l.size() |
v.size(); v.empty(); |
len(l) |
| Iterate |
for (O o : l)
Iterator<O> it = l.iterator(); |
for (const auto& x : v)
for (size_t i = 0; i < v.size(); ++i) |
for x in l:
for i, x in enumerate(l): |
2.1.2 Fixed-size array
| Operation |
Java T[] |
C++ T[N] / std::array<T, N> |
Python (no native fixed-size) |
| Declare / Initialize |
T[] arr = new T[n];
T[] arr = new T[]{a, b, c}; |
T arr[N];
std::array<T, N> arr{a, b, c}; |
array.array('i', [1, 2, 3]) (typed but resizable)
numpy.zeros(n) for true fixed-size numeric arrays |
| Get / Set |
arr[i] / arr[i] = ...; |
arr[i] / arr[i] = ...; |
arr[i] / arr[i] = ... |
| Size |
arr.length |
std::size(arr); arr.size() for std::array |
len(arr) |
| Iterate |
for (T t : arr) |
for (auto& x : arr) |
for x in arr: |
2.1.3 Queue (FIFO)
| Operation |
Java Queue<O> (ArrayDeque<> / LinkedList<>) |
C++ std::queue<T> |
Python collections.deque |
| Declare |
Queue<O> q = new ArrayDeque<>(); |
queue<T> q; |
from collections import deque
q = deque() |
| Add (enqueue) |
q.offer(obj); (returns false on failure)
q.add(obj); (throws) |
q.push(t); q.emplace(...); |
q.append(x) |
| Remove (dequeue) |
q.poll(); (null on empty)
q.remove(); (throws) |
q.pop(); (no return value — query front() first) |
q.popleft() (raises IndexError on empty) |
| Peek |
q.peek(); / q.element(); |
q.front(); q.back(); |
q[0]; q[-1] for back |
| Size |
q.size() |
q.size(); q.empty(); |
len(q) |
| Iterate |
for (O o : q) (insertion order, no index) |
Adaptor — no iterator. Drain with while (!q.empty()) |
for x in q: (left to right) |
For thread-safe FIFO in Python, use queue.Queue (blocking, with put / get).
2.1.4 Stack (LIFO)
| Operation |
Java Deque<O> (or legacy Stack<O>) |
C++ std::stack<T> |
Python list (or collections.deque) |
| Declare |
Deque<O> s = new ArrayDeque<>(); |
stack<T> s; |
s = [] |
| Push |
s.push(obj); |
s.push(t); s.emplace(...); |
s.append(x) |
| Pop |
s.pop(); |
s.pop(); (no return — use top() first) |
s.pop() (returns the value) |
| Peek |
s.peek(); |
s.top(); |
s[-1] |
| Size |
s.size(); s.isEmpty(); |
s.size(); s.empty(); |
len(s) |
java.util.Stack is a legacy class extending Vector; new code should prefer Deque / ArrayDeque.
2.1.5 Deque (double-ended queue)
| Operation |
Java Deque<O> (ArrayDeque<>) |
C++ std::deque<T> |
Python collections.deque |
| Declare |
Deque<O> dq = new ArrayDeque<>(); |
deque<T> dq;
deque<T> dq(n); |
dq = deque()
dq = deque(maxlen=n) (auto-evicts oldest) |
| Add |
addFirst(x) / addLast(x)
offerFirst / offerLast (return false instead of throw) |
push_front(t); push_back(t);
emplace_front(...); emplace_back(...); |
dq.appendleft(x) / dq.append(x) |
| Remove |
removeFirst / removeLast
pollFirst / pollLast |
pop_front(); pop_back(); |
dq.popleft() / dq.pop() |
| Peek / Index |
peekFirst / peekLast (null on empty)
getFirst / getLast (throw) |
front(); back(); Indexed: dq[i]; dq.at(i) |
dq[0] / dq[-1]. Indexed access is O(n) for deque |
| Size |
dq.size() |
dq.size(); dq.empty(); |
len(dq) |
| Iterate |
for (O o : dq) (first to last) |
for (auto& x : dq) |
for x in dq: |
2.1.6 Priority queue (heap)
| Operation |
Java PriorityQueue<O> |
C++ std::priority_queue<T> |
Python heapq on a list |
| Declare |
PriorityQueue<O> pq = new PriorityQueue<>();
new PriorityQueue<>(cap, comparator); |
priority_queue<T> pq; (max-heap)
priority_queue<T, vector<T>, greater<T>> pq; (min-heap)
priority_queue<T, vector<T>, decltype(cmp)> pq(cmp); |
import heapq
pq = [] (any list; heapq.heapify(pq) to convert in place) For thread-safe: from queue import PriorityQueue; pq = PriorityQueue() |
| Default order |
Min-heap (natural ordering) |
Max-heap |
Min-heap |
| Push |
pq.offer(obj); |
pq.push(t); pq.emplace(...); |
heapq.heappush(pq, x) |
| Pop |
pq.poll(); |
pq.pop(); (read with top() first) |
heapq.heappop(pq) |
| Peek |
pq.peek(); |
pq.top(); |
pq[0] |
| Iterate ordered |
Iterator does not guarantee priority order. Drain with repeated poll(), or Arrays.sort(pq.toArray()) |
Drain with repeated top() then pop() |
Drain with repeated heappop. heapq.nsmallest(k, pq) / nlargest(k, pq) for top-k |
2.2 Associative Containers
2.2.1 Hash map (unordered key, value)
| Operation |
Java HashMap<K, V> |
C++ std::unordered_map<K, V> |
Python dict |
| Declare |
Map<K, V> m = new HashMap<>();
new HashMap<>(initCap, loadFactor); |
unordered_map<K, V> m;
unordered_map<K, V> m{{k1, v1}, {k2, v2}}; |
m = {}
m = {k1: v1, k2: v2}
m = dict(zip(keys, values)) |
| Add / Set |
m.put(k, v); |
m[k] = v;
m.insert({k, v});
m.emplace(k, v); |
m[k] = v
m.update({k: v}) |
| Remove |
m.remove(k); |
m.erase(k); |
del m[k]
m.pop(k) / m.pop(k, default) |
| Get / Contains |
m.get(k); (null if absent)
m.containsKey(k);
m.containsValue(v); |
m.at(k); (throws if absent)
m.find(k); (returns end iterator)
m.count(k); m.contains(k) (C++20) Note: m[k] inserts a default-constructed value if k is missing |
m[k] (raises KeyError if absent)
m.get(k) (returns None)
m.get(k, default)
k in m |
| Iterate |
for (Map.Entry<K, V> e : m.entrySet())
for (K k : m.keySet())
for (V v : m.values()) |
for (auto& [k, v] : m) (C++17 structured binding) |
for k in m:
for k, v in m.items():
for v in m.values(): |
| Size |
m.size() |
m.size(); m.empty(); |
len(m) |
2.2.2 Insertion-ordered map
| Aspect |
Java LinkedHashMap<K, V> |
C++ |
Python dict / collections.OrderedDict |
| Notes |
Same API as HashMap plus guaranteed insertion order on iteration |
N/A — no built-in. Pair an unordered map with a separate insertion-order list |
Built-in dict preserves insertion order since Python 3.7. OrderedDict adds move_to_end() and order-sensitive equality |
2.2.3 Sorted (tree) map
| Operation |
Java TreeMap<K, V> (red-black) |
C++ std::map<K, V> (red-black) |
Python (no built-in) |
| Declare |
TreeMap<K, V> tm = new TreeMap<>();
new TreeMap<>(comparator); |
map<K, V> m;
map<K, V, decltype(cmp)> m(cmp); |
Third-party: from sortedcontainers import SortedDict
sd = SortedDict() Roll-your-own: bisect over a parallel sorted list of keys |
| Ordered queries |
higherKey(k), ceilingKey(k), lowerKey(k), floorKey(k) |
m.lower_bound(k); m.upper_bound(k); |
SortedDict: irange(min, max), bisect_left/right, peekitem(i) for indexed access
bisect.bisect_left(keys, k) for lower-bound index on a parallel list |
| Iterate |
Sorted by comparator / natural ordering |
Sorted by comparator |
SortedDict iterates in sorted key order |
Other operations (add / remove / size) match the hash-map row.
2.2.4 Hash set (unordered set)
| Operation |
Java HashSet<O> |
C++ std::unordered_set<T> |
Python set |
| Declare |
Set<O> s = new HashSet<>(); |
unordered_set<T> s;
unordered_set<T> s{a, b, c}; |
s = set()
s = {a, b, c} (note: {} is an empty dict, not set)
s = set(iterable) |
| Add |
s.add(obj); |
s.insert(t); s.emplace(...); |
s.add(x); s.update(iterable) |
| Remove |
s.remove(obj); |
s.erase(t); |
s.discard(x) (silent), s.remove(x) (raises KeyError if absent) |
| Contains |
s.contains(obj); |
s.find(t); s.count(t); s.contains(t) (C++20) |
x in s |
| Set algebra |
addAll, retainAll, removeAll |
Use <algorithm> with sorted ranges, or third-party |
a | b (union), a & b (intersection), a - b (difference), a ^ b (symmetric diff) |
| Size / Iterate |
s.size(); for (O o : s) (no order) |
s.size(); for (auto& x : s) (no order) |
len(s); for x in s: (no order) |
2.2.5 Insertion-ordered set
| Aspect |
Java LinkedHashSet<O> |
C++ |
Python |
| Notes |
Same API as HashSet plus insertion-order iteration |
N/A — no built-in |
No built-in equivalent. Common idiom: dict.fromkeys(seq) for de-duplication while preserving order |
2.2.6 Sorted (tree) set
| Operation |
Java TreeSet<O> (red-black) |
C++ std::set<T> (red-black) |
Python (no built-in) |
| Declare |
TreeSet<O> ts = new TreeSet<>();
new TreeSet<>(comparator); |
set<T> s;
set<T, decltype(cmp)> s(cmp); |
Third-party: from sortedcontainers import SortedSet |
| Ordered queries |
higher, ceiling, lower, floor |
s.lower_bound(t); s.upper_bound(t); |
SortedSet: irange, bisect_left/right, direct ss[i] indexing |
| Iterate |
Sorted |
Sorted |
Sorted (SortedSet) |
2.3 Special Rules for Each Language
2.3.1 Java PriorityQueue
The iterator does not guarantee traversal in priority order. That is why Arrays.sort(pq.toArray()) may be used when ordered traversal is needed.
2.3.2 C++ unordered_map / map operator[]
m[k] on a missing key inserts a default-constructed value and returns a reference to it. Use m.at(k) (throws std::out_of_range if missing) or m.find(k) (returns end iterator if missing) for read-only access.
2.3.3 Python heapq
Only a min-heap is provided. For a max-heap, push negated values, or wrap each element in a comparison-inverting class. The "container" is just a regular list — heapify(l) reorders an existing list in place in O(n).
2.3.4 Python dict insertion order
Insertion-order preservation has been part of the language spec since 3.7. Code targeting 3.6 or older should use collections.OrderedDict.
2.3.5 Python sorted map / sorted set
The standard library has no balanced BST container. The community standard is sortedcontainers (SortedDict, SortedSet, SortedList) — pure Python, but competitive in practice. Internally it uses a list of sorted lists (chunked arrays + bisect), not a balanced tree, which is cache-friendlier than linked structures.
3. Logic and Function Grammar
3.1 Hash
| Aspect |
Java |
C++ |
Python |
| Hook(s) |
hashCode() and equals(Object) on the class |
Specialize std::hash<Key> in namespace std; provide operator== |
__hash__(self) and __eq__(self, other) on the class |
| Contract |
If a.equals(b) then a.hashCode() == b.hashCode(). Override both together |
If a == b then hash<Key>()(a) == hash<Key>()(b). Provide both |
If a == b then hash(a) == hash(b). Defining __eq__ sets __hash__ to None unless explicitly defined; instances then become unhashable |
| Convenience |
IDE auto-generation; Objects.hash(field1, field2, ...) |
boost::hash_combine for combining field hashes |
@dataclass(frozen=True) auto-generates __hash__ from the field tuple |
3.1.1 Java
class HashClass {
@Override
public int hashCode() {
return Objects.hash(field1, field2);
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof HashClass)) {
return false;
}
HashClass rhs = (HashClass) obj;
return Objects.equals(field1, rhs.field1)
&& Objects.equals(field2, rhs.field2);
}
}
3.1.2 C++
namespace std {
template <>
struct hash<Key> {
std::size_t operator()(const Key& k) const {
std::size_t h1 = std::hash<std::string>()(k.name);
std::size_t h2 = std::hash<int>()(k.id);
return h1 ^ (h2 << 1);
}
};
}
3.1.3 Python
class Key:
def __init__(self, name: str, id: int):
self.name, self.id = name, id
def __eq__(self, other: object) -> bool:
return isinstance(other, Key) and (self.name, self.id) == (other.name, other.id)
def __hash__(self) -> int:
return hash((self.name, self.id))
from dataclasses import dataclass
@dataclass(frozen=True)
class Key:
name: str
id: int
3.2 Compare
| Aspect |
Java |
C++ |
Python |
| Self-comparable type |
implements Comparable<Self> with int compareTo(Self) (negative / 0 / positive) |
Define operator< (and ideally the rest), or operator<=> (C++20 spaceship) |
Define __lt__ (and friends). @functools.total_ordering derives the rest from __eq__ plus one ordering hook |
| External comparator |
Comparator<T> returning negative / 0 / positive. Used by Collections.sort, TreeMap, PriorityQueue |
Boolean function/lambda: bool cmp(const T& a, const T& b) returning true if and only if a precedes b. Used by std::sort, std::set, std::priority_queue |
key= callable returning a sort key (preferred); or functools.cmp_to_key(fn) to convert a 3-way comparator |
| Ordering rule |
3-way: negative means a before b (ascending), 0 means equal, positive means a after b. Must be a total order |
Strict weak ordering: cmp(a, b) returns true if a strictly precedes b; must return false when a == b (else UB in std::sort) |
Sort is stable; key-extracted values compared via their natural __lt__. reverse=True flips the order |
3.2.1 Java
Comparator<Obj> myComparator = new Comparator<Obj>() {
@Override
public int compare(Obj a, Obj b) {
return Integer.compare(a.x, b.x);
}
};
list.sort(myComparator);
public class CompareClass implements Comparable<CompareClass> {
@Override
public int compareTo(CompareClass other) {
return Integer.compare(this.x, other.x);
}
}
3.2.2 C++
bool compare(const T& a, const T& b) {
return a < b;
}
std::sort(vec.begin(), vec.end(), compare);
auto cmp = [](const T& a, const T& b) { return a.x < b.x; };
std::sort(vec.begin(), vec.end(), cmp);
std::set<T, decltype(cmp)> s(cmp);
std::priority_queue<T, std::vector<T>, decltype(cmp)> pq(cmp);
The C++ comparator follows strict weak ordering: it returns true when a strictly precedes b, and must return false when a == b (otherwise std::sort has undefined behaviour).
3.2.3 Python
nums.sort()
items.sort(key=lambda x: x.score)
items.sort(key=lambda x: (x.group, -x.score))
items.sort(key=lambda x: x.score, reverse=True)
from functools import cmp_to_key
items.sort(key=cmp_to_key(lambda a, b: a.x - b.x))
from functools import total_ordering
@total_ordering
class C:
def __init__(self, x): self.x = x
def __eq__(self, other): return self.x == other.x
def __lt__(self, other): return self.x < other.x
4. Internal Mechanism
| Aspect |
Java |
C++ |
Python |
| Compilation target |
javac produces JVM bytecode (.class); the JVM JIT-compiles hot paths to native code at run time |
Compiled directly to native machine code by the toolchain (e.g. g++, clang++, MSVC) |
CPython compiles source to .pyc bytecode and runs it on a stack-based VM. PyPy adds a tracing JIT |
| Memory management |
Garbage-collected heap for all objects; primitives are stored by value (locals on the stack frame, fields embedded in their owning object) |
Stack and heap; manual new / delete, idiomatically wrapped in RAII (std::unique_ptr, std::shared_ptr) |
Reference counting plus a cyclic GC (CPython). Every value is a heap-allocated object; locals just hold references |
| Object location |
Objects are heap-allocated; variables hold references |
Objects can live on the stack, heap, or as members; variables can be values, references, or pointers |
All objects on the heap; variables are name bindings (references) |
| Generics |
Type erasure — type arguments erased at compile time, leaving raw types and casts in bytecode |
Templates are monomorphized — separate version emitted per instantiation |
Type hints are erased at runtime by default; dispatch is by actual object type (duck typing) |
| Polymorphism |
Methods virtual by default (vtable dispatch, except final / static / private) |
Non-virtual by default; virtual opts in to vtable dispatch |
All attribute lookup is dynamic (MRO walk + __dict__ lookup); no virtual keyword |
| Pointers |
No raw pointers or pointer arithmetic; references only |
Raw pointers, references (non-rebindable alias), and pointer arithmetic on arrays |
No raw pointers; everything is a reference. id() exposes object identity, is checks reference equality |
| Concurrency model |
True parallel threads; memory model defined by JLS |
True parallel threads (std::thread); memory model in the standard since C++11 |
GIL in CPython — only one thread runs Python bytecode at a time. True parallelism via multiprocessing, native extensions releasing the GIL, or experimental free-threaded builds (PEP 703, 3.13+) |
| Standard library |
JDK shipped with the JVM; consistent across platforms |
Standard library is part of the toolchain; ABI may differ between vendors |
Standard library shipped with the interpreter ("batteries included": collections, itertools, threading, asyncio, …) |
| Exception model |
Checked + unchecked exceptions; method signatures must declare checked ones |
Unchecked only; noexcept declares non-throwing functions |
All exceptions are unchecked (no static declaration). LBYL vs. EAFP idioms |
5. References