Linux File Operations and Search


  • Description: Reading files (cat, less), comparing them (cmp, diff), searching the filesystem (find), and managing symbolic vs hard links (ln). Plus the find ... -exec vs xargs distinction, which trips up most newcomers.
  • My Notion Note ID: K2B-3-4
  • Created: 2020-06-03
  • Updated: 2026-05-19
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

1. Reading Files: cat, less, head, tail

  • cat [OPTION]... [FILE]... — concatenate files to stdout. With no FILE, read stdin.
    • -n, --number — number all output lines.
    • -b, --number-nonblank — number only nonempty lines.
    • -A — show non-printables (-v, -E, -T combined): tabs as ^I, EOL as $, control chars as ^X.
    • -s, --squeeze-blank — collapse runs of blank lines.
  • Pitfall — cat file | grep .... This is the canonical "useless use of cat." grep PATTERN file does the same thing without the pipe. cat is only needed to concatenate multiple files; a tool that reads only stdin can take a single file via cmd < file.
  • For browsing big files use a pager instead of cat:
    • less FILE — interactive, supports /search, n/N, g/G, q, follows tail with F.
    • head -n 20 FILE / tail -n 20 FILE — first / last N lines.
    • tail -f FILE — follow appended output (log files). tail -F re-opens on rotation.
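The flags above can be tried on a throwaway file. This is a minimal sketch assuming GNU coreutils (cat -A is a GNU option); the scratch directory and the file name demo.txt are made up for the example.

```shell
# Scratch file with a tab, a blank line, and a few short lines.
tmp=$(mktemp -d)
printf 'one\ttab\n\nthree\nfour\nfive\n' > "$tmp/demo.txt"

cat -n "$tmp/demo.txt"    # numbers all 5 lines, including the blank one
cat -b "$tmp/demo.txt"    # numbers only the 4 non-empty lines
cat -A "$tmp/demo.txt"    # tab shows as ^I, each line end as $

first=$(head -n 1 "$tmp/demo.txt")              # first line only
last=$(tail -n 2 "$tmp/demo.txt")               # last two lines
visible=$(cat -A "$tmp/demo.txt" | head -n 1)   # first line with non-printables shown
printf '%s\n' "$first" "$last" "$visible"

rm -rf "$tmp"
```

Piping cat -A into head is not a useless use of cat here, because cat is transforming the text rather than just passing it through.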

2. Output Redirection

  • cmd > file — overwrite. cmd >> file — append.
  • cmd < file — feed file as stdin.
  • cmd 2> err.log — redirect stderr (FD 2). cmd > out 2>&1 — send stdout to out, then point stderr at the same place. Order matters: cmd 2>&1 > out leaves stderr pointing at the original stdout. cmd &> out (bash) is the same as cmd > out 2>&1.
  • cmd1 | cmd2 — pipe stdout of cmd1 to stdin of cmd2.
  • cmd | tee file — split: write to file and continue down the pipeline.
  • cat > file then typing then Ctrl-D is the original "type text into a file" recipe, but printf 'line\n' > file or a real editor is cleaner.
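A small sketch of the redirections above, run against a scratch directory. The emit function is a hypothetical helper invented for this example; it writes one line to each stream so the effect of every operator is visible in the resulting files.

```shell
tmp=$(mktemp -d)

# Hypothetical helper: one line to stdout, one line to stderr.
emit() { echo "to stdout"; echo "to stderr" >&2; }

emit > "$tmp/out.log" 2> "$tmp/err.log"   # split the two streams into separate files
emit > "$tmp/both.log" 2>&1               # stdout to the file, then stderr to the same place
emit 2>/dev/null | tee "$tmp/tee.log"     # tee writes the file and passes stdout along
echo "second run" >> "$tmp/out.log"       # >> appends instead of truncating

out_lines=$(wc -l < "$tmp/out.log")
err_text=$(cat "$tmp/err.log")
both_lines=$(wc -l < "$tmp/both.log")
tee_text=$(cat "$tmp/tee.log")
echo "out=$out_lines err=$err_text both=$both_lines tee=$tee_text"

rm -rf "$tmp"
```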

3. Comparing Files: cmp and diff

  • cmp FILE1 FILE2 — byte-level comparison. Exits 0 if identical, 1 if different, 2 on error. Use on binaries.
    • -l — print every differing byte.
    • -s — silent; use in scripts that only care about the exit code.
  • diff [OPTION]... FILE1 FILE2 — line-level comparison. Designed for text.
    • Default output is "normal diff" (< / > markers).
    • -u, --unified[=N] — unified diff (what git diff produces). Almost always what you want.
    • -c, --context[=N] — context diff (older format).
    • -y, --side-by-side — two columns; useful for short files. -W NUM sets column width.
    • --suppress-common-lines — only show changes (with -y).
    • -r, --recursive — compare directory trees.
    • -q, --brief — only print whether files differ.
    • -i — ignore case. -w — ignore all whitespace. -B — ignore blank lines.
  • Symbols in the default and -y output:
    • < line present only in the first file
    • > line present only in the second file
    • | line differs between the two (side-by-side)
  • For three-way merges or interactive diffs reach for diff3, vimdiff, or git diff --no-index FILE1 FILE2.
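In scripts, the exit status is usually the useful part of cmp and diff. The sketch below uses made-up file names in a scratch directory. It relies on diff following the same convention as cmp: 0 for no differences, 1 for differences, 2 for trouble.

```shell
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/a.txt"
printf 'alpha\nBETA\ngamma\n' > "$tmp/b.txt"
cp "$tmp/a.txt" "$tmp/a_copy.txt"

# cmp -s prints nothing; branch on the exit status alone.
if cmp -s "$tmp/a.txt" "$tmp/a_copy.txt"; then same=yes; else same=no; fi

cmp -s "$tmp/a.txt" "$tmp/b.txt"
cmp_status=$?            # 1 means the files differ

# diff exits 1 when the files differ, so capture the status right away.
diff -u "$tmp/a.txt" "$tmp/b.txt"
diff_status=$?

# -i ignores case, so beta vs BETA no longer counts as a change.
diff -i -q "$tmp/a.txt" "$tmp/b.txt"
diff_i_status=$?

echo "same=$same cmp=$cmp_status diff=$diff_status diff-i=$diff_i_status"
rm -rf "$tmp"
```

Because a status of 1 is a normal result here, a script running under set -e needs to guard these calls (for example with if, as in the first comparison).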

4. find — Filesystem Search

  • find [PATH...] [EXPRESSION] — walk a directory tree and apply tests. Path defaults to .; expression defaults to -print.
  • Tests (most-used):
    Test                     Matches
    -name 'glob'             basename matches glob (quote it, otherwise the shell expands first)
    -iname 'glob'            case-insensitive -name
    -path 'glob'             full path matches glob
    -type f/d/l/b/c/p/s      file / dir / symlink / block / char / pipe / socket
    -size N[ckMG]            size (in 512-byte blocks by default; c=bytes, k=KiB, M=MiB, G=GiB)
    -mtime ±N                modified N×24h ago (-7 = within 7 days, +30 = older than 30 days)
    -mmin ±N                 minutes instead of days
    -newer FILE              newer than FILE's mtime
    -user NAME / -group NAME by owner / group
    -perm /222               any of the write bits set
    -empty                   empty file or dir
  • Combinators:
    • -and (implicit), -or (-o), -not (!).
    • Parentheses must be escaped or quoted: find . \( -name '*.c' -o -name '*.h' \) -print.
  • Pruning — stop descending into a subtree:
    • find . -name node_modules -prune -o -name '*.js' -print
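    • The -prune must come before -o, and the action on the right side must be explicit. With no action anywhere, find wraps the whole expression in an implicit -print and lists the pruned directory itself too.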
  • Actions:
    • -print (default), -print0 (null-terminated, safe for filenames with spaces/newlines).
    • -delete — remove matching entries. Use only after a -print dry run.
    • -exec CMD {} \; — run CMD once per file. \; ends the exec clause.
    • -exec CMD {} + — batch many files into one invocation. Faster, like xargs.
    • -ok — same as -exec but prompts before each invocation.
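The tests, combinators, pruning, and batched -exec above can be exercised on a small scratch tree. The directory layout and file names below are invented for the example.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/node_modules/pkg"
touch "$tmp/src/main.c" "$tmp/src/util.h" "$tmp/src/app.js" \
      "$tmp/node_modules/pkg/index.js" "$tmp/src/empty.log"
printf 'data\n' > "$tmp/src/notes.txt"

# Grouped -o test: C sources and headers only.
c_files=$(find "$tmp" -type f \( -name '*.c' -o -name '*.h' \) -print | sort)

# Prune node_modules, then print .js files found elsewhere.
js_files=$(find "$tmp" -name node_modules -prune -o -name '*.js' -print)

# -type f combined with -empty: zero-length regular files.
empty_count=$(find "$tmp" -type f -empty | wc -l)

# -exec ... + hands all matching paths to a single wc invocation.
find "$tmp/src" -type f -name '*.txt' -exec wc -c {} +

printf '%s\n' "$c_files" "$js_files" "empty files: $empty_count"
rm -rf "$tmp"
```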

5. find -exec vs xargs

Two ways to feed find results into another command, with different tradeoffs.

                                      find ... -exec CMD {} \;    find ... -exec CMD {} +    find ... -print0 | xargs -0 CMD
One process per file                  yes (slow on big trees)     no, batched                no, batched
Handles spaces / newlines in names    yes                         yes                        yes (with -0 / -print0)
Stops on first failure                no                          no                         only if CMD exits with status 255
Position of file arg                  wherever {} is              end of arg list            end of arg list (or at a token with -I)
  • Pitfall — naïve find ... | xargs (no -print0/-0) breaks on filenames containing whitespace or quotes. Always use the null-delimited form, or just use -exec ... +.
  • xargs -P N runs N invocations in parallel — handy for embarrassingly parallel work like find . -name '*.jpg' -print0 | xargs -0 -P 4 -n 1 mogrify -resize 800x.
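The whitespace pitfall is easy to reproduce. This sketch creates two scratch files, one with a space in its name, and counts how many paths ls actually receives intact under each approach. The file names are made up, and the naive count assumes the scratch path itself contains no whitespace.

```shell
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/has space.txt"

# Naive pipeline: xargs splits "has space.txt" into two arguments,
# so ls is asked about paths that do not exist (errors hidden here).
naive_count=$(find "$tmp" -name '*.txt' | xargs ls 2>/dev/null | wc -l)

# Null-delimited pipeline: each path arrives as exactly one argument.
safe_count=$(find "$tmp" -name '*.txt' -print0 | xargs -0 ls | wc -l)

# Same result with no xargs at all.
exec_count=$(find "$tmp" -name '*.txt' -exec ls {} + | wc -l)

echo "naive=$naive_count safe=$safe_count exec=$exec_count"
rm -rf "$tmp"
```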

Useful find -exec sh -c pattern for complex commands (referenced in the original [Y] note):

find . -name '*.log' -exec sh -c 'gzip "$1" && mv "$1.gz" /archive/' _ {} \;

The _ becomes $0 (script name placeholder); {} becomes $1. This lets you use shell features (pipes, variables, redirection) per file without quoting hell.
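The same pattern also works in batched form with +. The variant below is a sketch, not the recipe from the original note: it passes the destination directory as an extra argument ahead of {} (an assumption made here to avoid embedding a path inside the quoted script) and loops over the remaining arguments. It assumes gzip is installed and uses a scratch directory instead of /archive/.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/logs" "$tmp/archive"
printf 'x\n' > "$tmp/logs/a.log"
printf 'y\n' > "$tmp/logs/b c.log"     # name with a space on purpose

# In the inner script, $1 is the destination; the paths from {} follow it.
# shift drops the destination, then "for f" walks the remaining arguments.
find "$tmp/logs" -name '*.log' -exec sh -c '
  dest=$1; shift
  for f; do
    gzip "$f" && mv "$f.gz" "$dest"/
  done
' _ "$tmp/archive" {} +

archived=$(find "$tmp/archive" -name '*.log.gz' | wc -l)
remaining=$(find "$tmp/logs" -name '*.log' | wc -l)
echo "archived=$archived remaining=$remaining"
rm -rf "$tmp"
```

One sh process now handles many files, which matters on large trees; the per-file \; form remains simpler when only a handful of files match.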

6. Symbolic and Hard Links

  • Hard link — a second directory entry pointing at the same inode. Indistinguishable from the original; deleting one entry doesn't free the file until the last link is gone. Limited to the same filesystem and (typically) not allowed on directories.
  • Symbolic link (symlink, soft link) — a small file whose content is a path. Can cross filesystems, can dangle (point to nothing). Behaves like a shortcut.
  • ln TARGET LINK_NAME — create a hard link.
  • ln -s TARGET LINK_NAME — create a symlink.
  • Common flags:
    Flag                     Meaning
    -s, --symbolic           make a symbolic link
    -f, --force              remove an existing destination
    -n, --no-dereference     treat the destination as a normal file (don't follow if it's a symlink to a directory)
    -v, --verbose            print each link as created
    -r, --relative           compute a relative path from LINK_NAME to TARGET (cleaner symlinks)
  • Pitfall — ln -sf newtarget mylink when mylink already points to a directory silently creates mylink/<basename(newtarget)> inside the old target instead of replacing the symlink. Add -n to force replacement:
ln -sfn /a/new/path mylink

This is the classic recipe from the original [Y] note, and it stays a footgun because the default behavior is rarely what you want when re-pointing a symlink-to-a-dir.
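The difference is easy to see in a scratch directory. In this sketch the names v1, v2, v3, and current are hypothetical stand-ins for release directories and the link that points at the active one; readlink prints where a symlink points.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/v1" "$tmp/v2" "$tmp/v3"
ln -s "$tmp/v1" "$tmp/current"          # current -> v1

# Without -n, ln follows current into v1 and creates v1/v2 there.
ln -sf "$tmp/v2" "$tmp/current"
after_sf=$(readlink "$tmp/current")     # still points at v1
if [ -L "$tmp/v1/v2" ]; then stray=yes; else stray=no; fi

# With -n, the existing symlink itself is replaced.
ln -sfn "$tmp/v3" "$tmp/current"
after_sfn=$(readlink "$tmp/current")    # now points at v3

echo "after -sf:  $after_sf (stray link inside v1: $stray)"
echo "after -sfn: $after_sfn"
rm -rf "$tmp"
```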

  • Inspect symlinks: ls -l shows link -> target. readlink LINK prints the raw target; readlink -f LINK resolves the full chain to a canonical path.
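A short sketch contrasting the two link types on scratch files (all names invented for the example). It uses the -ef test, which bash and most test implementations support, to check that two names share an inode.

```shell
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/orig.txt"
ln "$tmp/orig.txt" "$tmp/hard.txt"        # second name for the same inode
ln -s "$tmp/orig.txt" "$tmp/soft.txt"     # small file holding a path

# -ef is true when both names resolve to the same device and inode.
if [ "$tmp/orig.txt" -ef "$tmp/hard.txt" ]; then same_inode=yes; else same_inode=no; fi

raw_target=$(readlink "$tmp/soft.txt")    # the stored path, unresolved

rm "$tmp/orig.txt"                        # drop one of the two hard links
hard_content=$(cat "$tmp/hard.txt")       # data survives through hard.txt

# The symlink now dangles: -L sees the link itself, -e fails to follow it.
if [ -L "$tmp/soft.txt" ] && [ ! -e "$tmp/soft.txt" ]; then dangling=yes; else dangling=no; fi

echo "same_inode=$same_inode target=$raw_target content=$hard_content dangling=$dangling"
rm -rf "$tmp"
```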

7. References