SEO for Google and International Search


  • Description: Webmaster-side workflow for Google, Bing, and other international engines — Google Search Console setup (domain vs URL prefix property, TXT-record verification via Cloudflare, sitemap submission, URL Inspection, Coverage/Performance reports, Crawl stats), Bing Webmaster Tools (import from GSC + IndexNow), Yandex Webmaster briefly, and the hreflang strategy for English-default bilingual sites.
  • My Notion Note ID: K2B-8-3
  • Created: 2026-05-23
  • Updated: 2026-05-23
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

1. Engine Landscape Outside China

For an English-default personal/tech site targeting non-mainland audiences:

Engine | Approx. share (en, 2026) | Webmaster console | Notes
Google | ~89% global, ~87% en-US | Google Search Console | The only one you can't ignore.
Bing | ~4% global, ~7% en-US (incl. ChatGPT/Copilot citations) | Bing Webmaster Tools | Imports GSC; effectively free coverage.
DuckDuckGo | ~1-2% | none | Pulls primarily from Bing's index.
Yahoo | ~1% | none (deprecated) | Pulls from Bing.
Yandex | dominant in RU, ~0.5% global | Yandex Webmaster | Worth setting up if RU audience matters.
Brave Search | <1%, growing | none | Independent index since 2022.

Practical rule: optimize for Google + Bing (via GSC + BWT). Everything else either piggybacks on Bing or doesn't move the needle.

2. Google Search Console — Setup

URL: https://search.google.com/search-console

2.1 Property type

Two options:

  • Domain property — covers apex + all subdomains + both HTTP/HTTPS. Requires a DNS TXT record. Preferred.
  • URL prefix property — covers exactly one URL (e.g. https://www.example.com/). Verifies via HTML file upload, HTML meta tag, Google Analytics, or Google Tag Manager. Faster but narrower.

Domain property wins because one verification covers example.com, www.example.com, future blog.example.com, etc.

2.2 Verification via Cloudflare

Two paths; the auto-integration is dramatically easier.

2.2.1 Cloudflare auto-integration (preferred)

When you enter a Cloudflare-managed domain in Search Console's Domain-property setup, GSC detects the DNS provider and offers an Authorize button instead of a manual TXT recipe. Flow:

  • Click Authorize → redirected to Cloudflare → grants Google permission to add the TXT
  • Cloudflare auto-creates the TXT record (no manual DNS editing)
  • GSC verifies in seconds → "Ownership verified"

The TXT it adds is dedicated to verification and doesn't touch other DNS records (A/CNAME for Vercel, MX for email, etc.). Leave it in place: if it is removed, ownership verification lapses.

2.2.2 Manual TXT record (fallback)

If Cloudflare integration isn't offered (older zones, missing permission, non-Cloudflare DNS):

Type:    TXT
Name:    @
Content: google-site-verification=<token-from-search-console>
TTL:     Auto

Wait ~1-10 minutes for propagation, click Verify in Search Console. Cloudflare is fast → usually under 1 minute.
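
Before clicking Verify, it can help to confirm that the record is actually visible in public DNS. A minimal sketch, assuming Node.js 18+ (for `node:dns/promises`); the helper names `hasGoogleVerification` and `checkDomain` are illustrative and not part of any Google or Cloudflare API:

```typescript
import { resolveTxt } from 'node:dns/promises';

// resolveTxt returns one string[] per TXT record, because a long record may be
// split into several chunks; join the chunks before comparing.
function hasGoogleVerification(records: string[][], token: string): boolean {
  const expected = `google-site-verification=${token}`;
  return records.some((chunks) => chunks.join('') === expected);
}

// Looks up the apex TXT records and reports whether the token is published.
async function checkDomain(domain: string, token: string): Promise<boolean> {
  const records = await resolveTxt(domain);
  return hasGoogleVerification(records, token);
}
```

If `checkDomain('example.com', '<token-from-search-console>')` resolves to false after several minutes, the record name or content is probably wrong rather than merely slow to propagate.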

2.2.3 Backup verification

Cloudflare auto-TXT is the sole verification method when you use § 2.2.1. If the integration breaks or the TXT is accidentally deleted, ownership lapses. Add a backup via Settings → Ownership verification → Add another method → HTML tag. Save the token to NEXT_PUBLIC_GOOGLE_VERIFICATION env var; Next.js layout's metadata.verification.google reads it.
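
The env-var wiring described above could look roughly like this. It is a sketch, not the site's actual layout: `verificationMetadata` is a hypothetical helper, and the local `VerificationMeta` type only mirrors the `verification.google` field of the Next.js `Metadata` type so the snippet compiles on its own.

```typescript
// Local stand-in for the `verification.google` part of Next.js `Metadata`.
type VerificationMeta = { verification?: { google: string } };

// Returns the metadata fragment only when a token is configured, so builds
// without the env var do not emit an empty verification tag.
function verificationMetadata(env: Record<string, string | undefined>): VerificationMeta {
  const token = env.NEXT_PUBLIC_GOOGLE_VERIFICATION?.trim();
  return token ? { verification: { google: token } } : {};
}

// In app/layout.tsx (assumed location):
//   export const metadata: Metadata = { ...verificationMetadata(process.env) };
```

Next.js renders `verification.google` as a `<meta name="google-site-verification" content="...">` tag in the page head, which is what the HTML tag method looks for.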

2.3 First-day actions

  • Sitemaps → submission rules differ by property type:
    • URL-prefix property → relative path sitemap.xml works.
    • Domain property → must submit the full URL https://example.com/sitemap.xml. Submitting the relative form returns "Invalid sitemap address" (the property covers all subdomains so GSC needs to know which one).
    • Status often shows "Couldn't fetch" on first submission for new sites — Google fetches asynchronously, typically resolves to "Success" within 1-24h. If still stuck after a day, hit Retry.
  • URL Inspection → paste the home page URL. Status will likely be "URL is not on Google" for a new site. Click Request indexing → manual queue, processed in ~1-7 days. Daily quota ~10-20 requests; prioritize home + top sections, let the rest come via sitemap.
  • Settings → Ownership verification → confirm the TXT record is recorded; add a backup verification method (§ 2.2.3) if you used Cloudflare auto-integration.
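
Because the Domain property needs a full sitemap URL, and every entry inside the sitemap must be absolute as well, a small helper can keep both consistent. A sketch assuming a Next.js `app/sitemap.ts` route; `SITE_URL`, `sitemapEntries`, and the example paths are hypothetical:

```typescript
const SITE_URL = 'https://example.com'; // assumed production origin

// Shape compatible with the entries Next.js expects from app/sitemap.ts.
type SitemapEntry = { url: string; lastModified?: Date };

// Turns site-relative paths into absolute sitemap entries.
function sitemapEntries(paths: string[], lastModified?: Date): SitemapEntry[] {
  return paths.map((path) => ({
    url: new URL(path, SITE_URL).toString(),
    lastModified,
  }));
}

// In app/sitemap.ts (assumed):
//   export default () => sitemapEntries(['/', '/notes']);
```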

3. Google Search Console — Day-to-Day

3.1 Coverage / Pages report

  • Shows which URLs are indexed and which are not, with a reason listed for every non-indexed group.
  • Common "not indexed" reasons + their fixes:
    • Discovered – currently not indexed → Google found the URL but hasn't crawled it. Build inbound links; wait.
    • Crawled – currently not indexed → Google fetched but chose not to index. Usually quality/duplicate signal. Improve content, add structured data, check for near-duplicates elsewhere on the web.
    • Duplicate without user-selected canonical → set alternates.canonical in Next.js metadata.
    • Page with redirect → expected for redirected URLs; only a problem if the canonical URL itself isn't indexed.
    • Soft 404 → page returns 200 but looks empty/error-ish. Add real content or return actual 404.
    • Blocked by robots.txt → check app/robots.ts. The classic "I disallowed / in dev and forgot" bug.
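
One way to avoid the forgotten-disallow bug is to derive the rules from the deploy environment instead of editing them by hand. A sketch assuming the site runs on Vercel, where `VERCEL_ENV` distinguishes production from preview; `robotsFor` is a hypothetical helper and the local type mirrors only a subset of the Next.js `MetadataRoute.Robots` shape:

```typescript
// Local stand-in for the subset of MetadataRoute.Robots used here.
type RobotsConfig = {
  rules: { userAgent: string; allow?: string; disallow?: string };
  sitemap?: string;
};

// Production allows everything; any other environment blocks all crawling.
function robotsFor(vercelEnv: string | undefined, siteUrl: string): RobotsConfig {
  if (vercelEnv === 'production') {
    return {
      rules: { userAgent: '*', allow: '/' },
      sitemap: `${siteUrl}/sitemap.xml`,
    };
  }
  return { rules: { userAgent: '*', disallow: '/' } };
}

// In app/robots.ts (assumed):
//   export default () => robotsFor(process.env.VERCEL_ENV, 'https://example.com');
```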

3.2 Performance report

  • Queries tab → what searches your pages appear for (impressions, clicks, position).
  • Pages tab → which URLs got impressions/clicks.
  • Filter by country, device, search appearance, date range.
  • Empty for the first 1-2 weeks. Useful only after enough impressions accumulate.
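  • Sitelinks search box: Google used to show a search box inside the SERP entry of sites with substantial content, and WebSite + SearchAction JSON-LD was the way to opt in. Google retired that feature in late 2024, so the markup no longer changes how the result looks on Google. It remains valid schema.org and is harmless to keep: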
const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'WebSite',
  url: 'https://example.com',
  potentialAction: {
    '@type': 'SearchAction',
    target: 'https://example.com/search?q={search_term_string}',
    'query-input': 'required name=search_term_string',
  },
};

4. Google Search Console — Diagnostics

  • Settings → Crawl stats → per-day request counts, response codes, file types (HTML, image, JS, CSS), and fetched-by-purpose (refresh, discovery). Drop in crawl rate after a deploy usually = newly returning 4xx or slow responses.
  • URL Inspection → Test live URL → forces Googlebot to fetch the page right now and shows the rendered HTML it sees. Use this when "indexed" status disagrees with what you expect.
  • Removals tool → emergency removal of a URL from SERP (usually takes effect within about a day and lasts roughly six months; permanent removal still needs the page to 404/410 or have noindex).
  • Security & Manual actions → almost always empty for personal sites, but check after any compromise.
  • Core Web Vitals report → per-URL LCP / INP / CLS scores. Fix the URLs flagged as "Poor" first.

4.1 robots.txt and rendering tests

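The legacy robots.txt Tester (google.com/webmasters/tools/robots-testing-tool) was retired in late 2023. Its replacement is the robots.txt report under Settings, which shows the robots.txt files Google fetched and any parse problems. To check whether a specific path is blocked, URL Inspection is now the primary entry point.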

For JS-heavy pages, use URL Inspection → Test live URL → View tested page → HTML to confirm Googlebot sees server-rendered HTML (Next.js App Router with default SSR is fine). If you see a near-empty <body>, Googlebot is seeing only the JS shell — fix by ensuring SSR/SSG is enabled for that route.

5. Bing Webmaster Tools

URL: https://www.bing.com/webmasters

5.1 Why bother

  • Powers DuckDuckGo, Yahoo, and large chunks of ChatGPT / Copilot / Perplexity citation lookups.
  • Setup time: ~2 minutes via GSC import.

5.2 Setup via GSC import

If GSC is already verified:

  • Bing Webmaster Tools → Import from Google Search Console → sign in with the same Google account → select properties → import.
  • Sites and sitemaps copy across automatically.

Standalone setup is also available (XML file upload, meta tag, or CNAME).

5.3 What Bing offers beyond GSC

  • Site Explorer → similar to GSC Coverage but with more aggressive crawl-error surfacing.
  • Search Performance → query/page reports comparable to GSC Performance.
  • URL submission → 10/day for manual submit, up to 10,000/day via IndexNow (see § 6).
  • Backlinks explorer → richer link data than GSC offers for free.
  • Markup Validator → tests structured data including OpenGraph and Twitter Card.

6. IndexNow — Faster Crawl Triggers

IndexNow is a protocol introduced by Bing (and also supported by Yandex) for proactively notifying engines when content changes.

6.1 How it works

  • Generate an API key (any UUID).
  • Host https://example.com/<key>.txt containing just the key. Participating engines fetch this file to confirm you control the domain.
  • POST to https://api.indexnow.org/indexnow with a JSON body listing changed URLs:
curl -X POST 'https://api.indexnow.org/indexnow' \
  -H 'Content-Type: application/json' \
  -d '{
    "host": "example.com",
    "key": "<your-key>",
    "urlList": ["https://example.com/notes/new-note"]
  }'
  • One ping is shared across all participating engines (Bing, Yandex, Seznam, others) — no per-engine submission needed.
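
The same request can be issued from TypeScript, which makes it easier to validate the URL list first. A minimal sketch, assuming Node.js 18+ for the global `fetch`; `buildIndexNowPayload` and `pingIndexNow` are hypothetical names, not part of an official client:

```typescript
type IndexNowPayload = { host: string; key: string; urlList: string[] };

// Builds the JSON body, rejecting URLs that do not belong to the given host.
function buildIndexNowPayload(host: string, key: string, urls: string[]): IndexNowPayload {
  for (const u of urls) {
    if (new URL(u).hostname !== host) {
      throw new Error(`URL ${u} is not on host ${host}`);
    }
  }
  return { host, key, urlList: urls };
}

// Sends the payload and resolves to the HTTP status code of the response.
async function pingIndexNow(payload: IndexNowPayload): Promise<number> {
  const res = await fetch('https://api.indexnow.org/indexnow', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json; charset=utf-8' },
    body: JSON.stringify(payload),
  });
  return res.status;
}
```

Checking the host up front keeps a stray URL from another domain out of the submission, which the key file on `example.com` would not cover.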

6.2 Wiring into the build

Add a vercel.json or GitHub Action that POSTs to IndexNow after deployment. Sketch:

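# .github/workflows/indexnow.yml
name: IndexNow ping
on:
  workflow_run:
    workflows: ['Vercel Production Deploy']  # must match the deploy workflow's name
    types: [completed]
jobs:
  ping:
    # "completed" also fires for failed deploys, so only ping after a success.
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - run: |
          curl --fail -X POST 'https://api.indexnow.org/indexnow' \
            -H 'Content-Type: application/json; charset=utf-8' \
            -d '{"host":"example.com","key":"${{ secrets.INDEXNOW_KEY }}","urlList":["https://example.com/"]}'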

For finer granularity, generate the URL list from changed files in the deploy commit.
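
A sketch of that mapping, assuming notes live under a hypothetical `content/notes/` directory as `.md` or `.mdx` files and are served at `/notes/<slug>`; the pattern needs adjusting to the real content layout. The file list would come from something like `git diff --name-only HEAD~1 HEAD`.

```typescript
const ORIGIN = 'https://example.com'; // assumed production origin

// Maps changed repository paths to public URLs. Files outside content/notes/
// (code, config, assets) are ignored, and duplicates are dropped.
function changedFilesToUrls(files: string[]): string[] {
  const pattern = /^content\/notes\/(.+)\.mdx?$/;
  const urls: string[] = [];
  for (const file of files) {
    const match = pattern.exec(file);
    if (!match) continue;
    const url = `${ORIGIN}/notes/${match[1]}`;
    if (urls.indexOf(url) === -1) urls.push(url);
  }
  return urls;
}
```

The result can be passed directly as the `urlList` of the IndexNow request. A commit that touches only code or config yields an empty list, in which case the ping can be skipped.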

Google does not support IndexNow as of 2026. For ordinary pages, Google's public signals are sitemap submission and the GSC URL Inspection "Request indexing" button.

7. Yandex, DuckDuckGo, Brave, Kagi

  • Yandex Webmaster (webmaster.yandex.com) — register if you care about Russian traffic. Setup is similar to GSC: TXT or HTML verification, submit sitemap, monitor index status. IndexNow integration works.
  • DuckDuckGo — no webmaster console. DDG sources primarily from Bing's index; getting indexed in Bing covers DDG.
  • Brave Search — independent index. No webmaster console as of 2026. They claim to index based on Common Crawl + their own crawler; no submission API.
  • Kagi — paid search, no webmaster tools, no submission.

Tactically: if you're shipping IndexNow (§ 6) you already reach Bing + Yandex. Brave + Kagi are organic discovery only.

8. International / hreflang Strategy

8.1 Same URL + language switcher

For a bilingual EN / ZH site that wants both Google and Baidu coverage:

  • One canonical URL per page (e.g. /notes/cpp/templates).
  • Language switcher in the page UI sets a ?lang=en or ?lang=zh param (or cookie / localStorage).
  • hreflang annotations tell Google which version to serve per user locale:
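import type { Metadata } from 'next';

// Relative URLs here are resolved against metadataBase (set in the root layout).
export const metadata: Metadata = {
  alternates: {
    canonical: '/notes/cpp/templates',
    languages: {
      'en-US': '/notes/cpp/templates?lang=en',
      'zh-CN': '/notes/cpp/templates?lang=zh',
      'x-default': '/notes/cpp/templates',
    },
  },
};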
  • x-default tells Google what to show when no language matches — for tech notes, usually the English version.
  • For Google + Bing this is enough. Baidu ignores hreflang — for serious Baidu ranking you need separately optimized zh-CN pages, see SEO Baidu And China.
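
For reference, the `languages` map in the metadata above should end up as `<link rel="alternate" hreflang="...">` tags in the page head. The sketch below reproduces that output by hand, which can be handy when comparing against the HTML shown by URL Inspection; `hreflangLinks` and `BASE` are hypothetical and not part of Next.js:

```typescript
const BASE = 'https://example.com'; // assumed metadataBase

// Renders one <link rel="alternate"> tag per locale, with absolute hrefs.
function hreflangLinks(languages: Record<string, string>): string[] {
  return Object.keys(languages).map((locale) => {
    const href = new URL(languages[locale], BASE).toString();
    return `<link rel="alternate" hreflang="${locale}" href="${href}" />`;
  });
}
```

Calling it with the three entries from the metadata example yields three tags, one each for `en-US`, `zh-CN`, and `x-default`.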

8.2 Locale subpaths (alternative)

  • /en/notes/cpp/templates and /zh/notes/cpp/templates — two separate URL spaces.
  • Stronger per-language ranking signals (Google treats each as a fully independent page).
  • Significantly more routing/sitemap complexity. Worth it only if you're committing to substantial per-language content divergence.
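
With subpaths, each locale version is its own canonical URL and lists every sibling, itself included. A sketch of generating that map for one page; the `LOCALES` table and the `subpathAlternates` helper are hypothetical, and English is assumed as `x-default`:

```typescript
// hreflang code -> URL prefix (assumed layout).
const LOCALES: Record<string, string> = { 'en-US': '/en', 'zh-CN': '/zh' };

type Alternates = { canonical: string; languages: Record<string, string> };

// Builds the alternates block for one page as seen from one locale.
function subpathAlternates(path: string, current: string): Alternates {
  const languages: Record<string, string> = {};
  for (const code of Object.keys(LOCALES)) {
    languages[code] = `${LOCALES[code]}${path}`;
  }
  languages['x-default'] = `${LOCALES['en-US']}${path}`;
  return { canonical: `${LOCALES[current]}${path}`, languages };
}
```

The returned object has the same shape as the `alternates` field in Next.js metadata, so the routing and sitemap code can share one source of truth for locale prefixes.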

8.3 Locale-specific domains

  • example.com (en) + example.cn (zh) — separate domains.
  • Strongest geo/language signal but doubles infrastructure. Required only when one locale is a separate business.

For a personal tech site, § 8.1 (same URL + language switcher) is the right default.

9. Backlink Building for Tech Audiences

Discovery accelerates with inbound links. Practical channels for English-speaking tech readers:

  • Hacker News — submit a single high-quality post when launching. Don't farm.
  • lobste.rs — invite-only, but links submitted by members rank well.
  • r/programming, r/cpp, r/javascript, etc. — domain-relevant subreddits.
  • dev.to / hashnode — republish (with canonical pointing to your site) for an extra discovery surface without duplicate-content cost.
  • GitHub README — link from any popular repos you maintain.
  • Conference talk slides — link to backing notes.
  • Newsletters — guest posts or curated mentions (e.g., Bytes, Pointer, Console).

Each genuine inbound link can speed up discovery more than weeks of waiting for organic crawling. Avoid link farms, automated comment posting, and paid-link networks: those can trigger manual actions in GSC.

10. References