Search Architecture

How define.wtf's four-stage search pipeline finds the right acronyms instantly

Finding the right acronym quickly is crucial. define.wtf uses a four-stage search pipeline: query escaping and parsing, full-text search (FTS5), fuzzy matching (Levenshtein distance), and client-side filtering (Fuse.js). A final relevance-scoring step merges the results from these stages so that the most accurate matches appear first.

Overview: Four-Stage Pipeline

User enters search query (e.g., "OKR")

Stage 1: Query escaping & parsing (prevent SQL injection)

Stage 2: FTS5 full-text search (SQLite index lookup)

Stage 3: Levenshtein fuzzy matching (typo tolerance)

Stage 4: Fuse.js client-side filtering (UI autocomplete)

Relevance scoring & result merging

Results displayed (primary definition first)

Stage 1: Query Escaping & Parsing

Before searching, the query is sanitized to prevent SQL injection attacks.

Query Escaping

User input is escaped so that special SQL characters are neutralized rather than interpreted:

| Input | Escaped | Purpose |
|---|---|---|
| O'KR | O\'KR | Escape single quotes |
| OKR;DROP TABLE | OKR\;DROP TABLE | Escape semicolons |
| %OKR% | \%OKR\% | Escape wildcards |

This ensures malicious input is treated as literal text, not SQL syntax.

Prefix Matching

FTS5 supports prefix matching with the * operator:

User searches: "OK"
Query becomes: "OK*"
Matches: "OKR", "OKP", "OKAPI"

This allows finding results even if the user hasn't typed the full term.

Quote Handling

Each word of a multi-word search is quoted, and the words are combined with AND:

User searches: "quarterly review"
Query becomes: "quarterly" AND "review"
Matches: Acronyms with both terms in definitions
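
The prefix and quoting rules above can be sketched as a small helper. This is an illustration under stated assumptions, not define.wtf's actual code: the names `quoteFtsTerm` and `buildMatchQuery` are hypothetical, and the sketch always wraps terms in FTS5 double-quoted strings, so the prefix form comes out as `"OK"*` rather than the bare `OK*` shown above (FTS5 accepts both).

```javascript
// Hypothetical helper names; a sketch of the parsing rules, not the real implementation.

// Wrap a term as an FTS5 string literal: surround it with double quotes and
// double any embedded double quote so it cannot end the string early.
function quoteFtsTerm(term) {
  return '"' + term.replace(/"/g, '""') + '"';
}

// Turn raw user input into an FTS5 MATCH expression.
// - one word   -> prefix query:   "OK"*
// - many words -> all must match: "quarterly" AND "review"
function buildMatchQuery(input) {
  const terms = input.trim().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return null; // nothing to search for
  if (terms.length === 1) return quoteFtsTerm(terms[0]) + '*';
  return terms.map(quoteFtsTerm).join(' AND ');
}

console.log(buildMatchQuery('OK'));               // "OK"*
console.log(buildMatchQuery('quarterly review')); // "quarterly" AND "review"

// The returned string is meant to be passed as a bound parameter, for example
//   ... WHERE acronyms_fts MATCH ?   (table name is an assumption)
// so that user input never becomes part of the SQL text itself.
```

Binding the expression as a parameter is the usual way to keep input out of the SQL statement; the quoting step only protects the FTS5 query syntax.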

Stage 2: FTS5 Full-Text Search

SQLite's FTS5 (Full-Text Search 5) indexes all acronym terms and definitions for fast lookup.

What's Indexed?

FTS5 indexes these fields for each acronym:

| Field | Importance | Example |
|---|---|---|
| title (term) | High | OKR |
| description | High | Objectives and Key Results |
| synonyms | Medium | Goals, Quarterly Review |
| definition text | Medium | A framework for goal-setting... |

BM25 Ranking

FTS5 uses BM25, a statistical ranking algorithm:

How BM25 works:

  1. Scores each result based on term frequency
  2. Boosts relevance of rare terms
  3. Penalizes overly common terms
  4. Considers document length

Example:

Query: OKR

Results (ranked by BM25):

  1. OKR (exact match in title) — score: 8.5
  2. "Objectives and Key Results" (Key Results expands to OKR) — score: 4.2
  3. "Framework for setting goals like OKR" (OKR mentioned in description) — score: 2.1

Exact title matches rank highest.
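
To make the four BM25 properties concrete, here is a minimal in-memory scorer. It is a sketch of the textbook formula, not SQLite's internal implementation: `tokenize`, `bm25Scores`, the sample documents, and the constants `K1` and `B` (set to commonly used values) are all assumptions, and the numbers it produces will not match the illustrative scores above.

```javascript
// Minimal BM25 scorer over an in-memory corpus (illustrative only).
const K1 = 1.2;  // how quickly repeated terms stop adding score
const B = 0.75;  // how strongly long documents are discounted

function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Returns one score per document; higher means more relevant.
function bm25Scores(query, documents) {
  const tokenized = documents.map(tokenize);
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / tokenized.length;
  const terms = tokenize(query);
  return tokenized.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length; // term frequency
      if (tf === 0) continue;
      const df = tokenized.filter((d) => d.includes(term)).length;
      // Rare terms (low df) get a larger weight, common terms a smaller one
      const idf = Math.log(1 + (tokenized.length - df + 0.5) / (df + 0.5));
      // Length normalization: the same tf counts for less in a long document
      const norm = tf + K1 * (1 - B + (B * doc.length) / avgLen);
      score += (idf * tf * (K1 + 1)) / norm;
    }
    return score;
  });
}

const docs = [
  'OKR',
  'Objectives and Key Results',
  'Framework for setting goals like OKR',
];
const scores = bm25Scores('OKR', docs);
// The one-word title outranks the longer sentence that merely mentions OKR;
// the document without the term scores 0.
console.log(scores);
```

Note that SQLite's built-in bm25() function reports better matches as numerically lower (negative) values, so an ascending sort puts the best match first; the positive scores in the example above are best read as magnitudes.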

Performance

FTS5 is extremely fast:

  • Index lookups: ~1-2 ms (even with thousands of acronyms)
  • Range scan (multi-result): ~5-10 ms
  • Prefix matching: ~2-5 ms

Results are cached for repeated queries.

Stage 3: Levenshtein Fuzzy Matching

For typo tolerance, define.wtf calculates Levenshtein distance (edit distance).

Levenshtein Distance

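The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, substitutions) needed to transform one string into another. The examples below also count swapping two adjacent characters as a single edit, which strictly speaking is the Damerau-Levenshtein variant; under plain Levenshtein a swap costs two edits.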

Examples:

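| Input | Dictionary | Distance | Match? |
|---|---|---|---|
| OKR | OKR | 0 | ✓ Exact |
| ORK | OKR | 1 (adjacent swap counted as one edit; 2 under plain Levenshtein) | ✓ Close |
| OK | OKR | 1 (missing character) | ✓ Close |
| OKRS | OKR | 1 (extra character) | ✓ Close |
| KPI | OKR | 3 (substitutions) | ✗ Too far |
| BANANA | OKR | 3+ | ✗ Too far |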

Threshold

define.wtf matches results with Levenshtein distance ≤ 2:

  • Distance 0: Exact match (always shown)
  • Distance 1: Very close, likely typo (always shown)
  • Distance 2: Reasonable typo (shown with "Did you mean?" suggestion)
  • Distance 3+: Too different (filtered out)
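
The distance calculation and the threshold rules can be sketched together. The function names `editDistance` and `classifyMatch` and the returned labels are hypothetical. The `allowTranspose` flag switches between plain Levenshtein and the variant that counts an adjacent swap as one edit, which is what the ORK example assumes.

```javascript
// Edit distance by dynamic programming.
// allowTranspose = false: plain Levenshtein (insert, delete, substitute).
// allowTranspose = true:  an adjacent swap also costs 1
//                         ("optimal string alignment" variant).
function editDistance(a, b, allowTranspose = false) {
  const d = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) d[i][0] = i; // delete all of a
  for (let j = 0; j <= b.length; j++) d[0][j] = j; // insert all of b
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution (or match)
      );
      if (allowTranspose && i > 1 && j > 1 &&
          a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent swap
      }
    }
  }
  return d[a.length][b.length];
}

// Map a distance to the display rule from the threshold list (labels are assumptions).
function classifyMatch(query, title) {
  const dist = editDistance(query.toUpperCase(), title.toUpperCase(), true);
  if (dist === 0) return 'exact';
  if (dist === 1) return 'close';
  if (dist === 2) return 'did-you-mean';
  return 'filtered';
}

console.log(editDistance('ORK', 'OKR'));       // 2 (two substitutions)
console.log(editDistance('ORK', 'OKR', true)); // 1 (one adjacent swap)
console.log(classifyMatch('okrs', 'OKR'));     // close
console.log(classifyMatch('banana', 'OKR'));   // filtered
```

The table above has (length of a + 1) × (length of b + 1) cells, which is why the comparison is restricted to short strings such as acronym titles.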

"Did You Mean?" Suggestions

When a user's search doesn't match anything but is close, they see:

No results for "ORK"

Did you mean: OKR

This suggestion uses distance ≤ 2 matches.

Performance

Levenshtein matching is computationally expensive, so it's used strategically:

  • Only when FTS5 returns few or no results
  • Only on acronym titles (not on full definitions)
  • Cached for common typos

Stage 4: Fuse.js Client-Side Filtering

For autocomplete and command palette search, results are filtered client-side using Fuse.js.

What is Fuse.js?

Fuse.js is a fuzzy search library that runs in your browser. It searches a pre-fetched list of acronyms without additional API calls.

How It Works

  1. Server returns candidates: the API returns the ~200 most relevant acronyms
  2. Browser downloads Fuse.js: a lightweight library (15 KB gzipped)
  3. Browser creates index: builds a search index from the results
  4. User types query: fuzzySearch(query) filters results instantly
  5. Results update live: no additional API calls needed

Fuse.js Configuration

Fuse.js is configured to:

  • Search acronym titles with high weight (importance)
  • Search definitions with lower weight
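  • Use fuzzy matching with threshold 0.6 (on Fuse.js's scale, 0.0 requires a perfect match and 1.0 matches anything)
  • Sort by score (best matches first; in Fuse.js a lower score is a better match)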

Example:

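import Fuse from 'fuse.js';

// acronyms: array of { title, description } objects pre-fetched from the API
const fuse = new Fuse(acronyms, {
  keys: [
    { name: 'title', weight: 1.0 },       // Highest priority
    { name: 'description', weight: 0.5 }  // Lower priority
  ],
  threshold: 0.6,     // 0.0 = perfect matches only, 1.0 = match anything
  includeScore: true  // Attach a score to each result (0 = perfect match)
});

const results = fuse.search('okr');
// Returns [{ item, refIndex, score }, ...] sorted best-first,
// e.g. OKR (score near 0) ahead of OKP (higher score)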

Performance

Client-side search is extremely fast:

  • Index creation: ~50-100 ms (one-time)
  • Search execution: ~1-5 ms per query
  • Results update: Instant (no network round-trip)

This makes it well suited to the command palette and autocomplete, where users expect instant feedback.

Result Merging & Ranking

Results from all stages are merged and ranked:

Ranking Factors

  1. Exact match (highest priority)

    • Query exactly matches acronym title
    • Rank boost: +1000 points
  2. Title match (very high)

    • Query found in acronym title
    • BM25 score applied
  3. Definition match (high)

    • Query found in definition text
    • BM25 score applied
  4. Fuzzy match (medium)

    • Query is similar (Levenshtein distance ≤ 2)
    • Distance affects rank (distance 1 > distance 2)
  5. Popular acronyms (tie-breaker)

    • More upvotes → ranked higher
    • Recency → recently updated ranked higher

Final Sort Order

Results are sorted by:

1. Relevance score (calculated from above factors)
2. Vote count (more upvoted first)
3. Recency (recently updated first)
4. Alphabetical (tie-breaker)
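
The ranking factors and sort order can be expressed as a single comparator. This is a sketch only: the result shape (`bm25`, `distance`, `votes`, `updatedAt`), the function names, and the small bonuses for fuzzy matches are assumptions. Only the +1000 exact-match boost and the order of the tie-breakers come from the lists above. The `bm25` field is treated as "higher is better", as in the earlier example.

```javascript
// Hypothetical result shape: { title, bm25, distance, votes, updatedAt }
const EXACT_MATCH_BOOST = 1000;

function relevance(result, query) {
  let score = result.bm25 ?? 0; // title/definition match score from FTS5
  if (result.title.toLowerCase() === query.toLowerCase()) {
    score += EXACT_MATCH_BOOST;
  }
  // Fuzzy matches: distance 1 ranks above distance 2 (weights are assumed)
  if (result.distance === 1) score += 2;
  else if (result.distance === 2) score += 1;
  return score;
}

function rankResults(results, query) {
  // Copy first so the caller's array is left untouched
  return [...results].sort((a, b) =>
    relevance(b, query) - relevance(a, query) || // 1. relevance score
    b.votes - a.votes ||                         // 2. vote count
    b.updatedAt - a.updatedAt ||                 // 3. recency (epoch ms)
    a.title.localeCompare(b.title)               // 4. alphabetical
  );
}

const ranked = rankResults([
  { title: 'OKP', bm25: 0, distance: 1, votes: 5, updatedAt: 1 },
  { title: 'OKR', bm25: 8.5, distance: 0, votes: 1, updatedAt: 1 },
  { title: 'OKA', bm25: 0, distance: 1, votes: 5, updatedAt: 1 },
], 'okr');
console.log(ranked.map((r) => r.title)); // [ 'OKR', 'OKA', 'OKP' ]
```

Chaining the comparisons with `||` means each later criterion is consulted only when every earlier one is an exact tie.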

Query Types

Query: "OKR"
Matching:
  - Title: "OKR"
  - Synonyms: "Quarterly Goals"
  - Definition: "...a framework for goal setting..."

Query: "Objectives and Key"
Matching:
  - Title: "OKR" (expands to "Objectives and Key Results")
  - Definition: "An objectives-focused framework"

Query: "ORK" (typo of "OKR")
Stage 2: FTS5 returns no results
Stage 3: Levenshtein finds "OKR" (distance 1)
Result: "Did you mean: OKR"

Query: "goal setting framework"
FTS5 matches:
  - "An objective goal-setting framework"
  - "Framework for setting goals"
Result: Both returned, ranked by relevance

Performance Characteristics

Latency

| Query Type | Latency | Notes |
|---|---|---|
| Exact match | 1-2 ms | FTS5 index lookup |
| Prefix match | 2-5 ms | FTS5 range scan |
| Fuzzy match | 10-50 ms | Levenshtein distance calculation |
| Client-side (Fuse.js) | 1-5 ms | In-browser processing |

Throughput

  • Server can handle 1,000+ searches per second
  • Caching reduces repeated query latency to <1 ms

Scalability

  • 10,000 acronyms: Still responsive (<100 ms)
  • 100,000 acronyms: Still responsive but may need pagination
  • 1,000,000 acronyms: Requires advanced caching strategies

Search Limitations

What Search Does NOT Do

  • Partial middle matching: searching KR doesn't match OKR (only prefix/full-term matching)
  • Boolean operators: OKR AND KPI is not supported (treated as a phrase search)
  • Regular expressions: not supported, for performance reasons
  • Case-sensitive matching: all searches are case-insensitive
  • Accent matching: accented characters are normalized
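
Case-insensitivity and accent normalization can be illustrated with a short helper. This is a sketch with a hypothetical function name, `normalizeForSearch`; how define.wtf performs the folding internally is not stated here, and on the SQLite side FTS5's default unicode61 tokenizer does comparable folding on its own.

```javascript
// Fold case and strip accents so "Café", "CAFE" and "cafe" compare equal.
function normalizeForSearch(text) {
  return text
    .normalize('NFD')                // split letters from their combining accents
    .replace(/\p{Diacritic}/gu, '')  // drop the combining marks
    .toLowerCase();                  // fold case
}

console.log(normalizeForSearch('Café')); // cafe
console.log(normalizeForSearch('OKR'));  // okr
```

Applying the same function to both the query and the indexed text is what makes the comparison symmetric.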

Workarounds

  • For partial matching — Add synonyms with partial words to the description
  • For complex queries — Use the UI filters instead of search
  • For regex matching — Export data and use external tools

Best Practices

As an Admin

  1. Use clear acronym names — "OKR" beats "OKRF" for clarity
  2. Write descriptive definitions — Include synonyms and context
  3. Add tags and categories — Helps filtering even if search doesn't find it
  4. Create collections — Organize related acronyms by topic
  5. Keep definitions updated — Stale definitions hurt search quality

As a User

  1. Search by acronym first — Much faster than searching definitions
  2. Use filters — Categories and tags narrow results efficiently
  3. Try prefix search — "OKR" faster than "Objectives"
  4. Check synonyms — Different teams may use different names

Advanced: Search Configuration

Caching Strategy

define.wtf caches search results:

| Cache Level | Duration | Invalidation |
|---|---|---|
| Redis (server) | 1 hour | On acronym change |
| Browser | Session | On page refresh |
| FTS5 index | Persistent | Real-time updates |

Cache invalidation ensures results stay fresh while maintaining speed.
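
The server-side row of the table (one-hour lifetime, invalidated on acronym change) can be sketched with an in-memory stand-in. This is not a Redis client: the `SearchCache` class, its method names, and the injectable clock are assumptions made to keep the example self-contained.

```javascript
// In-memory stand-in for the server cache: entries expire after ttlMs
// and are all dropped when an acronym changes.
class SearchCache {
  constructor(ttlMs = 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;          // injectable clock, handy for testing
    this.entries = new Map();
  }

  get(query) {
    const hit = this.entries.get(query);
    if (!hit) return undefined;
    if (this.now() - hit.storedAt >= this.ttlMs) {
      this.entries.delete(query); // expired: treat as a miss
      return undefined;
    }
    return hit.results;
  }

  set(query, results) {
    this.entries.set(query, { results, storedAt: this.now() });
  }

  // Call this from the acronym create/edit/delete handlers
  invalidateAll() {
    this.entries.clear();
  }
}

const cache = new SearchCache();
cache.set('okr', ['OKR']);
console.log(cache.get('okr')); // [ 'OKR' ]
cache.invalidateAll();
console.log(cache.get('okr')); // undefined
```

Clearing everything on any change is the simplest correct policy; a finer-grained scheme would need to track which cached queries each acronym can appear in.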

Index Maintenance

FTS5 indexes are:

  • Rebuilt on acronym create/edit/delete
  • Rebuilt nightly (full refresh for consistency)
  • Checked for corruption weekly

To manually rebuild indexes:

npm run db:rebuild-fts

(Admin only)
