Search Architecture

Finding the right acronym quickly is crucial. define.wtf uses a four-stage search pipeline: query escaping, full-text search (FTS5), fuzzy matching (Levenshtein distance), and client-side filtering (Fuse.js). Results from these stages are then merged by relevance score, so accurate results come back within milliseconds.

Overview: Four-Stage Pipeline

User enters search query (e.g., "OKR")
  ↓
Stage 1: Query escaping & parsing (prevent SQL injection)
  ↓
Stage 2: FTS5 full-text search (SQLite index lookup)
  ↓
Stage 3: Levenshtein fuzzy matching (typo tolerance)
  ↓
Stage 4: Fuse.js client-side filtering (UI autocomplete)
  ↓
Relevance scoring & result merging
  ↓
Results displayed (primary definition first)

Stage 1: Query Escaping & Parsing

Before searching, the query is sanitized to prevent SQL injection attacks.

Query Escaping

Special SQL characters in user input are escaped (not removed) so that they are read as literal text:

Input	Escaped	Purpose
`O'KR`	`O\'KR`	Escape single quotes
`OKR;DROP TABLE`	`OKR\;DROP TABLE`	Escape semicolons
`%OKR%`	`\%OKR\%`	Escape wildcards

This ensures malicious input is treated as literal text, not SQL syntax.
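
As a rough illustration of the table above, the escaping step could look like the sketch below. The function name `escapeQuery` and the exact set of escaped characters are assumptions taken only from the three rows shown; this is not the actual define.wtf implementation. Whether a backslash is honoured as an escape depends on the SQL dialect, and many applications pass user input as bound parameters instead of (or in addition to) escaping it by hand.

```javascript
// Hypothetical helper: backslash-escape the characters listed in the table.
// The backslash itself is included so pre-existing backslashes stay literal.
function escapeQuery(input) {
  return input.replace(/[\\';%]/g, (ch) => '\\' + ch);
}

console.log(escapeQuery("O'KR"));           // O\'KR
console.log(escapeQuery('OKR;DROP TABLE')); // OKR\;DROP TABLE
console.log(escapeQuery('%OKR%'));          // \%OKR\%
```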

Prefix Matching

FTS5 supports prefix matching with the * operator:

User searches: "OK"
Query becomes: "OK*"
Matches: "OKR", "OKP", "OKAPI"

This allows finding results even if the user hasn't typed the full term.

Quote Handling

Each word in a multi-word search is quoted, and the quoted words are combined with AND:

User searches: "quarterly review"
Query becomes: "quarterly" AND "review"
Matches: Acronyms with both terms in definitions
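
The prefix and quoting rules in this stage can be sketched as a small query builder. The name `buildFtsQuery` is hypothetical, and applying the prefix operator only to single-term queries is an assumption based on the two examples given. In FTS5 query syntax the `*` has to sit outside the quoted string, and an embedded double quote is escaped by doubling it.

```javascript
// Hypothetical helper: turn raw user input into an FTS5 MATCH expression.
function buildFtsQuery(input) {
  const words = input.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) return '';

  // Quote every word; a double quote inside a word is escaped by doubling it.
  const quoted = words.map((w) => '"' + w.replace(/"/g, '""') + '"');

  // Single term: append * (outside the quotes) for prefix matching.
  if (quoted.length === 1) return quoted[0] + '*';

  // Several terms: require all of them.
  return quoted.join(' AND ');
}

console.log(buildFtsQuery('OK'));               // "OK"*
console.log(buildFtsQuery('quarterly review')); // "quarterly" AND "review"
```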

Stage 2: FTS5 Full-Text Search

SQLite's FTS5 (Full-Text Search 5) indexes all acronym terms and definitions for fast lookup.

What's Indexed?

FTS5 indexes these fields for each acronym:

Field	Importance	Example
`title` (term)	High	OKR
`description`	High	Objectives and Key Results
`synonyms`	Medium	Goals, Quarterly Review
`definition text`	Medium	A framework for goal-setting...

BM25 Ranking

FTS5 uses BM25, a statistical ranking algorithm:

How BM25 works:

Scores each result based on term frequency
Boosts relevance of rare terms
Penalizes overly common terms
Considers document length

Example:

Query: OKR

Results (ranked by BM25):

OKR (exact match in title), score: 8.5
"Objectives and Key Results" (the expansion of OKR), score: 4.2
"Framework for setting goals like OKR" (OKR mentioned in description), score: 2.1

Exact title matches rank highest.
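
To make the BM25 properties above concrete, here is a minimal sketch of the textbook BM25 formula for a single query term. It is not the FTS5 source code: the function name, the parameter names, and the choice of `k1 = 1.2` and `b = 0.75` (commonly used defaults) are assumptions for illustration. Note also that SQLite's built-in `bm25()` function returns values where a numerically smaller result means a better match, whereas this sketch returns positive scores like the example above.

```javascript
// Textbook BM25 score for one query term in one document (illustrative only).
function bm25Term(stats, k1 = 1.2, b = 0.75) {
  const { termFreq, docLength, avgDocLength, totalDocs, docsWithTerm } = stats;

  // Inverse document frequency: rare terms get a larger weight.
  const idf = Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));

  // Length normalization: longer-than-average documents are penalized.
  const norm = 1 - b + b * (docLength / avgDocLength);

  // Term frequency with saturation: repeated occurrences help less and less.
  return idf * (termFreq * (k1 + 1)) / (termFreq + k1 * norm);
}

const base = { termFreq: 1, docLength: 20, avgDocLength: 20, totalDocs: 1000 };
const rare = bm25Term({ ...base, docsWithTerm: 5 });
const common = bm25Term({ ...base, docsWithTerm: 900 });
console.log(rare > common); // true: the rare term contributes more
```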

Performance

FTS5 is extremely fast:

Index lookups: ~1-2 ms (even with thousands of acronyms)
Range scan (multi-result): ~5-10 ms
Prefix matching: ~2-5 ms

Results are cached for repeated queries.
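
The caching strategy is not described further here. One plausible shape, shown purely as a sketch, is a small least-recently-used (LRU) cache keyed by the query string. The class name `QueryCache`, the eviction policy, and the default size are all assumptions.

```javascript
// Hypothetical LRU cache for search results, built on Map insertion order.
class QueryCache {
  constructor(maxEntries = 500) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(query) {
    if (!this.map.has(query)) return undefined;
    const value = this.map.get(query);
    // Re-insert so this entry becomes the most recently used.
    this.map.delete(query);
    this.map.set(query, value);
    return value;
  }

  set(query, results) {
    this.map.delete(query);
    this.map.set(query, results);
    if (this.map.size > this.maxEntries) {
      // The first key in a Map is the oldest, i.e. least recently used.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

const cache = new QueryCache(2);
cache.set('okr', ['OKR']);
console.log(cache.get('okr')); // [ 'OKR' ]
```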

Stage 3: Levenshtein Fuzzy Matching

For typo tolerance, define.wtf calculates Levenshtein distance (edit distance).

Levenshtein Distance

The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, substitutions) to transform one string into another.

Examples:

Input	Dictionary	Distance	Match?
`OKR`	`OKR`	0	✓ Exact
`ORK`	`OKR`	2 (swap = two substitutions)	✓ Maybe
`OK`	`OKR`	1 (deletion)	✓ Close
`OKRS`	`OKR`	1 (insertion)	✓ Close
`KPI`	`OKR`	3 (substitutions)	✗ Too far
`BANANA`	`OKR`	3+	✗ Too far
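
The definition above corresponds to the standard dynamic-programming algorithm, sketched below. The function name `levenshtein` is illustrative, and this is a generic implementation rather than the one define.wtf ships. It keeps only two rows of the distance table in memory.

```javascript
// Classic Levenshtein distance using two rows of the DP table.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 characters of a and the first j of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('OKR', 'OKR'));    // 0
console.log(levenshtein('OK', 'OKR'));     // 1
console.log(levenshtein('OKRS', 'OKR'));   // 1
console.log(levenshtein('BANANA', 'OKR')); // 6
```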

Threshold

define.wtf matches results with Levenshtein distance ≤ 2:

Distance 0: Exact match (always shown)
Distance 1: Very close, likely typo (always shown)
Distance 2: Reasonable typo (shown with "Did you mean?" suggestion)
Distance 3+: Too different (filtered out)
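
The threshold rules above amount to a three-way split of the candidates. A minimal sketch, assuming each candidate already carries a precomputed `distance` (the function name and the object shape are hypothetical):

```javascript
// Split candidates into directly shown results and "Did you mean?" suggestions.
function partitionByDistance(candidates) {
  const shown = [];
  const suggestions = [];
  for (const { title, distance } of candidates) {
    if (distance <= 1) {
      shown.push(title);        // exact match or very likely typo
    } else if (distance === 2) {
      suggestions.push(title);  // offered as "Did you mean?"
    }
    // distance 3 or more: filtered out
  }
  return { shown, suggestions };
}

console.log(partitionByDistance([
  { title: 'OKR', distance: 0 },
  { title: 'OKP', distance: 1 },
  { title: 'ORM', distance: 2 },
  { title: 'KPI', distance: 3 },
]));
// { shown: [ 'OKR', 'OKP' ], suggestions: [ 'ORM' ] }
```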

"Did You Mean?" Suggestions

When a user's search doesn't match anything but is close, they see:

No results for "ORK"

Did you mean: OKR

This suggestion uses distance ≤ 2 matches.

Performance

Levenshtein matching is computationally expensive, so it's used strategically:

Only when FTS5 returns few or no results
Only on acronym titles (not on full definitions)
Cached for common typos
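
Taken together, these three rules suggest a fallback of roughly the following shape. This is a sketch under several assumptions: `searchWithFallback`, the `MIN_FTS_RESULTS` cut-off of 3, and the in-memory `typoCache` are all hypothetical, and the FTS5 search and the distance function are passed in as plain callbacks so the example stays self-contained.

```javascript
const MIN_FTS_RESULTS = 3;   // assumed meaning of "few results"
const typoCache = new Map(); // query -> fuzzy title matches

// ftsSearch(query) returns an array of titles; distance(a, b) returns an edit distance.
function searchWithFallback(query, ftsSearch, titles, distance) {
  const ftsResults = ftsSearch(query);
  if (ftsResults.length >= MIN_FTS_RESULTS) return ftsResults; // fuzzy stage skipped

  if (!typoCache.has(query)) {
    // Compare against titles only, never against full definitions.
    typoCache.set(query, titles.filter((t) => distance(query, t) <= 2));
  }

  // Merge both result sets without duplicates, FTS5 hits first.
  return [...new Set([...ftsResults, ...typoCache.get(query)])];
}
```

Because the expensive comparison runs only on a cache miss, a repeated typo costs a single Map lookup.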

Stage 4: Fuse.js Client-Side Search

For autocomplete and command palette search, results are filtered client-side using Fuse.js.

What is Fuse.js?

Fuse.js is a fuzzy search library that runs in your browser. It searches a pre-fetched list of acronyms without additional API calls.

How It Works

Server returns candidates — API returns ~200 most relevant acronyms
Browser downloads Fuse.js — Lightweight library (15KB gzipped)
Browser creates index — Builds search index from results
User types query — fuzzySearch(query) filters results instantly
Results update live — No additional API calls needed

Fuse.js Configuration

Fuse.js is configured to:

Search acronym titles with high weight (importance)
Search definitions with lower weight
Use fuzzy matching with threshold 0.6 (the Fuse.js default; 0.0 accepts only exact matches, 1.0 accepts anything)
Sort by score (best matches first)

Example:

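import Fuse from 'fuse.js';

// `acronyms` is the pre-fetched candidate list returned by the API.
const fuse = new Fuse(acronyms, {
  keys: [
    { name: 'title', weight: 1.0 },       // Highest priority
    { name: 'description', weight: 0.5 }  // Lower priority
  ],
  threshold: 0.6,     // 0.0 = exact matches only, 1.0 = match anything
  includeScore: true  // Attach a score to each result (0 = perfect match)
});

const results = fuse.search('okr');
// Returns objects of the form { item, refIndex, score }, best match first.
// A lower score means a closer match, so OKR scores nearer 0 than OKP.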

Performance

Client-side search is extremely fast:

Index creation: ~50-100 ms (one-time)
Search execution: ~1-5 ms per query
Results update: Instant (no network round-trip)

This makes it well suited to the command palette and autocomplete, where the user expects instant feedback.

Result Merging & Ranking

Results from all stages are merged and ranked:

Ranking Factors

Exact match (highest priority)
- Query exactly matches acronym title
- Rank boost: +1000 points
Title match (very high)
- Query found in acronym title
- BM25 score applied
Definition match (high)
- Query found in definition text
- BM25 score applied
Fuzzy match (medium)
- Query is similar (Levenshtein distance ≤ 2)
- Distance affects rank (distance 1 > distance 2)
Popular acronyms (tie-breaker)
- More upvotes → ranked higher
- Recency → recently updated ranked higher
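
One way to combine these factors into a single number is sketched below. Only the +1000 exact-match boost and the ordering "distance 1 ranks above distance 2" come from the list above; the function name, the field names, and the concrete fuzzy bonus values are assumptions made for illustration.

```javascript
const EXACT_MATCH_BOOST = 1000;     // from the ranking factors above
const FUZZY_BONUS = { 1: 2, 2: 1 }; // assumed values; only the ordering is documented

// candidate: { title, bm25?, distance? } where bm25 is a positive relevance score.
function relevanceScore(candidate, query) {
  const { title, bm25 = 0, distance = null } = candidate;
  let score = bm25;

  // Case-insensitive exact title match gets the large boost.
  if (title.toLowerCase() === query.toLowerCase()) score += EXACT_MATCH_BOOST;

  // Fuzzy candidates get a small bonus that shrinks with distance.
  if (distance !== null) score += FUZZY_BONUS[distance] ?? 0;

  return score;
}

console.log(relevanceScore({ title: 'OKR', bm25: 8.5 }, 'okr')); // 1008.5
```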

Final Sort Order

Results are sorted by:

1. Relevance score (calculated from above factors)
2. Vote count (more upvoted first)
3. Recency (recently updated first)
4. Alphabetical (tie-breaker)
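
This sort order maps naturally onto a chained comparator. The sketch below assumes each result has `relevance`, `votes`, `updatedAt` (a millisecond timestamp), and `title` fields; those field names are hypothetical.

```javascript
// Comparator implementing the four-level sort order; each level only
// decides when all earlier levels are tied (difference of 0).
function compareResults(a, b) {
  return (
    b.relevance - a.relevance ||   // 1. higher relevance first
    b.votes - a.votes ||           // 2. more upvotes first
    b.updatedAt - a.updatedAt ||   // 3. most recently updated first
    a.title.localeCompare(b.title) // 4. alphabetical
  );
}

const sorted = [
  { title: 'OKR', relevance: 10, votes: 5, updatedAt: 100 },
  { title: 'KPI', relevance: 10, votes: 5, updatedAt: 100 },
  { title: 'ROI', relevance: 20, votes: 0, updatedAt: 0 },
].sort(compareResults);

console.log(sorted.map((r) => r.title)); // [ 'ROI', 'KPI', 'OKR' ]
```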

Query Types

Single Term Search

Query: "OKR"
Matching:
  - Title: "OKR"
  - Synonyms: "Quarterly Goals"
  - Definition: "...a framework for goal setting..."

Multi-Word Search

Query: "Objectives and Key"
Matching:
  - Title: "OKR" (expands to "Objectives and Key Results")
  - Definition: "An objectives-focused framework"

Typo Search

Query: "ORK" (typo of "OKR")
Stage 2: FTS5 returns no results
Stage 3: Levenshtein finds "OKR" (distance 2)
Result: "Did you mean: OKR"

Phrase Search

Query: "goal setting framework"
FTS5 matches:
  - "An objective goal-setting framework"
  - "Framework for setting goals"
Result: Both returned, ranked by relevance

Performance Characteristics

Latency

Query Type	Latency	Notes
Exact match	1-2 ms	FTS5 index lookup
Prefix match	2-5 ms	FTS5 range scan
Fuzzy match	10-50 ms	Levenshtein distance calculation
Client-side (Fuse.js)	1-5 ms	In-browser processing

Throughput

Server can handle 1,000+ searches per second
Caching reduces repeated query latency to <1 ms

Scalability

10,000 acronyms: Still responsive (<100 ms)
100,000 acronyms: Still responsive but may need pagination
1,000,000 acronyms: Requires advanced caching strategies

Search Limitations

What Search Does NOT Do

Partial middle matching — Searching KR doesn't match OKR (only prefix/full-term matching)
Boolean operators — OKR AND KPI not supported (treated as phrase search)
Regular expressions — Not supported for performance reasons
Case-sensitive matching — All searches are case-insensitive
Accent matching — Accented characters are normalized
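
The last two points (case-insensitivity and accent normalization) can be illustrated with a small normalization step applied to both the query and the indexed text. The function name `normalizeQuery` is hypothetical, and the real normalization may well happen inside the FTS5 tokenizer rather than in application code.

```javascript
// Lowercase the input and strip accents using Unicode decomposition.
function normalizeQuery(input) {
  return input
    .normalize('NFD')                // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase();
}

console.log(normalizeQuery('Café'));                          // cafe
console.log(normalizeQuery('OKR') === normalizeQuery('okr')); // true
```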