Search Architecture
How define.wtf's four-stage search pipeline finds the right acronyms instantly
Finding the right acronym quickly is crucial. define.wtf runs every query through a four-stage search pipeline: query escaping and parsing, full-text search (FTS5), fuzzy matching (Levenshtein distance), and client-side filtering (Fuse.js). The merged results are then ranked by a relevance score.
Overview: Four-Stage Pipeline
User enters search query (e.g., "OKR")
↓
Stage 1: Query escaping & parsing (prevent SQL injection)
↓
Stage 2: FTS5 full-text search (SQLite index lookup)
↓
Stage 3: Levenshtein fuzzy matching (typo tolerance)
↓
Stage 4: Fuse.js client-side filtering (UI autocomplete)
↓
Relevance scoring & result merging
↓
Results displayed (primary definition first)
Stage 1: Query Escaping & Parsing
Before searching, the query is sanitized to prevent SQL injection attacks.
Query Escaping
User input is escaped so that special SQL characters are treated as literal text:
| Input | Escaped | Purpose |
|---|---|---|
| O'KR | O\'KR | Escape single quotes |
| OKR;DROP TABLE | OKR\;DROP TABLE | Escape semicolons |
| %OKR% | \%OKR\% | Escape wildcards |
This ensures malicious input is treated as literal text, not SQL syntax.
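The escaping rules in the table could be sketched as follows. The helper name `escapeQuery` and the exact character set are assumptions taken from the three table rows (plus the backslash itself), not define.wtf's actual code. In SQLite a backslash is not an escape character inside string literals (a single quote is escaped by doubling it), so a real implementation would normally also pass the query as a bound parameter rather than concatenating it into SQL.

```javascript
// Hypothetical helper mirroring the table above: prefix each special
// character with a backslash so it reads as literal text.
// Escaping the backslash itself is an added assumption, so that
// already-escaped input cannot be confused with our own escapes.
function escapeQuery(input) {
  return input.replace(/[\\';%]/g, (ch) => "\\" + ch);
}

console.log(escapeQuery("O'KR"));           // O\'KR
console.log(escapeQuery("OKR;DROP TABLE")); // OKR\;DROP TABLE
console.log(escapeQuery("%OKR%"));          // \%OKR\%
```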
Prefix Matching
FTS5 supports prefix matching with * operator:
User searches: "OK"
Query becomes: "OK*"
Matches: "OKR", "OKP", "OKAPI"
This allows finding results even if the user hasn't typed the full term.
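One way to build an FTS5 MATCH expression with prefix support is sketched below. `buildMatchQuery` and its `prefix` option are hypothetical. The helper wraps each word in double quotes (doubling any embedded double quote, which is how FTS5 strings are escaped), joins multiple words with AND (the multi-word behavior covered under Quote Handling), and appends `*` after the last quoted word when prefix matching is requested. The exact string define.wtf generates may differ.

```javascript
// Hypothetical helper: turn raw user input into an FTS5 MATCH expression.
function buildMatchQuery(input, { prefix = false } = {}) {
  // Split on whitespace and drop empty pieces (e.g. for blank input).
  const words = input.trim().split(/\s+/).filter(Boolean);
  // FTS5 strings are double-quoted; an embedded " is written as "".
  const quoted = words.map((w) => '"' + w.replace(/"/g, '""') + '"');
  if (prefix && quoted.length > 0) {
    // A * following a quoted string marks its final token as a prefix.
    quoted[quoted.length - 1] += " *";
  }
  return quoted.join(" AND ");
}

console.log(buildMatchQuery("OK", { prefix: true })); // "OK" *
console.log(buildMatchQuery("quarterly review"));     // "quarterly" AND "review"
```

Quoting every word means characters such as `-` or `:` in user input are not interpreted as FTS5 operators.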
Quote Handling
Each word of a multi-word search is quoted, and the words are combined with AND:
User searches: "quarterly review"
Query becomes: "quarterly" AND "review"
Matches: Acronyms with both terms in definitions
Stage 2: FTS5 Full-Text Search
SQLite's FTS5 (Full-Text Search 5) indexes all acronym terms and definitions for fast lookup.
What's Indexed?
FTS5 indexes these fields for each acronym:
| Field | Importance | Example |
|---|---|---|
| title (term) | High | OKR |
| description | High | Objectives and Key Results |
| synonyms | Medium | Goals, Quarterly Review |
| definition text | Medium | A framework for goal-setting... |
BM25 Ranking
FTS5 uses BM25, a statistical ranking algorithm:
How BM25 works:
- Scores each result based on term frequency
- Boosts relevance of rare terms
- Penalizes overly common terms
- Considers document length
Example:
Query: OKR
Results (ranked by BM25):
- OKR (exact match in title), score: 8.5
- "Objectives and Key Results" (the expansion of OKR), score: 4.2
- "Framework for setting goals like OKR" (OKR mentioned in description), score: 2.1
Exact title matches rank highest.
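To make the four BM25 properties concrete, here is a toy scorer in a common textbook form. It is a sketch, not FTS5's internal code: the tokenizer, the `+1` inside the logarithm (which keeps IDF non-negative), and the function names are assumptions. The constants k1 = 1.2 and b = 0.75 are the values FTS5 hard-codes. Note also that SQLite's `bm25()` function returns the score multiplied by -1, so in SQL the best match has the numerically lowest value.

```javascript
// Lowercase and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Returns one BM25 score per document (higher = more relevant).
function bm25Scores(query, docs, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const avgLen =
    tokenized.reduce((sum, d) => sum + d.length, 0) / tokenized.length;
  const terms = tokenize(query);

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length; // term frequency
      if (tf === 0) continue;
      // Number of documents containing the term: rare terms get a higher IDF.
      const n = tokenized.filter((d) => d.includes(term)).length;
      const idf = Math.log(1 + (tokenized.length - n + 0.5) / (n + 0.5));
      // Length normalization: long documents are penalized via b.
      const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return score;
  });
}

const docs = [
  "OKR",
  "Objectives and Key Results",
  "Framework for setting goals like OKR",
];
const scores = bm25Scores("OKR", docs);
// scores[0] > scores[2] > scores[1]; scores[1] is 0 because that text
// never contains the token "okr".
console.log(scores);
```

The short document wins over the long one for the same single occurrence of the term, which is the length normalization at work. Favoring the title field over other fields would additionally require per-column weights.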
Performance
FTS5 is extremely fast:
- Index lookups: ~1-2 ms (even with thousands of acronyms)
- Range scan (multi-result): ~5-10 ms
- Prefix matching: ~2-5 ms
Results are cached for repeated queries.
Stage 3: Levenshtein Fuzzy Matching
For typo tolerance, define.wtf calculates Levenshtein distance (edit distance).
Levenshtein Distance
The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, substitutions) to transform one string into another.
Examples:
| Input | Dictionary | Distance | Match? |
|---|---|---|---|
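| OKR | OKR | 0 | ✓ Exact |
| ORK | OKR | 2 (two substitutions; 1 only if an adjacent swap counts as a single edit, as in Damerau-Levenshtein) | ✓ Maybe |
| OK | OKR | 1 (insertion) | ✓ Close |
| OKRS | OKR | 1 (deletion) | ✓ Close |
| KPI | OKR | 3 (substitutions) | ✗ Too far |
| BANANA | OKR | 3+ | ✗ Too far |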
Threshold
define.wtf matches results with Levenshtein distance ≤ 2:
- Distance 0: Exact match (always shown)
- Distance 1: Very close, likely typo (always shown)
- Distance 2: Reasonable typo (shown with "Did you mean?" suggestion)
- Distance 3+: Too different (filtered out)
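The definition and thresholds above can be expressed in a short sketch. This is the standard dynamic-programming algorithm for plain Levenshtein distance (insertions, deletions, substitutions only); the `classify` helper and its labels are hypothetical names for the four threshold tiers, and uppercasing both strings is an assumption based on search being case-insensitive.

```javascript
// Plain Levenshtein distance using two rolling rows of the DP table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical mapping of a distance onto the tiers described above.
function classify(input, term, maxDistance = 2) {
  const d = levenshtein(input.toUpperCase(), term.toUpperCase());
  if (d === 0) return "exact";
  if (d === 1) return "close";
  if (d <= maxDistance) return "suggest"; // "Did you mean?"
  return "none";
}

console.log(levenshtein("OK", "OKR"));   // 1
console.log(levenshtein("ORK", "OKR"));  // 2
console.log(levenshtein("KPI", "OKR"));  // 3
console.log(classify("BANANA", "OKR"));  // none
```

Because plain Levenshtein has no transposition operation, swapping two adjacent letters costs 2. A variant such as Damerau-Levenshtein would score the same swap as 1.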
"Did You Mean?" Suggestions
When a user's search doesn't match anything but is close, they see:
No results for "ORK"
Did you mean: OKR
This suggestion uses distance ≤ 2 matches.
Performance
Levenshtein matching is computationally expensive, so it's used strategically:
- Only when FTS5 returns few or no results
- Only on acronym titles (not on full definitions)
- Cached for common typos
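These three rules might be combined roughly as shown below. Everything here is illustrative: `fuzzyFallback`, the `minResults` cutoff of 3 (standing in for "few"), and the in-memory `Map` cache are assumptions. The edit-distance function is passed in as `distance`, so any Levenshtein implementation can be plugged in.

```javascript
// Hypothetical in-memory cache: normalized query -> suggestions.
// A real cache would also need invalidation when titles change.
const typoCache = new Map();

function fuzzyFallback(query, ftsResults, titles, distance, minResults = 3) {
  // Rule 1: skip the expensive pass when FTS5 already found enough.
  if (ftsResults.length >= minResults) return [];

  // Rule 3: reuse earlier work for repeated typos.
  const key = query.toUpperCase();
  if (typoCache.has(key)) return typoCache.get(key);

  // Rule 2: compare against titles only, never full definitions.
  const suggestions = titles
    .map((title) => ({ title, distance: distance(key, title.toUpperCase()) }))
    .filter((s) => s.distance > 0 && s.distance <= 2)
    .sort((a, b) => a.distance - b.distance);

  typoCache.set(key, suggestions);
  return suggestions;
}
```

Restricting the comparison to titles keeps the work proportional to the number of acronyms rather than to the total length of all definitions.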
Stage 4: Fuse.js Client-Side Search
For autocomplete and command palette search, results are filtered client-side using Fuse.js.
What is Fuse.js?
Fuse.js is a fuzzy search library that runs in your browser. It searches a pre-fetched list of acronyms without additional API calls.
How It Works
- Server returns candidates: API returns ~200 most relevant acronyms
- Browser downloads Fuse.js: Lightweight library (15KB gzipped)
- Browser creates index: Builds search index from results
- User types query: fuzzySearch(query) filters results instantly
- Results update live: No additional API calls needed
Fuse.js Configuration
Fuse.js is configured to:
- Search acronym titles with high weight (importance)
- Search definitions with lower weight
- Use fuzzy matching with threshold 0.6 (in Fuse.js, 0.0 requires a perfect match and 1.0 matches anything, so 0.6 is fairly permissive)
- Sort by score (best matches first)
Example:
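    import Fuse from 'fuse.js';

    // acronyms: the candidate list returned by the server, e.g.
    // [{ title: 'OKR', description: 'Objectives and Key Results' }, ...]
    const fuse = new Fuse(acronyms, {
      keys: [
        { name: 'title', weight: 1.0 },       // Highest priority
        { name: 'description', weight: 0.5 }  // Lower priority
      ],
      threshold: 0.6,     // 0.0 = perfect matches only, 1.0 = match anything
      includeScore: true  // Attach a score to every result
    });

    const results = fuse.search('okr');
    // Each result has the shape { item, refIndex, score }, best match first.
    // Fuse.js scores run from 0 (perfect match) to 1 (complete mismatch).
Performance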
Client-side search is extremely fast:
- Index creation: ~50-100 ms (one-time)
- Search execution: ~1-5 ms per query
- Results update: Instant (no network round-trip)
Perfect for command palette and autocomplete where the user expects instant feedback.
Result Merging & Ranking
Results from all stages are merged and ranked:
Ranking Factors
- Exact match (highest priority)
  - Query exactly matches acronym title
  - Rank boost: +1000 points
- Title match (very high)
  - Query found in acronym title
  - BM25 score applied
- Definition match (high)
  - Query found in definition text
  - BM25 score applied
- Fuzzy match (medium)
  - Query is similar (Levenshtein distance ≤ 2)
  - Distance affects rank (distance 1 > distance 2)
- Popular acronyms (tie-breaker)
  - More upvotes → ranked higher
  - Recency → recently updated ranked higher
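A minimal sketch of how these factors, together with the sort order in the next subsection, could be turned into code. The hit shape (`title`, `bm25`, `distance`, `votes`, `updatedAt`) and the small fixed scores for fuzzy matches are hypothetical. Only the +1000 exact-match boost and the order of the tie-breakers come from this page, and `bm25` is assumed to be a positive "higher is better" value.

```javascript
// Hypothetical relevance score for one merged hit.
function relevanceScore(query, hit) {
  let score = hit.bm25 ?? 0; // title/definition matches carry a BM25 score
  if (hit.title.toLowerCase() === query.toLowerCase()) {
    score += 1000; // exact title match boost
  }
  // Illustrative values only: closer typos rank above more distant ones.
  if (hit.distance === 1) score += 0.5;
  else if (hit.distance === 2) score += 0.25;
  return score;
}

// Sort by relevance, then votes, then recency, then alphabetically.
function rankHits(query, hits) {
  return [...hits].sort(
    (a, b) =>
      relevanceScore(query, b) - relevanceScore(query, a) ||
      b.votes - a.votes ||
      b.updatedAt - a.updatedAt ||
      a.title.localeCompare(b.title)
  );
}

const ranked = rankHits("okr", [
  { title: "OKP", bm25: 3, votes: 10, updatedAt: 2 },
  { title: "OKR", bm25: 1, votes: 0, updatedAt: 1 },
  { title: "ORK", distance: 1, votes: 0, updatedAt: 1 },
]);
console.log(ranked.map((h) => h.title)); // [ 'OKR', 'OKP', 'ORK' ]
```

Chaining the comparisons with `||` means each later criterion is consulted only when all earlier ones are tied.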
Final Sort Order
Results are sorted by:
1. Relevance score (calculated from above factors)
2. Vote count (more upvoted first)
3. Recency (recently updated first)
4. Alphabetical (tie-breaker)
Query Types
Single Term Search
Query: "OKR"
Matching:
- Title: "OKR"
- Synonyms: "Quarterly Goals"
- Definition: "...a framework for goal setting..."
Multi-Word Search
Query: "Objectives and Key"
Matching:
- Title: "OKR" (expands to "Objectives and Key Results")
- Definition: "An objectives-focused framework"
Typo Search
Query: "ORK" (typo of "OKR")
Stage 2: FTS5 returns no results
Stage 3: Levenshtein finds "OKR" (distance 2: swapping two adjacent letters costs two substitutions)
Result: "Did you mean: OKR"
Phrase Search
Query: "goal setting framework"
FTS5 matches:
- "An objective goal-setting framework"
- "Framework for setting goals"
Result: Both returned, ranked by relevance
Performance Characteristics
Latency
| Query Type | Latency | Notes |
|---|---|---|
| Exact match | 1-2 ms | FTS5 index lookup |
| Prefix match | 2-5 ms | FTS5 range scan |
| Fuzzy match | 10-50 ms | Levenshtein distance calculation |
| Client-side (Fuse.js) | 1-5 ms | In-browser processing |
Throughput
- Server can handle 1,000+ searches per second
- Caching reduces repeated query latency to <1 ms
Scalability
- 10,000 acronyms: Still responsive (<100 ms)
- 100,000 acronyms: Still responsive but may need pagination
- 1,000,000 acronyms: Requires advanced caching strategies
Search Limitations
What Search Does NOT Do
- Partial middle matching: Searching KR doesn't match OKR (only prefix/full-term matching)
- Boolean operators: OKR AND KPI is not supported (treated as phrase search)
- Regular expressions: Not supported for performance reasons
- Case-sensitive matching: All searches are case-insensitive
- Accent matching: Accented characters are normalized
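The last two points (case-insensitivity and accent normalization) can be approximated in a few lines with standard Unicode normalization. `normalizeQuery` is a hypothetical name, and this is only a sketch of the idea: the real normalization may instead happen inside the FTS5 tokenizer, for example through the `remove_diacritics` option of the `unicode61` tokenizer.

```javascript
// Hypothetical helper: fold case and strip accents before comparing.
function normalizeQuery(text) {
  return text
    .normalize("NFD")                // split letters from combining marks
    .replace(/\p{Diacritic}/gu, "")  // drop the combining marks
    .toLowerCase();
}

console.log(normalizeQuery("Résumé")); // resume
console.log(normalizeQuery("OKR"));    // okr
```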
Workarounds
- For partial matching — Add synonyms with partial words to the description
- For complex queries — Use the UI filters instead of search
- For regex matching — Export data and use external tools
Optimization Tips for Search
As an Admin
- Use clear acronym names — "OKR" beats "OKRF" for clarity
- Write descriptive definitions — Include synonyms and context
- Add tags and categories — Helps filtering even if search doesn't find it
- Create collections — Organize related acronyms by topic
- Keep definitions updated — Stale definitions hurt search quality
As a User
- Search by acronym first — Much faster than searching definitions
- Use filters — Categories and tags narrow results efficiently
- Try prefix search — "OKR" faster than "Objectives"
- Check synonyms — Different teams may use different names
Advanced: Search Configuration
Caching Strategy
define.wtf caches search results:
| Cache Level | Duration | Invalidation |
|---|---|---|
| Redis (server) | 1 hour | On acronym change |
| Browser | Session | On page refresh |
| FTS5 index | Persistent | Real-time updates |
Cache invalidation ensures results stay fresh while maintaining speed.
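The server-side row of the table (one-hour lifetime, invalidated when an acronym changes) behaves like the small time-to-live cache sketched here. This is an in-memory stand-in for illustration, not the Redis setup itself; the class name `SearchCache`, its methods, and the injectable `now` clock are all assumptions.

```javascript
// Hypothetical in-memory TTL cache modelling the server-level cache.
class SearchCache {
  constructor(ttlMs = 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs; // default: 1 hour
    this.now = now;     // injectable clock, handy for testing
    this.entries = new Map();
  }

  // Return cached results, or undefined if missing or expired.
  get(query) {
    const entry = this.entries.get(query);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(query);
      return undefined;
    }
    return entry.results;
  }

  set(query, results) {
    this.entries.set(query, { results, expiresAt: this.now() + this.ttlMs });
  }

  // Call when any acronym is created, edited, or deleted.
  invalidateAll() {
    this.entries.clear();
  }
}
```

Clearing everything on any change is the simplest correct policy; a finer-grained scheme would need to know which cached queries a changed acronym could affect.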
Index Maintenance
FTS5 indexes are:
- Rebuilt on acronym create/edit/delete
- Rebuilt nightly (full refresh for consistency)
- Checked for corruption weekly
To manually rebuild indexes (admin only):
npm run db:rebuild-fts
See Also
- Concepts: Multi-Tenancy — Search is tenant-scoped
- Admin Guide: Bulk Import — Bulk add acronyms
- API Reference: Search — Search API endpoints