Content Moderation
How define.wtf automatically filters inappropriate content while maintaining team safety
define.wtf includes automated content moderation to keep your workspace professional and safe. Every user-editable text field is scanned for profanity, offensive language, and leetspeak evasion attempts.
Overview
What Gets Moderated?
Content is automatically scanned on 13 API routes across 21 fields:
| Content Type | Fields Scanned | Examples |
|---|---|---|
| Acronyms | Title, Description | "OKR", "Objectives and Key Results" |
| Definitions | Text, Notes | Definition body, admin notes |
| Categories | Name, Description | "Finance", "Sales Strategy" |
| Collections | Name, Description | "Q4 Planning", "Project Alpha" |
| Tags | Name | "important", "urgent" |
| User Profiles | Name, Bio | Display name, user biography |
When Moderation Happens
- On create — New acronym, definition, category, etc.
- On edit — When user updates any text field
- On import — During bulk CSV import
- Real-time — No delay; submission blocked if content violates policy
What Gets Blocked?
- Explicit profanity — Industry-standard blocklist of offensive words
- Offensive slurs — Category-specific blocklists (ethnic, gender, orientation, etc.)
- Leetspeak variants — h3ll0, @$$hole, f*ck, etc.
- Unicode evasion — Lookalike characters (1/, ¡, ɪ) used to bypass filters
- Custom blocklist — Per-tenant custom blocked words
Moderation Engine: Obscenity Library
define.wtf uses the obscenity library, a comprehensive Node.js profanity filter.
Why Obscenity?
- Detects leetspeak and unicode evasion automatically
- Extendable with custom blocklists
- Performance-optimized (scans thousands of words in <10ms)
- Tuned to minimize false positives on legitimate tech terms
Standard Blocklists
Obscenity includes these default blocklists:
- en/core — Common English profanity
- en/en_US — US-specific offensive language
- en/en_GB — UK-specific offensive language
Total: ~400 words and variants covered.
Content Moderation Behavior
When Content Violates Policy
When a user submits content containing a blocked word:
User submits: "Check out our new KPI: K***y Performance Indicator"
(a blocked word obfuscated with asterisks)
System detects: Obfuscated variant of a blocked word
Response: 400 Bad Request
{
  "error": "Content moderation failed",
  "violations": ["obscene_content"],
  "message": "Your submission contains inappropriate language. Please revise and try again."
}
User sees: Toast notification explaining content was rejected
User action: Revise and resubmit without the inappropriate language
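On the client, a rejection can be recognized by its status code and body shape. The following is a minimal TypeScript sketch, assuming the body always carries the three fields shown in the example response. `ModerationRejection`, `isModerationRejection`, and `toastTextFor` are hypothetical names used for illustration, not part of the define.wtf API.

```typescript
// Shape of the rejection body from the example response.
// Treating `violations` as a list of string codes is an assumption.
interface ModerationRejection {
  error: string;
  violations: string[];
  message: string;
}

// Type guard: true only when `body` has all three expected fields.
function isModerationRejection(body: unknown): body is ModerationRejection {
  if (typeof body !== "object" || body === null) return false;
  const candidate = body as Record<string, unknown>;
  const violations = candidate.violations;
  return (
    typeof candidate.error === "string" &&
    typeof candidate.message === "string" &&
    Array.isArray(violations) &&
    violations.every((code) => typeof code === "string")
  );
}

// Hypothetical helper: returns the text to show in the toast,
// or null when the response is not a moderation rejection.
function toastTextFor(status: number, body: unknown): string | null {
  if (status === 400 && isModerationRejection(body)) {
    return body.message;
  }
  return null;
}
```

Because the helper only reads `message`, the submitted text stays in the form and the user can edit it and resubmit.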
User Feedback
When content is rejected:
- Error message — Explains why (without exposing the exact word)
- Helpful guidance — "Please revise your submission"
- No data loss — User can edit their text and try again
The system doesn't embarrass users or reveal exactly what triggered the filter (avoids teaching people workarounds).
Custom Blocklist
Every workspace can define custom blocked words via tenantSettings.blockedWords.
Examples of Custom Blocklists
Finance company:
blockedWords: ["confidential", "insider", "proprietary"]
(Prevent accidental disclosure of sensitive terms)
Healthcare:
blockedWords: ["HIPAA", "patient_id", "SSN"]
(Additional compliance monitoring)
Startup:
blockedWords: ["competitor_name", "acquisition_target"]
(Prevent accidental public leaks)
Managing Custom Blocklist
- Go to Settings → Content Moderation (Admin only)
- Add words to custom blocklist (comma-separated)
- Save
- Changes take effect immediately
Example:
Custom Blocklist: confidential, top_secret, unreleased_product
Now these words trigger moderation errors in addition to the standard profanity filter.
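How the comma-separated input becomes the `blockedWords` array is not specified on this page. One plausible normalization, sketched here in TypeScript, is to trim whitespace, lowercase each entry, and drop blanks and duplicates. `parseBlockedWords` and the `TenantSettings` shape are assumptions for illustration; only the `blockedWords` key comes from the text above.

```typescript
// Assumed minimal shape; only `blockedWords` is named in the documentation.
interface TenantSettings {
  blockedWords: string[];
}

// Hypothetical normalizer for the comma-separated admin input.
function parseBlockedWords(input: string): string[] {
  const words: string[] = [];
  for (const raw of input.split(",")) {
    const word = raw.trim().toLowerCase();
    // Skip empty entries (e.g. a trailing comma) and duplicates.
    if (word.length === 0 || words.indexOf(word) !== -1) continue;
    words.push(word);
  }
  return words;
}

const exampleSettings: TenantSettings = {
  blockedWords: parseBlockedWords("confidential, top_secret, unreleased_product"),
};
```

Lowercasing at save time lets the later comparison stay case-insensitive without re-normalizing the list on every submission.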
Best Practices for Custom Blocklists
- Be specific — Target actual risks, not just vague categories
- Document reasons — Leave notes on why each word is blocked
- Review quarterly — Remove words that no longer apply
- Test first — Try submitting content with the word before blocking it
- Avoid false positives — Don't block common tech terms unless necessary
Moderation in Different Contexts
API Endpoint Moderation
All 13 API routes that accept user text perform moderation:
| Endpoint | Fields Checked | When Checked |
|---|---|---|
| POST /api/v1/acronyms | title, description | Check before insert |
| PATCH /api/v1/acronyms/{id} | title, description | Check before update |
| POST /api/v1/definitions | text | Check before insert |
| PATCH /api/v1/definitions/{id} | text | Check before update |
| POST /api/v1/categories | name, description | Check before insert |
| PATCH /api/v1/categories/{id} | name, description | Check before update |
| POST /api/v1/collections | name, description | Check before insert |
| PATCH /api/v1/collections/{id} | name, description | Check before update |
| POST /api/v1/tags | name | Check before insert |
| PATCH /api/v1/users/{id} | name | Check before update |
| POST /api/internal/bulk-import | All fields | Check each row |
| POST /api/slack/define | N/A (read-only) | No check needed |
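The check-before-write pattern in the table can be sketched as a small lookup plus a field scan. This is an illustration under assumptions, not the service's actual code: `MODERATED_ROUTES`, `checkBeforeWrite`, and the `violates` callback are hypothetical names, and only a few routes from the table are included. In the product the callback would be backed by the moderation engine.

```typescript
// A route and the payload fields that must be scanned before writing.
interface RouteSpec {
  route: string; // "METHOD path", as written in the table
  fields: string[];
}

// Subset of the table, for illustration only.
const MODERATED_ROUTES: RouteSpec[] = [
  { route: "POST /api/v1/acronyms", fields: ["title", "description"] },
  { route: "PATCH /api/v1/acronyms/{id}", fields: ["title", "description"] },
  { route: "POST /api/v1/definitions", fields: ["text"] },
  { route: "POST /api/v1/tags", fields: ["name"] },
];

// Returns true when the text violates policy (stand-in for the real engine).
type TextCheck = (text: string) => boolean;

interface CheckResult {
  ok: boolean;
  rejectedFields: string[];
}

// Scan only the fields listed for the route; unknown routes scan nothing.
function checkBeforeWrite(
  route: string,
  payload: Record<string, unknown>,
  violates: TextCheck
): CheckResult {
  const spec = MODERATED_ROUTES.filter((r) => r.route === route)[0];
  const fields = spec ? spec.fields : [];
  const rejectedFields = fields.filter((field) => {
    const value = payload[field];
    return typeof value === "string" && violates(value);
  });
  return { ok: rejectedFields.length === 0, rejectedFields };
}
```

A handler would call `checkBeforeWrite` first and return the 400 response when `ok` is false, so rejected content never reaches the database.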
Bulk Import Moderation
When uploading a CSV:
- Preview phase — File is checked for violations
- Violations reported — Rows with blocked content are flagged
- User chooses — Skip the flagged rows, or correct the file and resubmit
- Import executes — Only approved content is imported
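The preview and skip steps above can be sketched as two small functions. This is a simplified TypeScript illustration that assumes the CSV has already been parsed into rows with `term` and `description` columns. `ImportRow`, `FlaggedRow`, `previewImport`, and `skipFlagged` are hypothetical names, and `violates` stands in for the real moderation check.

```typescript
interface ImportRow {
  term: string;
  description: string;
}

interface FlaggedRow {
  row: number; // 1-based data row, not counting the header line
  term: string;
  field: "term" | "description";
}

// Preview phase: report every field that fails the check, without importing.
function previewImport(
  rows: ImportRow[],
  violates: (text: string) => boolean
): FlaggedRow[] {
  const flagged: FlaggedRow[] = [];
  rows.forEach((row, index) => {
    if (violates(row.term)) {
      flagged.push({ row: index + 1, term: row.term, field: "term" });
    }
    if (violates(row.description)) {
      flagged.push({ row: index + 1, term: row.term, field: "description" });
    }
  });
  return flagged;
}

// "Skip" choice: keep only the rows that were not flagged.
function skipFlagged(rows: ImportRow[], flagged: FlaggedRow[]): ImportRow[] {
  const flaggedNumbers = flagged.map((f) => f.row);
  return rows.filter((_row, index) => flaggedNumbers.indexOf(index + 1) === -1);
}
```

Separating the preview from the import keeps the first pass free of side effects, so the user can correct the file and run the preview again as many times as needed.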
CSV with violation:
term,description
OKR,"Objectives and Key Results"
KPI,"Keep P**ing Indicators"   ← Blocked word (leetspeak)
ROI,"Return on Investment"
System response:
Violations found:
Row 2 (KPI): Inappropriate content in description
Choose: Skip this row | Correct and resubmit
UI Form Validation
In the web interface:
- User types in acronym description
- Character-by-character validation (optional, for UX)
- On submit, server checks content
- If violation found, form shows error (content preserved for editing)
Better UX — User can immediately correct without losing their work.
Evasion Detection
Leetspeak Examples
define.wtf detects common leetspeak variants:
| Original | Leetspeak | Detected? |
|---|---|---|
| hello | h3ll0 | ✓ Yes |
| hello | h3llo | ✓ Yes |
| hello | h€llo | ✓ Yes (unicode) |
| hello | н3llo | ✓ Yes (cyrillic) |
Unicode Lookalikes
Obscenity detects Unicode characters that look like Latin letters:
| Latin | Lookalike | Script | Detected? |
|---|---|---|---|
| A | Α | Greek | ✓ Yes |
| O | О | Cyrillic | ✓ Yes |
| E | Е | Cyrillic | ✓ Yes |
| I | І | Ukrainian | ✓ Yes |
Mixed Evasion
Combinations of techniques are also caught:
| Attempt | Detection |
|---|---|
| h3ll0_w0r1d | ✓ Identified as leetspeak |
| heЛЛo | ✓ Identified as mixed cyrillic |
| h€££ø | ✓ Identified as mixed unicode |
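The general idea behind catching these variants is to fold lookalike characters back to plain Latin letters before comparing against the blocklist. The TypeScript sketch below illustrates that idea only; it is not the obscenity library's implementation, and the `LOOKALIKES` table is a deliberately small assumption covering the examples on this page. It uses "hello" as the stand-in blocked word, as the tables above do.

```typescript
// Illustrative substitution table (keys are lowercase). A production
// table would be far larger; these entries are assumptions for the demo.
const LOOKALIKES: Record<string, string> = {
  "0": "o", "1": "l", "3": "e", "@": "a", "$": "s",
  "€": "e", "£": "l", "ø": "o",
  "\u043d": "h", // Cyrillic en
  "\u0435": "e", // Cyrillic ie
  "\u043e": "o", // Cyrillic o
  "\u043b": "l", // Cyrillic el, treated as "l" to match the example table
  "\u0456": "i", // Cyrillic (Ukrainian) i
  "\u03b1": "a", // Greek alpha
};

// Lowercase the text, then replace each known lookalike with its Latin letter.
// The result is only used for matching, never stored or shown to the user.
function normalizeForMatching(text: string): string {
  return text
    .toLowerCase()
    .split("")
    .map((ch) => LOOKALIKES[ch] || ch)
    .join("");
}

// Simplistic substring test against the normalized text.
function looksLikeBlocked(text: string, blockedWords: string[]): boolean {
  const normalized = normalizeForMatching(text);
  return blockedWords.some((word) => normalized.indexOf(word) !== -1);
}
```

Normalizing first means the blocklist only needs the plain spelling of each word. A single substitution pass then covers leetspeak, lookalike scripts, and mixtures of the two.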
False Positives & Workarounds
Avoiding False Positives
The moderation system is tuned to avoid false positives on legitimate terms:
- "Scunthorpe problem" — Legitimate place names that contain blocked words are allowed (tuned in library)
- Technical terms — Words like "regex", "shell", etc. are not blocked
- Proper nouns — Company names and acronyms generally not blocked unless they're actually offensive
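The Scunthorpe problem comes from matching blocked words as raw substrings. A common mitigation, sketched below in TypeScript, is to compare whole tokens instead. This illustrates the general technique and makes no claim about how the library is tuned internally. `substringMatch` and `wholeWordMatch` are hypothetical helpers, and "hello" again stands in for a blocked word, with "Othello" as the legitimate term that happens to contain it.

```typescript
// Naive approach: flags any text that contains the blocked word anywhere.
function substringMatch(text: string, blocked: string): boolean {
  return text.toLowerCase().indexOf(blocked) !== -1;
}

// Token approach: split on non-alphanumeric characters and compare whole words.
function wholeWordMatch(text: string, blocked: string): boolean {
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/);
  return tokens.indexOf(blocked) !== -1;
}

// "Othello" contains "hello" but is not the word "hello".
const falsePositive = substringMatch("Othello project", "hello"); // true
const tunedResult = wholeWordMatch("Othello project", "hello");   // false
```

Whole-word matching trades some recall for precision: it avoids flagging embedded matches but can miss a blocked word glued to other text, which is one reason real filters combine several strategies.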
If You Hit a False Positive
- Revise your text — Try rewording to avoid the blocked term
- Contact admin — Admins can adjust the custom blocklist if legitimate terms are being incorrectly blocked
Admin Actions
Admins can:
- Adjust custom blocklist — Add or remove words from the per-tenant blocklist
- Review flagged content — Check audit logs for blocked submissions
- Disable for specific context — Not supported: the standard profanity filter cannot be disabled; only the custom blocklist can be modified
Privacy & Moderation
What's Logged?
When content is blocked:
- User ID
- Content (full text)
- Field (which field was rejected)
- Reason (which rule triggered)
- Timestamp
These records are kept for compliance and are available only to admins through the audit logs.
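The logged fields map naturally onto a small record type. The TypeScript sketch below is an assumption about shape only: the property names, the string types, and the ISO 8601 timestamp format are illustrative choices, not the actual audit schema.

```typescript
// One record per blocked submission, mirroring the list above.
interface BlockedSubmissionLog {
  userId: string;
  content: string;   // full submitted text
  field: string;     // which field was rejected
  reason: string;    // which rule triggered, e.g. "obscene_content"
  timestamp: string; // ISO 8601, UTC (format is an assumption)
}

// Hypothetical builder; `now` is injectable so the output is reproducible in tests.
function buildLogEntry(
  userId: string,
  field: string,
  content: string,
  reason: string,
  now: Date = new Date()
): BlockedSubmissionLog {
  return { userId, content, field, reason, timestamp: now.toISOString() };
}
```

Because the record contains the full submitted text, access to it should be as restricted as the rest of this section describes.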
What's Shared?
- User who violated policy is NOT publicly named
- Flagged content is NOT displayed to other users
- No "shame list" or public record of violations
Moderation is private to maintain user dignity.
Compliance Use Cases
HIPAA-Regulated Organizations
Custom blocklist can enforce:
blockedWords: ["PHI", "patient", "MRN", "SSN", "DOB"]
Catch accidental disclosure of protected health information.
Financial Services
Custom blocklist can enforce:
blockedWords: ["insider", "confidential", "acquisition", "litigation_hold"]
Prevent material non-public information from being shared.
Public Sector
Custom blocklist can enforce:
blockedWords: ["classified", "secret", "top_secret"]
Ensure security classifications are respected.
Best Practices
As an Admin
- Review your blocklist — Understand what you're filtering
- Test thoroughly — Add custom words one at a time and test
- Document policy — Tell your team what content is not allowed
- Train users — Help team members understand why content is rejected
- Monitor false positives — Adjust if legitimate content is blocked
- Avoid over-filtering — Only block what's truly necessary
As a User
- Assume good faith — System blocks to help keep workspace safe
- Revise and resubmit — Usually a small edit fixes it
- Contact admin — If you think blocking is unfair
- Use appropriate language — Keep workspace professional
- Rephrase — Use a neutral synonym when a term is blocked
Performance Impact
Content moderation adds minimal overhead:
| Operation | Time | Impact |
|---|---|---|
| Scan 100 fields | <10 ms | Negligible |
| Check against blocklist | <5 ms | Negligible |
| Total per submission | <20 ms | Negligible |
Moderation happens asynchronously where possible and is optimized for speed.
Troubleshooting
"Getting 'inappropriate content' error but don't see bad words"
- Check for leetspeak (numbers instead of letters)
- Check for unicode lookalikes (copy-paste text to inspect)
- Ask your admin if a custom blocklist is in place
- Review the error message: it reports the type of violation, though not the exact word that triggered it
"A legitimate word is being blocked"
- Contact your admin to remove it from the custom blocklist (if that is where the word is blocked)
- If the standard filter is blocking it, reword the text; the standard filter cannot be disabled, even for a single submission
- Check if you're using a unicode character that looks like the letter
"Need to disable content moderation"
- Cannot disable standard profanity filter (always active for safety)
- Can adjust custom blocklist in settings
- Contact support@define.wtf if you have special needs
See Also
- Admin Guide: Settings — Configure content moderation
- Concepts: Audit Trail — See flagged content history
- API Reference: Moderation — Technical details