Content Moderation

How define.wtf automatically filters inappropriate content while maintaining team safety

define.wtf includes automated content moderation to keep your workspace professional and safe. Every user-editable text field is scanned for profanity, offensive language, and leetspeak evasion attempts.

Overview

What Gets Moderated?

Content is automatically scanned on 13 API routes across 21 fields:

| Content Type | Fields Scanned | Examples |
| --- | --- | --- |
| Acronyms | Title, Description | "OKR", "Objectives and Key Results" |
| Definitions | Text, Notes | Definition body, admin notes |
| Categories | Name, Description | "Finance", "Sales Strategy" |
| Collections | Name, Description | "Q4 Planning", "Project Alpha" |
| Tags | Name | "important", "urgent" |
| User Profiles | Name, Bio | Display name, user biography |

When Moderation Happens

  • On create — New acronym, definition, category, etc.
  • On edit — When user updates any text field
  • On import — During bulk CSV import
  • Real-time — No delay; submission blocked if content violates policy

What Gets Blocked?

  1. Explicit profanity — Industry-standard blocklist of offensive words
  2. Offensive slurs — Category-specific blocklists (ethnic, gender, orientation, etc.)
  3. Leetspeak variants — h3ll0, @$$hole, f*ck, etc.
  4. Unicode evasion — Lookalike characters (1/, ¡, ɪ) used to bypass filters
  5. Custom blocklist — Per-tenant custom blocked words

Moderation Engine: Obscenity Library

define.wtf uses the obscenity library, a comprehensive Node.js profanity filter.

Why Obscenity?

  • Detects leetspeak and unicode evasion automatically
  • Extendable with custom blocklists
  • Performance-optimized (scans thousands of words in <10ms)
  • Tuned to minimize false positives on legitimate tech terms

Standard Blocklists

Obscenity includes these default blocklists:

  • en/core — Common English profanity
  • en/en_US — US-specific offensive language
  • en/en_GB — UK-specific offensive language

Total: ~400 words and variants covered.

Content Moderation Behavior

When Content Violates Policy

User submits content with blocked word:

User submits: "Check out our new KPI: K***y Performance Indicator"
              (leetspeak: a blocked word masked with asterisks)
System detects: Leetspeak variant of blocked word
Response: 400 Bad Request
{
  "error": "Content moderation failed",
  "violations": ["obscene_content"],
  "message": "Your submission contains inappropriate language. Please revise and try again."
}

User sees: Toast notification explaining that the content was rejected

User action: Revise and resubmit without the inappropriate language

User Feedback

When content is rejected:

  1. Error message — Explains why (without exposing the exact word)
  2. Helpful guidance — "Please revise your submission"
  3. No data loss — User can edit their text and try again

The system doesn't embarrass users or reveal exactly what triggered the filter (avoids teaching people workarounds).
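As a rough illustration of that behavior, the rejection body can be built from violation codes alone, so the matched word never reaches the client. This is a minimal sketch, not define.wtf's actual implementation: the type names, the `buildRejection` function, and the `custom_blocklist` code are assumptions; only the JSON shape and the `obscene_content` code come from the example response above.

```typescript
// Hypothetical types. Only the JSON shape mirrors the example response.
type ViolationCode = "obscene_content" | "custom_blocklist";

interface ModerationErrorBody {
  error: string;
  violations: ViolationCode[];
  message: string;
}

interface ModerationResponse {
  httpStatus: number;
  body: ModerationErrorBody;
}

// Build the 400 response. The function receives only violation codes,
// so the word that triggered the filter cannot leak into the body.
function buildRejection(violations: ViolationCode[]): ModerationResponse {
  return {
    httpStatus: 400,
    body: {
      error: "Content moderation failed",
      // De-duplicate so several hits of one kind report the code once.
      violations: violations.filter((code, i) => violations.indexOf(code) === i),
      message:
        "Your submission contains inappropriate language. Please revise and try again.",
    },
  };
}

const rejection = buildRejection(["obscene_content", "obscene_content"]);
```

Here `rejection.body.violations` holds a single `obscene_content` entry, and the message stays generic regardless of which word matched.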

Custom Blocklist

Every workspace can define custom blocked words via tenantSettings.blockedWords.

Examples of Custom Blocklists

Finance company:

blockedWords: ["confidential", "insider", "proprietary"]

(Prevent accidental disclosure of sensitive terms)

Healthcare:

blockedWords: ["HIPAA", "patient_id", "SSN"]

(Additional compliance monitoring)

Startup:

blockedWords: ["competitor_name", "acquisition_target"]

(Prevent accidental public leaks)

Managing Custom Blocklist

  1. Go to Settings → Content Moderation (Admin only)
  2. Add words to custom blocklist (comma-separated)
  3. Save
  4. Changes take effect immediately

Example:

Custom Blocklist: confidential, top_secret, unreleased_product

Now these words trigger moderation errors in addition to the standard profanity filter.
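To make the comma-separated format concrete, here is a minimal sketch of how such input could be turned into a word list and matched against submitted text. Everything except the `tenantSettings.blockedWords` idea is an assumption: the `TenantSettings` shape, the function names, and the whole-word, case-insensitive matching rule are illustrative choices, not documented behavior.

```typescript
// Hypothetical shape. The source names only `tenantSettings.blockedWords`.
interface TenantSettings {
  blockedWords: string[];
}

// Turn the comma-separated admin input into a clean, lowercase word list.
function parseBlocklist(input: string): string[] {
  const words = input
    .split(",")
    .map((word) => word.trim().toLowerCase())
    .filter((word) => word.length > 0);
  // Drop duplicates while keeping the original order.
  return words.filter((word, i) => words.indexOf(word) === i);
}

// Escape regex metacharacters so a blocked word is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the custom blocked words that appear as whole words in `text`.
// Letters, digits, and underscores count as word characters (assumption).
function findCustomViolations(text: string, settings: TenantSettings): string[] {
  return settings.blockedWords.filter((word) => {
    const wholeWord = new RegExp(
      `(^|[^a-z0-9_])${escapeRegExp(word)}([^a-z0-9_]|$)`,
      "i",
    );
    return wholeWord.test(text);
  });
}

const tenantSettings: TenantSettings = {
  blockedWords: parseBlocklist("confidential, top_secret, unreleased_product"),
};
const customHits = findCustomViolations(
  "Roadmap for the Unreleased_Product launch",
  tenantSettings,
);
```

With this input, `customHits` contains only `unreleased_product`. Matching whole words rather than substrings is one way to keep a short blocked word from flagging longer, unrelated terms.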

Best Practices for Custom Blocklists

  • Be specific — Target actual risks, not just vague categories
  • Document reasons — Leave notes on why each word is blocked
  • Review quarterly — Remove words that no longer apply
  • Test first — Try submitting content with the word before blocking it
  • Avoid false positives — Don't block common tech terms unless necessary

Moderation in Different Contexts

API Endpoint Moderation

All 13 API routes that accept user text perform moderation:

| Endpoint | Fields Checked | When Checked |
| --- | --- | --- |
| POST /api/v1/acronyms | title, description | Check before insert |
| PATCH /api/v1/acronyms/{id} | title, description | Check before update |
| POST /api/v1/definitions | text | Check before insert |
| PATCH /api/v1/definitions/{id} | text | Check before update |
| POST /api/v1/categories | name, description | Check before insert |
| PATCH /api/v1/categories/{id} | name, description | Check before update |
| POST /api/v1/collections | name, description | Check before insert |
| PATCH /api/v1/collections/{id} | name, description | Check before update |
| POST /api/v1/tags | name | Check before insert |
| PATCH /api/v1/users/{id} | name | Check before update |
| POST /api/internal/bulk-import | All fields | Check each row |
| POST /api/slack/define | N/A (read-only) | No check needed |
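The "check before insert" pattern in the endpoint table can be sketched as a guard that runs before any write. This is an illustration under assumptions, not the service's real handler code: `moderatedFields`, `rejectedFields`, `moderateThenWrite`, the `Checker` type, the 201 success status, and the placeholder word `badword` are all hypothetical.

```typescript
// All names are hypothetical. The source only states that each route
// checks its text fields before the insert or update happens.
type Payload = Record<string, unknown>;
type Checker = (text: string) => boolean; // true when the text is acceptable

// Fields scanned per route, following three rows of the endpoint table.
const moderatedFields: Record<string, string[]> = {
  "POST /api/v1/acronyms": ["title", "description"],
  "POST /api/v1/definitions": ["text"],
  "POST /api/v1/tags": ["name"],
};

// Return the names of the fields that fail moderation for this route.
function rejectedFields(route: string, payload: Payload, isClean: Checker): string[] {
  const fields = moderatedFields[route] ?? [];
  return fields.filter((field) => {
    const value = payload[field];
    // Only string values are scanned; absent fields pass.
    return typeof value === "string" && !isClean(value);
  });
}

// Run the check first and call `write` only when every field is clean.
function moderateThenWrite(
  route: string,
  payload: Payload,
  isClean: Checker,
  write: (payload: Payload) => void,
): { httpStatus: number; rejected: string[] } {
  const rejected = rejectedFields(route, payload, isClean);
  if (rejected.length > 0) {
    return { httpStatus: 400, rejected }; // nothing is written
  }
  write(payload);
  return { httpStatus: 201, rejected };
}

// Stand-in checker that treats the placeholder "badword" as blocked.
const demoChecker: Checker = (text) => !/badword/i.test(text);
const saved: Payload[] = [];
const outcome = moderateThenWrite(
  "POST /api/v1/acronyms",
  { title: "OKR", description: "A badword description" },
  demoChecker,
  (p) => saved.push(p),
);
```

In this run `outcome.rejected` lists `description`, the status is 400, and `saved` stays empty because the write callback is never reached.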

Bulk Import Moderation

When uploading a CSV:

  1. Preview phase — File is checked for violations
  2. Violations reported — Rows with blocked content are flagged
  3. User chooses — Skip the flagged rows, or correct the file and resubmit
  4. Import executes — Only approved content is imported

CSV with violation:

term,description
OKR,"Objectives and Key Results"
KPI,"Keep P**ing Indicators"  ← Blocked word (leetspeak)
ROI,"Return on Investment"

System response:

Violations found:
  Row 2 (KPI): Inappropriate content in description
Choose: Skip this row | Correct and resubmit
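The preview and skip steps can be sketched as two small functions over parsed rows. This is a hedged example: the `CsvRow` and `RowViolation` shapes, the function names, and the stand-in checker that blocks the placeholder `badword` are assumptions made for illustration. Rows are numbered from 1 with the header excluded, matching the "Row 2 (KPI)" message above.

```typescript
// Hypothetical row and violation shapes for a two-column import file.
interface CsvRow {
  term: string;
  description: string;
}

interface RowViolation {
  row: number; // 1-based data row, header not counted
  term: string;
  field: keyof CsvRow;
}

// Preview phase: scan every field of every row and collect violations.
function previewImport(rows: CsvRow[], isClean: (text: string) => boolean): RowViolation[] {
  const violations: RowViolation[] = [];
  const fields: (keyof CsvRow)[] = ["term", "description"];
  rows.forEach((row, index) => {
    for (const field of fields) {
      if (!isClean(row[field])) {
        violations.push({ row: index + 1, term: row.term, field });
      }
    }
  });
  return violations;
}

// "Skip" choice: keep only the rows that were not flagged.
function skipFlagged(rows: CsvRow[], violations: RowViolation[]): CsvRow[] {
  const flagged = violations.map((violation) => violation.row);
  return rows.filter((_, index) => flagged.indexOf(index + 1) === -1);
}

// Stand-in checker that treats the placeholder "badword" as blocked.
const isCleanDemo = (text: string): boolean => !/badword/i.test(text);

const importRows: CsvRow[] = [
  { term: "OKR", description: "Objectives and Key Results" },
  { term: "KPI", description: "Keep badword Indicators" },
  { term: "ROI", description: "Return on Investment" },
];
const importViolations = previewImport(importRows, isCleanDemo);
const rowsToImport = skipFlagged(importRows, importViolations);
```

Here `importViolations` holds one entry (row 2, term `KPI`, field `description`), and `rowsToImport` keeps the `OKR` and `ROI` rows.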

UI Form Validation

In the web interface:

  1. User types in acronym description
  2. Character-by-character validation (optional, for UX)
  3. On submit, server checks content
  4. If violation found, form shows error (content preserved for editing)

Better UX — User can immediately correct without losing their work.
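One way to get that behavior is to treat a rejection as a state update that sets an error but leaves the typed text untouched. The sketch below is framework-agnostic and hypothetical: `FormState`, `ServerResult`, and `applyServerResult` are illustrative names, and the real web interface may be structured differently.

```typescript
// Hypothetical form state for a single description field.
interface FormState {
  description: string;
  error: string | null;
}

// Hypothetical summary of the server's reply to a submit.
interface ServerResult {
  httpStatus: number;
  message?: string;
}

// Compute the next form state from the server's reply.
function applyServerResult(state: FormState, result: ServerResult): FormState {
  if (result.httpStatus === 400) {
    // Rejected: keep the user's text so it can be edited and resubmitted.
    return {
      description: state.description,
      error: result.message ?? "Your submission was rejected. Please revise and try again.",
    };
  }
  // Accepted: clear the form and any earlier error.
  return { description: "", error: null };
}

const afterRejection = applyServerResult(
  { description: "Draft text the user typed", error: null },
  { httpStatus: 400, message: "Your submission contains inappropriate language." },
);
```

After the rejection, `afterRejection.description` still holds the draft text and `afterRejection.error` carries the server's message.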

Evasion Detection

Leetspeak Examples

define.wtf detects common leetspeak variants:

| Original | Leetspeak | Detected? |
| --- | --- | --- |
| hello | h3ll0 | ✓ Yes |
| hello | h3llo | ✓ Yes |
| hello | h€llo | ✓ Yes (unicode) |
| hello | н3llo | ✓ Yes (cyrillic) |
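A common way to catch variants like these is to normalize the text before matching it against a blocklist. The sketch below shows the idea with a small substitution table. It is not the obscenity library's implementation, and the `leetMap` table and `normalizeLeet` function are assumptions covering only the characters in the examples above.

```typescript
// Hypothetical substitution table; a real filter would cover far more.
const leetMap: Record<string, string> = {
  "0": "o",
  "1": "l", // ambiguous: "1" can also stand in for "i"
  "3": "e",
  "4": "a",
  "5": "s",
  "7": "t",
  "@": "a",
  "$": "s",
  "€": "e",
  "н": "h", // Cyrillic "en", which resembles Latin "H"
};

// Lowercase the text, then replace each mapped character with its letter.
function normalizeLeet(text: string): string {
  return text
    .toLowerCase()
    .split("")
    .map((ch) => leetMap[ch] ?? ch)
    .join("");
}

const normalizedVariants = ["h3ll0", "h3llo", "h€llo", "н3llo"].map(normalizeLeet);
```

All four entries of `normalizedVariants` come out as `hello`, so a single blocklist lookup on the normalized form covers every variant in the table.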

Unicode Lookalikes

Obscenity detects unicode characters that look like Latin letters:

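| Latin | Lookalike | Script | Detected? |
| --- | --- | --- | --- |
| A | Α | Greek | ✓ Yes |
| O | О | Cyrillic | ✓ Yes |
| E | Е | Cyrillic | ✓ Yes |
| I | І | Cyrillic (Ukrainian) | ✓ Yes |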
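Lookalike detection can be approximated by folding confusable characters back to their Latin counterparts before matching. The following sketch is an assumption-laden illustration, not how the obscenity library works internally: the `confusables` table lists only the four characters from the table above, and `foldLookalikes` is a hypothetical helper. Unicode NFKC normalization handles compatibility forms such as fullwidth letters, but it does not map Greek or Cyrillic letters to Latin, which is why an explicit table is needed.

```typescript
// Hypothetical confusables table, written with escapes because the
// characters are visually indistinguishable from their Latin twins.
const confusables: Record<string, string> = {
  "\u0391": "A", // Greek capital alpha
  "\u041E": "O", // Cyrillic capital O
  "\u0415": "E", // Cyrillic capital IE
  "\u0406": "I", // Cyrillic capital Byelorussian-Ukrainian I
};

// Apply NFKC first (folds fullwidth and other compatibility forms),
// then replace known lookalikes with the Latin letter they imitate.
function foldLookalikes(text: string): string {
  return text
    .normalize("NFKC")
    .split("")
    .map((ch) => confusables[ch] ?? ch)
    .join("");
}

// Each word below starts with a non-Latin lookalike character.
const foldedText = foldLookalikes("\u0391PI \u041E\u0415M \u0406D");
```

`foldedText` is the plain ASCII string `API OEM ID`, which can then be checked against the blocklist like any other input.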

Mixed Evasion

Combinations of techniques are also caught:

| Attempt | Detection |
| --- | --- |
| h3ll0_w0r1d | ✓ Identified as leetspeak |
| heЛЛo | ✓ Identified as mixed cyrillic |
| h€££ø | ✓ Identified as mixed unicode |

False Positives & Workarounds

Avoiding False Positives

The moderation system is tuned to avoid false positives on legitimate terms:

  • "Scunthorpe problem" — Legitimate place names that contain blocked words are allowed (tuned in library)
  • Technical terms — Words like "regex", "shell", etc. are not blocked
  • Proper nouns — Company names and acronyms generally not blocked unless they're actually offensive
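One simple way to handle this class of false positive is an allowlist: known-safe terms are masked out before the blocklist is applied. The sketch below illustrates the idea with a mild placeholder, treating "hell" as a hypothetical blocked word and "shell" as an allowed technical term. The `hasBlockedWord` function and both lists are assumptions, and the library's own tuning may work differently.

```typescript
// Check `text` against `blocked`, ignoring any hit that falls inside
// an allowlisted term. Matching is case-insensitive.
function hasBlockedWord(text: string, blocked: string[], allowlist: string[]): boolean {
  let remaining = text.toLowerCase();
  // Mask every allowlisted term with a space so it cannot match.
  for (const term of allowlist) {
    remaining = remaining.split(term.toLowerCase()).join(" ");
  }
  // Whatever is left is searched for blocked words as substrings.
  return blocked.some((word) => remaining.indexOf(word.toLowerCase()) !== -1);
}

const demoBlocked = ["hell"]; // mild placeholder, not a real blocklist entry
const demoAllowlist = ["shell"];

const techTermFlagged = hasBlockedWord("Run the shell script", demoBlocked, demoAllowlist);
const realHitFlagged = hasBlockedWord("What the hell is this", demoBlocked, demoAllowlist);
```

With these inputs `techTermFlagged` is `false` and `realHitFlagged` is `true`: the technical term passes while the standalone word is still caught.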

If You Hit a False Positive

  1. Revise your text — Try rewording to avoid the blocked term
  2. Contact admin — Admins can adjust the custom blocklist if legitimate terms are being incorrectly blocked

Admin Actions

Admins can:

  • Adjust custom blocklist — Add or remove words from the per-tenant blocklist
  • Review flagged content — Check audit logs for blocked submissions
  • Disable for specific context — Not supported for the standard profanity filter, which is always active; only the custom blocklist can be modified

Privacy & Moderation

What's Logged?

When content is blocked:

  • User ID
  • Content (full text)
  • Field (which field was rejected)
  • Reason (which rule triggered)
  • Timestamp

These entries are kept for compliance review. They are not shown to other users; admins can review them through the audit logs.
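For reference, the five logged items map naturally onto a small record type. This is a sketch only: the `BlockedSubmissionLog` interface, its field names, the ISO 8601 timestamp format, and the sample values are assumptions, since the source lists what is logged but not how it is stored.

```typescript
// Hypothetical record for one blocked submission.
interface BlockedSubmissionLog {
  userId: string;
  content: string; // full submitted text
  field: string; // which field was rejected
  reason: string; // which rule triggered
  timestamp: string; // ISO 8601, UTC (assumed format)
}

// Build a log entry. `now` is injectable so the function is easy to test.
function makeLogEntry(
  userId: string,
  content: string,
  field: string,
  reason: string,
  now: Date = new Date(),
): BlockedSubmissionLog {
  return { userId, content, field, reason, timestamp: now.toISOString() };
}

// Illustrative values only.
const logEntry = makeLogEntry(
  "user_123",
  "text that was rejected",
  "description",
  "obscene_content",
  new Date(Date.UTC(2024, 0, 15, 9, 30, 0)),
);
```

Because the full content is stored, a record like this should sit behind the same admin-only access as the rest of the audit log.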

What's Shared?

  • User who violated policy is NOT publicly named
  • Flagged content is NOT displayed to other users
  • No "shame list" or public record of violations

Moderation is private to maintain user dignity.

Compliance Use Cases

HIPAA-Regulated Organizations

Custom blocklist can enforce:

blockedWords: ["PHI", "patient", "MRN", "SSN", "DOB"]

Catch accidental disclosure of protected health information.

Financial Services

Custom blocklist can enforce:

blockedWords: ["insider", "confidential", "acquisition", "litigation_hold"]

Prevent material non-public information from being shared.

Public Sector

Custom blocklist can enforce:

blockedWords: ["classified", "secret", "top_secret"]

Ensure security classifications are respected.

Best Practices

As an Admin

  1. Review your blocklist — Understand what you're filtering
  2. Test thoroughly — Add custom words one at a time and test
  3. Document policy — Tell your team what content is not allowed
  4. Train users — Help team members understand why content is rejected
  5. Monitor false positives — Adjust if legitimate content is blocked
  6. Avoid over-filtering — Only block what's truly necessary

As a User

  1. Assume good faith — System blocks to help keep workspace safe
  2. Revise and resubmit — Usually a small edit fixes it
  3. Contact admin — If you think blocking is unfair
  4. Use appropriate language — Keep workspace professional
  5. Be creative — Use synonyms instead of blocked terms

Performance Impact

Content moderation adds minimal overhead:

| Operation | Time | Impact |
| --- | --- | --- |
| Scan 100 fields | <10 ms | Negligible |
| Check against blocklist | <5 ms | Negligible |
| Total per submission | <20 ms | Negligible |

Moderation happens asynchronously where possible and is optimized for speed.

Troubleshooting

"Getting 'inappropriate content' error but don't see bad words"

  • Check for leetspeak (numbers instead of letters)
  • Check for unicode lookalikes (copy-paste text to inspect)
  • Ask your admin if a custom blocklist is in place
  • Keep in mind that the error message intentionally does not name the word that triggered it

"A legitimate word is being blocked"

  • Contact your admin to remove it from the custom blocklist (if that is what triggered the block)
  • If the standard profanity filter triggered the block, reword the text; that filter cannot be disabled for individual submissions
  • Check if you're using a unicode character that looks like the letter

"Need to disable content moderation"

  • Cannot disable standard profanity filter (always active for safety)
  • Can adjust custom blocklist in settings
  • Contact support@define.wtf if you have special needs
