How define.wtf automatically filters inappropriate content while maintaining team safety

Content Moderation

define.wtf includes automated content moderation to keep your workspace professional and safe. Every user-editable text field is scanned for profanity, offensive language, and leetspeak evasion attempts.

Overview

What Gets Moderated?

Content is automatically scanned on 13 API routes across 21 fields:

Content Type	Fields Scanned	Examples
Acronyms	Title, Description	"OKR", "Objectives and Key Results"
Definitions	Text, Notes	Definition body, admin notes
Categories	Name, Description	"Finance", "Sales Strategy"
Collections	Name, Description	"Q4 Planning", "Project Alpha"
Tags	Name	"important", "urgent"
User Profiles	Name, Bio	Display name, user biography

When Moderation Happens

On create — New acronym, definition, category, etc.
On edit — When user updates any text field
On import — During bulk CSV import
Real-time — No delay; submission blocked if content violates policy

What Gets Blocked?

Explicit profanity — Industry-standard blocklist of offensive words
Offensive slurs — Category-specific blocklists (ethnic, gender, orientation, etc.)
Leetspeak variants — h3ll0, @$$hole, f*ck, etc.
Unicode evasion — Lookalike characters (1/, ¡, ɪ) used to bypass filters
Custom blocklist — Per-tenant custom blocked words

Moderation Engine: Obscenity Library

define.wtf uses the obscenity library, a comprehensive Node.js profanity filter.

Why Obscenity?

Detects leetspeak and unicode evasion automatically
Extendable with custom blocklists
Performance-optimized (scans thousands of words in <10ms)
Tuned to minimize false positives on legitimate tech terms

Standard Blocklists

Obscenity includes these default blocklists:

en/core — Common English profanity
en/en_US — US-specific offensive language
en/en_GB — UK-specific offensive language

Total: ~400 words and variants covered.

Content Moderation Behavior

When Content Violates Policy

A user submits content containing a blocked word:

User submits: "Check out our new KPI: K***y Performance Indicator"
              (a blocked word partly masked with asterisks)
System detects: Obfuscated variant of a blocked word
Response: 400 Bad Request
{
  "error": "Content moderation failed",
  "violations": ["obscene_content"],
  "message": "Your submission contains inappropriate language. Please revise and try again."
}

User sees: Toast notification explaining that the content was rejected
User action: Revise and resubmit without the inappropriate language
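A client can recognize this rejection by its status code and body shape. The sketch below is only an illustration: the field names come from the sample response above, while the interface name and the `toastTextFor` helper are hypothetical and not part of the define.wtf API.

```typescript
// Shape of the 400 response body shown above. The interface name is an
// assumption for illustration; the field names come from the sample payload.
interface ModerationErrorBody {
  error: string;
  violations: string[];
  message: string;
}

// Type guard: checks that an unknown JSON value looks like a moderation error.
function isModerationError(body: unknown): body is ModerationErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const candidate = body as { [key: string]: unknown };
  return (
    typeof candidate["error"] === "string" &&
    typeof candidate["message"] === "string" &&
    Array.isArray(candidate["violations"])
  );
}

// Hypothetical helper: picks the text to show in a toast for a failed request.
function toastTextFor(httpStatus: number, body: unknown): string | null {
  if (httpStatus === 400 && isModerationError(body)) {
    return body.message;
  }
  return null; // not a moderation rejection; handle it elsewhere
}

const sampleBody: unknown = JSON.parse(
  '{"error":"Content moderation failed","violations":["obscene_content"],' +
    '"message":"Your submission contains inappropriate language. Please revise and try again."}'
);
const toastText = toastTextFor(400, sampleBody);
```

Because the helper only reads the response, the form state stays untouched and the user can edit and resubmit.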

User Feedback

When content is rejected:

Error message — Explains why (without exposing the exact word)
Helpful guidance — "Please revise your submission"
No data loss — User can edit their text and try again

The system does not reveal exactly which word triggered the filter. This avoids embarrassing users and avoids teaching people workarounds.

Custom Blocklist

Every workspace can define custom blocked words via tenantSettings.blockedWords.

Examples of Custom Blocklists

Finance company:

blockedWords: ["confidential", "insider", "proprietary"]

(Prevent accidental disclosure of sensitive terms)

Healthcare:

blockedWords: ["HIPAA", "patient_id", "SSN"]

(Additional compliance monitoring)

Startup:

blockedWords: ["competitor_name", "acquisition_target"]

(Prevent accidental public leaks)

Managing Custom Blocklist

Go to Settings → Content Moderation (Admin only)
Add words to custom blocklist (comma-separated)
Save
Changes take effect immediately

Example:

Custom Blocklist: confidential, top_secret, unreleased_product

Now these words trigger moderation errors in addition to the standard profanity filter.
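As a rough sketch of how a comma-separated entry could become a `blockedWords` list and be checked, consider the following. Only the `blockedWords` property name comes from this page; the `TenantSettings` interface, the helper names, and the whole-word matching rule are assumptions made for illustration, and the real check may behave differently.

```typescript
// Hypothetical settings shape; only `blockedWords` is named in the text above.
interface TenantSettings {
  blockedWords: string[];
}

// Turns the comma-separated settings input into a clean, de-duplicated list.
function parseBlocklistInput(input: string): string[] {
  const words: string[] = [];
  for (const raw of input.split(",")) {
    const word = raw.trim().toLowerCase();
    if (word.length > 0 && words.indexOf(word) === -1) {
      words.push(word);
    }
  }
  return words;
}

// Escapes regex metacharacters so a blocked word is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns true if any custom blocked word appears as a whole word
// (case-insensitive). Whole-word matching is an assumption of this sketch.
function violatesCustomBlocklist(text: string, tenant: TenantSettings): boolean {
  for (const word of tenant.blockedWords) {
    const wholeWord = new RegExp(
      "(^|[^a-z0-9_])" + escapeRegExp(word) + "($|[^a-z0-9_])",
      "i"
    );
    if (wholeWord.test(text)) return true;
  }
  return false;
}

const settings: TenantSettings = {
  blockedWords: parseBlocklistInput("confidential, top_secret, unreleased_product"),
};
```

With these settings, `violatesCustomBlocklist("This is Confidential.", settings)` is true and `violatesCustomBlocklist("Quarterly roadmap", settings)` is false.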

Best Practices for Custom Blocklists

Be specific — Target actual risks, not just vague categories
Document reasons — Leave notes on why each word is blocked
Review quarterly — Remove words that no longer apply
Test first — Try submitting content with the word before blocking it
Avoid false positives — Don't block common tech terms unless necessary

Moderation in Different Contexts

API Endpoint Moderation

All 13 API routes that accept user text perform moderation:

Endpoint	Fields Checked	Method
`POST /api/v1/acronyms`	title, description	Check before insert
`PATCH /api/v1/acronyms/{id}`	title, description	Check before update
`POST /api/v1/definitions`	text	Check before insert
`PATCH /api/v1/definitions/{id}`	text	Check before update
`POST /api/v1/categories`	name, description	Check before insert
`PATCH /api/v1/categories/{id}`	name, description	Check before update
`POST /api/v1/collections`	name, description	Check before insert
`PATCH /api/v1/collections/{id}`	name, description	Check before update
`POST /api/v1/tags`	name	Check before insert
`PATCH /api/v1/users/{id}`	name	Check before update
`POST /api/internal/bulk-import`	All fields	Check each row
`POST /api/slack/define`	N/A (read-only)	No check needed
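One way to picture the table is as a lookup from route to the fields that must be checked before a write. This is a sketch under stated assumptions: the route strings and field names are transcribed from the table, but `moderatedFields`, `findViolatingFields`, and the placeholder checker are hypothetical names, and the bulk-import route is left out because it checks every field of every row.

```typescript
// Fields to check per route, transcribed from the table above.
const moderatedFields: { [route: string]: string[] } = {
  "POST /api/v1/acronyms": ["title", "description"],
  "PATCH /api/v1/acronyms/{id}": ["title", "description"],
  "POST /api/v1/definitions": ["text"],
  "PATCH /api/v1/definitions/{id}": ["text"],
  "POST /api/v1/categories": ["name", "description"],
  "PATCH /api/v1/categories/{id}": ["name", "description"],
  "POST /api/v1/collections": ["name", "description"],
  "PATCH /api/v1/collections/{id}": ["name", "description"],
  "POST /api/v1/tags": ["name"],
  "PATCH /api/v1/users/{id}": ["name"],
};

// Stand-in for the moderation engine: returns true when text must be blocked.
type TextChecker = (text: string) => boolean;

// Returns the names of the fields in `payload` that fail the check for `route`.
// Routes that are not in the map (for example the read-only Slack lookup)
// yield an empty list.
function findViolatingFields(
  route: string,
  payload: { [field: string]: unknown },
  isBlocked: TextChecker
): string[] {
  const fields = moderatedFields[route] || [];
  const flagged: string[] = [];
  for (const field of fields) {
    const value = payload[field];
    if (typeof value === "string" && isBlocked(value)) {
      flagged.push(field);
    }
  }
  return flagged;
}

// Placeholder checker used only for this example.
const demoChecker: TextChecker = (text) => text.indexOf("blockedword") !== -1;

const flaggedFields = findViolatingFields(
  "POST /api/v1/acronyms",
  { title: "OKR", description: "a blockedword here" },
  demoChecker
);
```

A handler would reject the request with the 400 response when the returned list is non-empty, and proceed with the insert or update otherwise.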

Bulk Import Moderation

When uploading a CSV:

Preview phase — File is checked for violations
Violations reported — Rows with blocked content are flagged
User chooses — Skip the flagged rows, or correct the file and resubmit
Import executes — Only approved content is imported

CSV with violation:

term,description
OKR,"Objectives and Key Results"
KPI,"Keep P**ing Indicators"  ← Blocked word (leetspeak)
ROI,"Return on Investment"

System response:

Violations found:
  Row 2 (KPI): Inappropriate content in description
Choose: Skip this row | Correct and resubmit
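The preview phase can be thought of as splitting parsed rows into approved and flagged sets. The sketch below assumes the CSV has already been parsed into objects; the type names, `previewImport`, and the placeholder checker are hypothetical, and row numbers count data rows (the header is not counted), matching the "Row 2 (KPI)" message above.

```typescript
interface ImportRow {
  term: string;
  description: string;
}

interface FlaggedRow {
  rowNumber: number; // 1-based, counting data rows only
  term: string;
  field: string;
}

interface ImportPreview {
  approved: ImportRow[];
  flagged: FlaggedRow[];
}

// Splits rows into those that pass moderation and those that do not.
// `isBlocked` stands in for the real moderation check.
function previewImport(
  rows: ImportRow[],
  isBlocked: (text: string) => boolean
): ImportPreview {
  const approved: ImportRow[] = [];
  const flagged: FlaggedRow[] = [];
  rows.forEach((row, index) => {
    let badField: string | null = null;
    if (isBlocked(row.term)) badField = "term";
    else if (isBlocked(row.description)) badField = "description";

    if (badField === null) {
      approved.push(row);
    } else {
      flagged.push({ rowNumber: index + 1, term: row.term, field: badField });
    }
  });
  return { approved, flagged };
}

// Placeholder checker and sample rows, used only for this example.
const isBlockedDemo = (text: string): boolean => text.indexOf("badword") !== -1;

const preview = previewImport(
  [
    { term: "OKR", description: "Objectives and Key Results" },
    { term: "KPI", description: "Keep badword Indicators" },
    { term: "ROI", description: "Return on Investment" },
  ],
  isBlockedDemo
);
```

Choosing "Skip this row" would then import `preview.approved` only; correcting the file restarts the preview.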

UI Form Validation

In the web interface:

User types in acronym description
Character-by-character validation (optional, for UX)
On submit, server checks content
If violation found, form shows error (content preserved for editing)

Better UX — User can immediately correct without losing their work.

Evasion Detection

Leetspeak Examples

define.wtf detects common leetspeak variants:

Original	Leetspeak	Detected?
hello	h3ll0	✓ Yes
hello	h3llo	✓ Yes
hello	h€llo	✓ Yes (unicode)
hello	н3llo	✓ Yes (cyrillic)
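The idea behind leetspeak detection is to normalize common substitutions before matching. The following is a minimal sketch of that idea, not the obscenity library's own implementation; the substitution table and the function name are assumptions chosen to cover the first three rows above (the Cyrillic row needs lookalike folding as well).

```typescript
// A small, illustrative substitution table. Real filters use larger tables.
const leetMap: { [ch: string]: string } = {
  "0": "o",
  "1": "l",
  "3": "e",
  "4": "a",
  "5": "s",
  "7": "t",
  "@": "a",
  "$": "s",
  "€": "e",
};

// Lowercases the text and replaces known substitutes with the letter they
// imitate. The result is meant for matching only, never for display or storage.
function normalizeLeet(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i).toLowerCase();
    const mapped = leetMap[ch];
    out += mapped !== undefined ? mapped : ch;
  }
  return out;
}
```

Here `normalizeLeet("h3ll0")`, `normalizeLeet("h3llo")`, and `normalizeLeet("h€llo")` all return `"hello"`. Because digits in legitimate text are rewritten too, only the normalized copy should be compared against the blocklist.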

Unicode Lookalikes

Obscenity detects unicode characters that look like Latin letters:

Latin	Lookalike	Unicode	Detected?
A	Α	Greek	✓ Yes
O	О	Cyrillic	✓ Yes
E	Е	Cyrillic	✓ Yes
I	І	Cyrillic (Ukrainian)	✓ Yes
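Lookalike handling can be sketched as folding each confusable character to the Latin letter it imitates before matching. The table below covers only the four rows above and is written with escape sequences so the code points are unambiguous; the names are hypothetical, and this is not a description of how the obscenity library does it internally.

```typescript
// Confusable characters from the table above, keyed by code point.
const lookalikes: { [ch: string]: string } = {
  "\u0391": "A", // Greek capital letter alpha
  "\u041E": "O", // Cyrillic capital letter O
  "\u0415": "E", // Cyrillic capital letter IE
  "\u0406": "I", // Cyrillic capital letter Byelorussian-Ukrainian I
};

// Replaces known lookalikes with their Latin counterparts; other characters
// pass through unchanged.
function foldLookalikes(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const mapped = lookalikes[ch];
    out += mapped !== undefined ? mapped : ch;
  }
  return out;
}

// "ROI" typed with a Cyrillic O and a Cyrillic I instead of the Latin letters.
const disguised = "R\u041E\u0406";
const folded = foldLookalikes(disguised);
```

After folding, `folded === "ROI"` even though `disguised === "ROI"` is false, which is why the comparison has to happen on the folded text.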

Mixed Evasion

Combinations of techniques are also caught:

Attempt	Detection
`h3ll0_w0r1d`	✓ Identified as leetspeak
`heЛЛo`	✓ Identified as mixed cyrillic
`h€££ø`	✓ Identified as mixed unicode

False Positives & Workarounds

Avoiding False Positives

The moderation system is tuned to avoid false positives on legitimate terms:

"Scunthorpe problem" — Legitimate place names that contain blocked words are allowed (tuned in library)
Technical terms — Words like "regex", "shell", etc. are not blocked
Proper nouns — Company names and acronyms generally not blocked unless they're actually offensive
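The difference between a naive filter and one that avoids this class of false positive can be shown with a mild placeholder word. In this sketch, "hell" stands in for a blocked word so that "shell" (mentioned above) becomes the legitimate term at risk; the function names are hypothetical, and whole-word matching is just one possible mitigation, not necessarily the one the library uses.

```typescript
// Naive approach: flags the word anywhere inside the text, even inside
// longer, legitimate words.
function containsSubstring(text: string, word: string): boolean {
  return text.toLowerCase().indexOf(word.toLowerCase()) !== -1;
}

// Whole-word approach: splits on non-alphanumeric characters and compares
// complete tokens only.
function containsWholeWord(text: string, word: string): boolean {
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/);
  return tokens.indexOf(word.toLowerCase()) !== -1;
}

const sentence = "Open a shell session";
const naiveHit = containsSubstring(sentence, "hell"); // true: false positive
const wholeWordHit = containsWholeWord(sentence, "hell"); // false: "shell" is allowed
```

Whole-word matching trades some recall for precision, which is why it is usually combined with normalization of evasion attempts rather than used alone.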

If You Hit a False Positive

Revise your text — Try rewording to avoid the blocked term
Contact admin — Admins can adjust the custom blocklist if legitimate terms are being incorrectly blocked

Admin Actions

Admins can:

Adjust custom blocklist — Add or remove words from the per-tenant blocklist
Review flagged content — Check audit logs for blocked submissions
Not disable the standard filter — The standard profanity filter cannot be turned off, even for a specific context; only the custom blocklist can be modified

Privacy & Moderation

What's Logged?

When content is blocked:

User ID
Content (full text)
Field (which field was rejected)
Reason (which rule triggered)
Timestamp

These records are kept for compliance and are not shown to other users; admins can review them through the audit logs.
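For readers integrating with the audit logs, the fields listed above could be modeled as a record like the one below. This is an assumed shape for illustration only: the property names, the `reason` value, and the ISO 8601 timestamp format are not confirmed by this page.

```typescript
// Hypothetical record mirroring the logged fields listed above.
interface BlockedSubmissionRecord {
  userId: string;
  content: string; // full submitted text
  field: string; // which field was rejected
  reason: string; // which rule triggered
  timestamp: string; // ISO 8601, an assumption of this sketch
}

// Builds a record for a blocked submission at the given time.
function buildBlockedRecord(
  userId: string,
  content: string,
  field: string,
  reason: string,
  at: Date
): BlockedSubmissionRecord {
  return { userId, content, field, reason, timestamp: at.toISOString() };
}

const sampleRecord = buildBlockedRecord(
  "user_123",
  "example rejected text",
  "description",
  "obscene_content",
  new Date(0)
);
```

Since the full content is stored, access to these records should be limited to the same admins who can review flagged content.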

What's Shared?

User who violated policy is NOT publicly named
Flagged content is NOT displayed to other users
No "shame list" or public record of violations

Moderation is private to maintain user dignity.

Compliance Use Cases

HIPAA-Regulated Organizations

Custom blocklist can enforce:

blockedWords: ["PHI", "patient", "MRN", "SSN", "DOB"]

Catch accidental disclosure of protected health information.

Financial Services

Custom blocklist can enforce:

blockedWords: ["insider", "confidential", "acquisition", "litigation_hold"]

Prevent material non-public information from being shared.

Public Sector

Custom blocklist can enforce:

blockedWords: ["classified", "secret", "top_secret"]

Ensure security classifications are respected.

Best Practices

As an Admin

Review your blocklist — Understand what you're filtering
Test thoroughly — Add custom words one at a time and test
Document policy — Tell your team what content is not allowed
Train users — Help team members understand why content is rejected
Monitor false positives — Adjust if legitimate content is blocked
Avoid over-filtering — Only block what's truly necessary

As a User

Assume good faith — System blocks to help keep workspace safe
Revise and resubmit — Usually a small edit fixes it
Contact admin — If you think blocking is unfair
Use appropriate language — Keep workspace professional
Be creative — Use synonyms instead of blocked terms

Performance Impact

Content moderation adds minimal overhead:

Operation	Time	Impact
Scan 100 fields	<10 ms	Negligible
Check against blocklist	<5 ms	Negligible
Total per submission	<20 ms	Negligible

Moderation is optimized for speed. Checks that block a submission run before the data is written, so the timings above are added to the request.

Troubleshooting

"Getting 'inappropriate content' error but don't see bad words"

Check for leetspeak (numbers instead of letters)
Check for unicode lookalikes (copy-paste text to inspect)
Ask your admin if a custom blocklist is in place
Keep in mind that the error message intentionally does not name the word that triggered it

"A legitimate word is being blocked"

Contact your admin to remove it from the custom blocklist (if that is where the block comes from)
If the block comes from the standard filter, reword the text; the standard filter cannot be disabled, even for a single submission
Check if you're using a unicode character that looks like the letter

"Need to disable content moderation"

Cannot disable standard profanity filter (always active for safety)
Can adjust custom blocklist in settings
Contact support@define.wtf if you have special needs
