NLP Policies (AI-Powered Safety)
Use LLM-based semantic analysis to detect harmful content, PII, and malicious intent beyond what pattern matching can achieve.
Pro Feature: NLP policies are available on Pro and Enterprise plans. They complement traditional OPA/Rego policies with semantic understanding.
How NLP Policies Work
NLP policies use large language models (LLMs) to analyze action content semantically. Unlike pattern-based detection, NLP policies understand context, intent, and meaning.
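Conceptually, each evaluation wraps the action in an analysis prompt and parses a structured risk assessment from the model's reply. The sketch below is illustrative only; the actual prompt wording, provider call, and response schema are internal to the service, and `build_prompt`/`parse_assessment` are hypothetical names.

```python
import json

def build_prompt(action: dict, policy_type: str) -> str:
    """Wrap an agent action in an analysis prompt for the configured LLM."""
    return (
        f"Analyze this agent action for {policy_type} issues. "
        f'Respond with JSON: {{"risk_score": <0-1>, "reasoning": <str>}}.\n'
        f"Action: {json.dumps(action)}"
    )

def parse_assessment(llm_response: str) -> dict:
    """Parse the model's JSON reply into a risk assessment."""
    return json.loads(llm_response)

prompt = build_prompt({"tool": "email", "body": "hello"}, "content_safety")
assessment = parse_assessment('{"risk_score": 0.1, "reasoning": "benign"}')
print(assessment["risk_score"])  # 0.1
```

Because the model sees the whole action in context, it can flag, say, an SSN written out in words, which a regex would miss.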
Policy Types
Content Safety
Detects harmful, toxic, or inappropriate content in agent actions. Uses semantic understanding to catch threats that simple keyword filtering would miss.
{
"type": "content_safety",
"name": "Block Harmful Content",
"description": "Detect and block harmful or toxic content",
"provider": "openai",
"model": "gpt-4o-mini",
"threshold": 0.7
}
PII Detection
Identifies personally identifiable information (PII) such as names, addresses, phone numbers, SSNs, and credit card numbers using semantic analysis.
{
"type": "pii_detection",
"name": "Detect PII in Actions",
"description": "Block actions containing personal information",
"provider": "anthropic",
"model": "claude-3-haiku-20240307",
"threshold": 0.8
}
Intent Classification
Classifies agent intent to detect suspicious, malicious, or data exfiltration attempts before they execute.
{
"type": "intent_classification",
"name": "Classify Agent Intent",
"description": "Detect malicious or suspicious intent",
"provider": "openai",
"model": "gpt-4o",
"threshold": 0.6
}
Custom Prompts
Create domain-specific policies with custom prompts for specialized use cases.
{
"type": "custom",
"name": "Financial Compliance Check",
"description": "Ensure actions comply with financial regulations",
"prompt": "Analyze this agent action for potential financial regulation violations. Consider SEC rules, FINRA guidelines, and anti-money laundering requirements. Return a risk assessment.",
"provider": "openai",
"model": "gpt-4o",
"threshold": 0.5
}
Supported Providers
| Provider | Models | Best For |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-3.5-turbo | General purpose, fast |
| Anthropic | claude-3-opus, claude-3-sonnet, claude-3-haiku | Safety-focused analysis |
| Local | Custom Ollama models | On-premise, data privacy |
Creating an NLP Policy
Via API
curl -X POST https://api.agentactionfirewall.com/admin/nlp-policies \
-H "Authorization: Bearer $SUPABASE_JWT" \
-H "Content-Type: application/json" \
-d '{
"name": "Content Safety Policy",
"description": "Detect harmful content in agent actions",
"type": "content_safety",
"provider": "openai",
"model": "gpt-4o-mini",
"threshold": 0.7,
"enabled": true
}'
Via Dashboard
- Navigate to Policies in the sidebar
- Click the NLP Policies tab
- Click Create NLP Policy
- Select a policy type and configure settings
- Test with sample content before enabling
Policy Evaluation
NLP policies return a structured response with decision, risk score, and reasoning:
{
"policy_id": "nlp_abc123",
"decision": "deny",
"risk_score": 0.85,
"reasoning": "The action contains sensitive personal information including what appears to be a social security number and credit card details.",
"details": {
"detected_issues": ["ssn_detected", "credit_card_detected"],
"confidence": 0.92
},
"cached": false,
"latency_ms": 245
}
Decision Logic
- Allow: Risk score below threshold
- Deny: Risk score at or above threshold
- Require Approval: Configurable for medium-risk actions
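The decision rules above reduce to a simple comparison. In this sketch, `approval_band` is a hypothetical knob (not a documented field) standing in for the configurable medium-risk zone:

```python
def decide(risk_score: float, threshold: float,
           approval_band: float = 0.0) -> str:
    """Map a risk score to a decision, mirroring the rules above.

    approval_band (hypothetical) widens a zone just below the
    threshold where medium-risk actions require human approval.
    """
    if risk_score >= threshold:
        return "deny"
    if approval_band and risk_score >= threshold - approval_band:
        return "require_approval"
    return "allow"

print(decide(0.85, 0.7))       # deny: at/above threshold
print(decide(0.40, 0.7))       # allow: below threshold
print(decide(0.65, 0.7, 0.1))  # require_approval: inside the band
```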
Caching
LLM responses are cached to reduce latency and cost. Cache behavior is configurable:
{
"cache_ttl_seconds": 3600,
"cache_key_fields": ["tool", "operation", "params.url"]
}
Identical actions will return cached results within the TTL. This is especially useful for repeated similar actions.
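One way to derive such a cache key is to hash only the configured fields, so actions that differ in irrelevant parameters still hit the cache. A minimal sketch, assuming dot-notation paths into the action payload as in the config above:

```python
import hashlib
import json

def cache_key(action: dict, key_fields: list[str]) -> str:
    """Build a deterministic cache key from selected action fields.

    key_fields use dot-notation (e.g. "params.url"); missing
    fields hash as None, so the key is always well-defined.
    """
    parts = {}
    for field in key_fields:
        value = action
        for segment in field.split("."):
            value = value.get(segment) if isinstance(value, dict) else None
        parts[field] = value
    blob = json.dumps(parts, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

a = {"tool": "http", "operation": "GET", "params": {"url": "https://x.io"}}
b = {"tool": "http", "operation": "GET",
     "params": {"url": "https://x.io", "timeout": 30}}
fields = ["tool", "operation", "params.url"]
print(cache_key(a, fields) == cache_key(b, fields))  # True: timeout ignored
```

Excluding volatile parameters (timestamps, request IDs) from `cache_key_fields` is what makes "repeated similar actions" cacheable at all.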
Testing Policies
Test your NLP policy against sample content before enabling:
curl -X POST https://api.agentactionfirewall.com/admin/nlp-policies/:id/test \
-H "Authorization: Bearer $SUPABASE_JWT" \
-H "Content-Type: application/json" \
-d '{
"content": "Please send $5000 to account 123456789 for John Smith at 123 Main St"
}'
Best Practices
Threshold Tuning
- High threshold (0.8+): blocks only clear violations; fewest false positives
- Medium threshold (0.5-0.7): balances security and usability
- Low threshold (0.3-0.5): aggressive blocking; more false positives
Performance Optimization
- Use faster models (gpt-4o-mini, claude-3-haiku) for high-volume policies
- Enable caching for repeated similar actions
- Consider local models for latency-sensitive applications
Combining with OPA Policies
NLP policies work alongside traditional OPA/Rego policies. A typical setup:
- OPA Policy: Fast pattern-based checks (allowlists, rate limits)
- NLP Policy: Semantic analysis for content that passes OPA
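A layered pipeline like this can be sketched as follows. `opa_check` and `nlp_check` are hypothetical stand-ins for the two policy engines, not real client functions:

```python
def opa_check(action: dict) -> bool:
    """Stand-in for fast pattern-based OPA checks (a tool allowlist here)."""
    return action.get("tool") in {"http", "email"}

def nlp_check(action: dict, threshold: float = 0.7) -> bool:
    """Stand-in for semantic analysis; a toy risk score for illustration."""
    risk = 0.9 if "ssn" in str(action).lower() else 0.1
    return risk < threshold

def evaluate(action: dict) -> str:
    # Cheap OPA layer first: reject before spending LLM tokens.
    if not opa_check(action):
        return "deny"
    # Semantic NLP layer runs only on actions that pass OPA.
    return "allow" if nlp_check(action) else "deny"

print(evaluate({"tool": "shell", "cmd": "rm -rf /"}))     # deny (OPA)
print(evaluate({"tool": "email", "body": "hi"}))          # allow
print(evaluate({"tool": "email", "body": "SSN 123-45"}))  # deny (NLP)
```

Ordering matters: the pattern layer filters the bulk of traffic cheaply, so the LLM is only invoked for the actions that need semantic judgment.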
Monitoring
Track NLP policy performance in the dashboard:
- Evaluation count: Total evaluations per policy
- Average latency: LLM response time
- Cache hit rate: Percentage of cached responses
- Decision distribution: Allow/deny/approval breakdown
- Token usage: LLM token consumption for cost tracking
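All of these metrics can be derived from evaluation records shaped like the response shown earlier. A minimal sketch, assuming a list of such response dicts:

```python
def summarize(evaluations: list[dict]) -> dict:
    """Aggregate per-policy metrics from NLP evaluation records."""
    n = len(evaluations)
    return {
        "evaluation_count": n,
        "avg_latency_ms": sum(e["latency_ms"] for e in evaluations) / n,
        "cache_hit_rate": sum(e["cached"] for e in evaluations) / n,
        "decision_distribution": {
            d: sum(e["decision"] == d for e in evaluations)
            for d in ("allow", "deny", "require_approval")
        },
    }

records = [
    {"decision": "allow", "cached": True, "latency_ms": 12},
    {"decision": "deny", "cached": False, "latency_ms": 245},
]
print(summarize(records))
```

Note the latency asymmetry in the sample: cached responses return in milliseconds, while uncached ones pay the full LLM round trip, which is why cache hit rate is worth watching alongside average latency.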