SFR-Guard

Responsible AI isn’t just about regulatory requirements. SFR-Guard helps align technology with your company’s values and mission.

From the Salesforce 360 Blog – https://www.salesforce.com/blog/sfr-guard-ensuring-llm-safety-and-integrity-in-crm-applications/

Securing the Future of AI: Salesforce’s SFR-Guard for Safe, Trusted Generative AI

The Critical Need for AI Safety in the Age of Autonomous Agents

As generative AI becomes deeply embedded in business workflows—from CRM interactions to code generation—ensuring these systems operate safely and ethically is paramount. At Salesforce AI Research, we’re pioneering advanced guardrail technologies that protect users while maintaining AI’s transformative potential.

Understanding the Risks: Why LLM Agents Need Protection

Modern AI agents act as autonomous assistants capable of:

Executing CRM workflows
Generating and modifying code
Processing sensitive customer data
Automating business communications

Three key threat vectors emerge:

Malicious User Intent – Bad actors attempting spam, data leaks, or system breaches
Harmful LLM Outputs – Even benign requests sometimes generate toxic or biased content
Adversarial Environments – Manipulative data inputs that “trick” agents (e.g., comments suggesting malicious code)

Introducing SFR-Guard: Salesforce’s AI Safety Framework

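Our SFR-Guard model family provides enterprise-grade protection specialized for CRM workflows. In the comparison below, it scores higher than GPT-4o and LlamaGuard 3 on both the public benchmark and the private CRM benchmark: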

Model	Parameters	Fine-Grained Detection	Explanations	Severity Levels	Public Benchmark	Private CRM Benchmark
SFR-Guard	0.05B-8B	✅	✅	✅	83.3	93.0
GPT-4o	Unknown	✅	✅	✅	78.7	84.5
LlamaGuard 3	8B	✅	❌	❌	71.3	71.0
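
To make the comparison concrete, the margins can be worked out directly from the table. The short Python sketch below does only that arithmetic. The names `SCORES` and `margins_over` are illustrative, and the scores are copied from the rows above; the source does not state the metric or its units, so the results are simply "points".

```python
# Scores copied from the comparison table: (public benchmark, private CRM benchmark).
SCORES = {
    "SFR-Guard": (83.3, 93.0),
    "GPT-4o": (78.7, 84.5),
    "LlamaGuard 3": (71.3, 71.0),
}


def margins_over(baseline: str, reference: str = "SFR-Guard") -> tuple[float, float]:
    """Return the reference model's lead over `baseline` on each benchmark, in points."""
    ref_public, ref_private = SCORES[reference]
    base_public, base_private = SCORES[baseline]
    # Round to one decimal to match the table's precision and hide float noise.
    return (round(ref_public - base_public, 1), round(ref_private - base_private, 1))


for model in ("GPT-4o", "LlamaGuard 3"):
    public_lead, private_lead = margins_over(model)
    print(f"SFR-Guard leads {model} by {public_lead} (public) and {private_lead} (private CRM)")
```

On these numbers the lead is larger on the private CRM benchmark than on the public one for both baselines.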

Key Innovations

Multi-Layer Defense
- Generation 1: Specialized classifiers (BERT, Flan-T5) for toxicity & prompt injection detection
- Generation 2: Fine-tuned Phi-3-mini LLMs with 128k context windows for holistic analysis

Transparency Features
- Violation highlighting with inline citations
- Natural language explanations of moderation decisions
- Severity scoring (1-5) for appropriate response escalation

CRM-Optimized Training
- Blends public datasets with de-identified Salesforce usage data
- Synthetic generation of edge cases
- Multilingual coverage (EN, FR, DE, ES, IT, JP)
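
As an illustration of how a 1-5 severity score might drive response escalation, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `ModerationResult` shape, the `escalation_action` function, the action names, the thresholds, and the reading of 1 as mildest and 5 as most severe. It is not the SFR-Guard output schema or API.

```python
from dataclasses import dataclass


@dataclass
class ModerationResult:
    """Hypothetical shape of a guardrail verdict; not the actual SFR-Guard schema."""
    category: str      # e.g. "Hate Speech" or "Prompt Leakage"
    explanation: str   # natural language rationale for the decision
    severity: int      # assumed scale: 1 (mildest) to 5 (most severe)


def escalation_action(result: ModerationResult) -> str:
    """Map a severity score to a response. Thresholds are illustrative assumptions."""
    if not 1 <= result.severity <= 5:
        raise ValueError(f"severity must be between 1 and 5, got {result.severity}")
    if result.severity <= 2:
        return "log"             # record the event and let the response through
    if result.severity == 3:
        return "warn"            # let it through with a warning attached
    if result.severity == 4:
        return "block"           # withhold the response
    return "block_and_review"    # withhold and route to a human reviewer


verdict = ModerationResult("Profanity", "Contains obscene language.", severity=2)
print(escalation_action(verdict))
```

Keeping the explanation alongside the score means whoever handles an escalated case sees the reason as well as the number.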

Deep Dive: How SFR-Guard Works

Toxicity Detection Matrix

Category	Examples
Hate Speech	Racial/ethnic slurs
Identity Attacks	Targeted harassment
Violence	Threats or glorification
Physical Harm	Dangerous instructions
Sexual Content	Explicit material
Profanity	Obscene language

Prompt Injection Protection

Attack Type	Defense Strategy
Role-Play/Jailbreaks	DAN attack prevention
Privilege Escalation	Policy enforcement
Prompt Leakage	Sensitive data masking
Adversarial Suffixes	Encoding detection
Privacy Attacks	PII redaction
Malicious Code	Secure code generation
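
To give a feel for what PII redaction means in practice, the Python sketch below masks email addresses and US-style phone numbers with simple regular expressions. This is a stand-in technique chosen for illustration, not a description of how SFR-Guard detects or redacts PII; the patterns, the `redact_pii` function, and the placeholder tokens are all assumptions.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs far
# broader coverage (names, addresses, international phone formats, and so on).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholder tokens."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return US_PHONE_PATTERN.sub("[PHONE]", text)


print(redact_pii("Reach Ana at ana.lopez@example.com or 415-555-0132."))
```

Pattern matching like this only catches well-formed identifiers, which is one reason a learned detector is attractive for free-form CRM text.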

The Future of Trusted AI at Salesforce

Our ongoing research spans:

Cultural appropriateness in AI outputs
Hallucination reduction
Factual consistency in summarization
The xAlign framework for human-AI alignment

Experience safer AI today: SFR-Guard technologies power Salesforce’s Trust Layer, Security Checks, and Guardrails – ensuring your Agentforce deployments remain both powerful and protected.

“In the AI era, trust isn’t a feature—it’s the foundation.”
— Salesforce AI Research
