EUCLID
for oscar

A second set of eyes on every
insurance lead you ingest.

AI-powered insurance lead fraud & anomaly detection. Euclid scores every inbound lead across three risk engines, surfaces the patterns that matter, and routes only what needs a human to your analysts.

Product Overview · Confidential
Prepared for
Oscar Health
Savings delivered: $1M+ saved for customers to date, across deployed carriers
Monthly volume: 500K/mo lead records processed, and growing
False-positive rate: < 11% · industry average ~30%
Score distribution · last 90 days
2.1M leads scored · all time, cross-carrier
Low · ~1.43M · fast-track
Medium · ~441K · standard review
High · ~168K · analyst queue
Critical · ~63K · escalate now
Distribution stays consistent across deployed carriers · < 11% false-positive on flagged leads, vs. ~30% industry
01 / pipeline

How a lead moves through Euclid

01
CSV upload & layout detection
Auto-maps unknown carrier formats via LLM.
02
Validate & enrich
Email + Phone + Address · PII tokenized and decentralized before processing.
03
3-engine risk scoring
Identity · Fraud · Health, run in parallel.
04
Composite score & tier
Single 0-100 Euclid Score · Low / Med / High / Critical.
05
Results, reports & analyst review
Routed queues · audit trail · top-5 reasons.
Live job tracker · 4 active · 312 jobs today
  • JOB-7842 · carrier_intake_may26.csv · layout: auto-detected · 47,500 rows · Processing · 04:11
  • JOB-7841 · broker_book_batch_q2.csv · scored: 22,000 / 22,000 · 22,000 rows · Completed · 2m ago
  • JOB-7840 · ffm_extract_q2_2026.csv · scored: 38,000 / 38,000 · 38,000 rows · Completed · 4m ago
  • JOB-7839 · monthly_agent_review.csv · queued behind 3 jobs · 15,000 rows · Pending
Celery async workers · 15-min stuck-job retry · All PII tokenized at ingestion
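One way to picture tokenization at ingestion is a keyed hash applied to each PII field before anything else touches the record. The sketch below is illustrative only: the key, the `tok_` prefix, the field names, and the choice of HMAC-SHA256 are assumptions, not a description of how Euclid tokenizes data.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real deployment would load it from a key manager.
TOKEN_KEY = b"example-key-not-for-production"

# Assumed column names for the PII fields (names, emails, phones).
PII_FIELDS = ("name", "email", "phone")


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Return a deterministic token for one PII value using HMAC-SHA256."""
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def tokenize_lead(lead: dict) -> dict:
    """Replace PII fields with tokens and leave every other column untouched."""
    return {
        field: tokenize(value) if field in PII_FIELDS and value else value
        for field, value in lead.items()
    }


lead = {"name": "Jane Doe", "email": "Jane@Example.com", "phone": "555-0100", "state": "TX"}
safe_lead = tokenize_lead(lead)
```

Because the token is deterministic, two leads with the same email still match after tokenization, so duplicate and reuse checks can run without access to the raw values.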
02 / scoring engines

Three engines, one Euclid Score

IR · Identity Risk w = 0.80

Synthetic identity detection

Checks every lead for fake PII, recycled contact details, and identities that show up under different names across the book. When email and phone APIs return results, those signals feed directly into the score. When they don't, heuristics fill the gap.

FS · Fraud Signal w = 0.20

Organized fraud patterns

Looks across the whole batch for patterns that only appear when something organized is happening: the same phone on five different names, an agent submitting 300 policies in a week, a cluster of addresses that don't exist. The engine applies 35 documented rules, derived from real forensic reviews.

Graph · 35 catalogue rules · 6 pattern families
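As an illustration of a batch-level pattern, the "same phone on five different names" example can be sketched as a simple grouping pass. The function name, the field names, and the default threshold of five are assumptions made for this sketch; it is not one of the 35 catalogue rules as implemented.

```python
from collections import defaultdict


def phones_shared_across_names(leads, min_names=5):
    """Return phones that appear under at least `min_names` distinct names."""
    names_by_phone = defaultdict(set)
    for lead in leads:
        phone = lead.get("phone")
        name = lead.get("name")
        if phone and name:
            # Normalize so "Ann Lee" and " ann lee " count as one name.
            names_by_phone[phone].add(name.strip().lower())
    return {
        phone: sorted(names)
        for phone, names in names_by_phone.items()
        if len(names) >= min_names
    }


batch = [{"name": f"Person {i}", "phone": "555-0100"} for i in range(5)]
batch += [{"name": "Ann Lee", "phone": "555-0199"}, {"name": "Bo Ray", "phone": "555-0199"}]
flagged = phones_shared_across_names(batch)
```

The check needs the whole batch in view, which is why a signal like this cannot come from scoring each lead in isolation. It works equally on raw or tokenized values, provided equal inputs map to equal values.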
HR · Health Outlook Risk w = 0.10

Comorbidity scoring

When health data is present, age, BMI, smoker status, and chronic conditions are scored against a weighted comorbidity matrix. This layer is optional. Carriers without health columns skip it entirely, and the composite re-weights automatically.

Clinical · weighted comorbidity matrix
Composite Score Formula
IRC = 0.8 × IR  +  0.2 × FS
Euclid Score = 0.9 × IRC  +  0.1 × HR
Low: < 35
Medium: 35 – 64
High: 65 – 84
Critical: ≥ 85
Note on weights: Weights flex with evidence strength. Multiple weaker flags balance each other and keep the composite moderate. When fraud signals concentrate heavily on a single agent or lead, FS pushes past its base 0.20 share and dominates the score, outweighing the identity signal.
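The base formula and tier bands can be sketched in a few lines. This sketch uses the published base weights only and does not model the evidence-based flexing described in the note. Two details are assumptions: when no health data is present the composite falls back to IRC alone, and the tier bands are treated as lower bounds so that fractional scores always land in a tier.

```python
from typing import Optional

# Lower bound of each tier, checked from highest to lowest.
TIERS = [(85, "Critical"), (65, "High"), (35, "Medium")]


def euclid_score(ir: float, fs: float, hr: Optional[float] = None) -> float:
    """Combine engine scores (each 0-100) using the base weights."""
    irc = 0.8 * ir + 0.2 * fs
    if hr is None:
        # Assumption: with no health columns, the composite is IRC alone.
        return irc
    return 0.9 * irc + 0.1 * hr


def tier(score: float) -> str:
    """Map a 0-100 composite to Low / Medium / High / Critical."""
    for floor, label in TIERS:
        if score >= floor:
            return label
    return "Low"


score = euclid_score(ir=82, fs=71, hr=64)
label = tier(score)
```

With engine scores of 82, 71, and 64, the composite lands in the High band.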
Sample lead · score breakdown
5 reasons surfaced · routed to analyst queue
LEAD-04812 · anonymized · 78 / 100 · High Risk
IR · Identity Risk: 82
FS · Fraud Signal: 71
HR · Health Outlook: 64
Top 5 weighted risk reasons
  • $0 premium concentration
  • FFM ID reused across policies
  • Auto-generated email fingerprint
  • Cross-state agent licensing mismatch
  • Multi-carrier termination footprint
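The sample lead's composite can be checked by hand against the published formula. The check assumes the base weights apply unchanged and that the displayed score is rounded to the nearest integer; the rounding rule is an assumption.

```latex
\begin{align*}
\mathrm{IRC} &= 0.8 \times 82 + 0.2 \times 71 = 65.6 + 14.2 = 79.8 \\
\text{Euclid Score} &= 0.9 \times 79.8 + 0.1 \times 64 = 71.82 + 6.4 = 78.22 \approx 78
\end{align*}
```

A score of 78 falls in the 65 – 84 band, which matches the High tier shown for LEAD-04812.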
03 / human + machine

LLM orchestration meets analyst judgement

θ
LLM Orchestration

Theo

Theo handles the parts of the job that don't fit neatly into a rule: reading a carrier CSV format it has never seen before, turning a score into a sentence an analyst can act on, and answering plain-English questions over a scored book. Theo manages prompt routing and PII tokenization, and ties every AI output to a versioned audit record so every flag is traceable.

Layout detection · Risk explanations · NL Q&A
Σ
Predictive Models

Euclid-trained models

The scoring models were trained on real broker books with known outcomes, including clean, legitimate books and not just fraudulent ones. A dedicated comorbidity layer adds clinical context on top of identity and fraud signals for carriers whose leads include health data.

BoB corpus · Partner-carrier data · Medical & comorbidity
Manual Review

Analyst queues

Not everything gets auto-decided. High-risk and borderline leads land in a structured queue where a human makes the final call. Euclid shows the five highest-weight reasons for the flag so analysts spend their time investigating, not re-deriving the score. Overrides feed back into future model improvements.

Top-5 reasons · Audit trail · Override & feedback
Benchmark Intelligence

What powers every Euclid Score

Every Euclid Score can be walked back to a specific rule. The pattern catalogue is derived from three real-world corpora: actual broker-book forensic reviews, carrier-side investigation records, and cross-industry suspension watchlists. Theo processes everything with PII tokenized before any prompt is assembled, and every catalogue version that produced a score is written into the audit trail.

Documented patterns: 35 · real forensic-review provenance, not synthetic
Pattern families: 6 · policy, broker, and agency levels
Agency watchlist: 180d · rolling cross-industry suspension window
Catalogue version: v1.2.2 · last refresh 2026-05-10
w · IR / FS / HR

Scoring formula

Hybrid composite. Algorithmic logic enriched by AI pattern recognition and external validation APIs.

  • IRC = 0.8·IR + 0.2·FS
  • Score = 0.9·IRC + 0.1·HR
  • LLM enrichment · weighted by pattern confidence
  • Four tiers: Low / Med / High / Critical
0–100 composite
35 · 6 families

Pattern catalogue

Documented fraud signatures derived from real broker-book forensic reviews, not theoretical edge cases.

  • Per-policy signatures (subsidy & eligibility)
  • Per-broker concentration & velocity
  • Per-agency coordination fingerprints
  • Every flag links back to a rule
Quarterly catalogue refresh
Cross-industry

Brokerage oversight layer

Cross-industry signal that elevates risk priors for agents and agencies with recent suspension, termination, or compliance history, independent of any single carrier's view.

  • Agency reputation watchlist
  • Multi-carrier suspension footprint
  • Coaching & compliance history
  • Agent-network linkage signals
Industry-wide signal · agent-level resolution
v1.2.2 · 2026-05-10

Versioning & governance

Catalogue versions are immutable. Every score is stamped with the version that produced it.

  • PII tokenized pre-prompt (names, emails, phones)
  • Score-version audit trail
  • Historical scores preserved on update
  • Published diff with every catalogue release
Audit-grade explainability
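The governance points above (immutable versions, version-stamped scores, preserved history) can be sketched with a frozen record type and an append-only history. The class name, field names, and the `rescore` helper are hypothetical and chosen for illustration; they do not describe Euclid's actual storage model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRecord:
    """One score, stamped with the catalogue version that produced it."""
    lead_id: str
    score: int
    tier: str
    catalogue_version: str
    reasons: tuple  # top weighted reasons, stored as an immutable tuple


def rescore(history: list, record: ScoreRecord) -> list:
    """Return a new history with the record appended; earlier records are kept."""
    return history + [record]


first = ScoreRecord(
    lead_id="LEAD-04812",
    score=78,
    tier="High",
    catalogue_version="v1.2.2",
    reasons=("$0 premium concentration", "FFM ID reused across policies"),
)
history = rescore([], first)
```

Freezing the record means a stored score cannot be edited in place, so a catalogue update adds a new record next to the old one instead of rewriting it.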
The model also knows what a clean book looks like. Legitimate broker books have a spread of premium tiers, varied FPL distributions, multi-carrier diversity, and real engagement signals like autopay and account creation. Teaching the model that boundary is what keeps the false-positive rate where it is.
< 11% false positives
Four fraud categories Euclid detects · Surfaced via top-5 reason chips on every flagged lead
Identity fraud

Invalid contact data, suspicious or non-residential addresses, auto-generated email fingerprints, disposable phone numbers.

IR · w=0.80 · Primary engine
Inflated households

≥3 lives on majority of policies, ≥6 lives always investigated, FFM application IDs reused across distinct policies.

FS · w=0.20 · Signal layer
Premium & subsidy

$0 premium with positive APTC across the book, FPL clustering at subsidy sweet-spots, single-plan-SKU funneling.

FS · w=0.20 · Signal layer
Agent red flags

Excessive policy volume per state (250+), cross-state licensing mismatches, future-effective enrollment surges, multi-carrier termination footprint.

FS · w=0.20 + Brokerage oversight
04 / market

Why now

Insurance fraud detection market
$7.2B (2025) → $20.2B by 2031 · 19% CAGR
ML deployment among global insurers
62% in 2025 · and rising