NAVIGATOR

Continuous auditing and expert escalation for behavioral-health LLM safety.

Behavioral-health benchmarks contain mislabeled examples, and their labels can drift over time. We use a multi-agent, human-in-the-loop process that improves labeling accuracy by 16% and reduces human review by up to 85×. The process continuously learns from expert labels to tune the process maps as concepts drift. Read the paper.

Harbor Dataset

Harbor
Total: 0 • Compliance Maps: 0 • LLM Jury: 0
Content warning: Conversations may include sensitive mental health topics such as suicidal ideation, self-harm, abuse, and eating disorders. Some AI responses are intentionally harmful for research purposes.

No scored conversations yet

Run the scoring script to process conversations

Disagreement levels:
High = AI disagrees
Mild = Low confidence
Agree = AI agrees
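The legend above names the three disagreement levels but not how they are computed. The sketch below is one possible mapping, not Harbor's actual scoring script. It assumes that the LLM jury decides by majority vote, that a split vote counts as "low confidence", and that only non-agreeing items are escalated to a human expert. The names `disagreement_level`, `needs_expert_review`, `CONFIDENCE_THRESHOLD` (set to a hypothetical 0.75) and the labels `"safe"`/`"unsafe"` are all illustrative.

```python
from collections import Counter
from typing import Sequence

# Hypothetical cutoff: the jury counts as "confident" only if at least this
# fraction of jurors vote for the majority label.
CONFIDENCE_THRESHOLD = 0.75


def disagreement_level(existing_label: str, jury_votes: Sequence[str],
                       threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Map an existing label plus LLM-jury votes to High / Mild / Agree."""
    if not jury_votes:
        raise ValueError("need at least one jury vote")
    # Majority label and the share of jurors who voted for it.
    majority, count = Counter(jury_votes).most_common(1)[0]
    confidence = count / len(jury_votes)
    if confidence < threshold:
        return "Mild"   # jury is split: low confidence
    if majority != existing_label:
        return "High"   # jury confidently disagrees with the existing label
    return "Agree"      # jury confidently agrees with the existing label


def needs_expert_review(level: str) -> bool:
    """Assumed escalation rule: anything short of agreement goes to an expert."""
    return level in ("High", "Mild")


# Example: 3 of 4 hypothetical jurors reject the existing "safe" label.
votes = ["unsafe", "unsafe", "unsafe", "safe"]
level = disagreement_level("safe", votes)
print(level, needs_expert_review(level))  # High True
```

Because confidence is checked before the labels are compared, a split jury is always reported as Mild, even when its plurality disagrees with the existing label. Other orderings are equally plausible; this one simply routes every uncertain case to a human.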

Mental Health Datasets


Harbor Dataset

Harbor Native Data
Active

Conversations created directly in Harbor with human-expert annotations

30 conversations
Size unknown • 0 downloads
Scored
Size unknown • 0 downloads

Mental health counseling conversations in ShareGPT format

Size unknown • 208 downloads

Mental health chatbot dataset with Q&A pairs from healthcare blogs

Size unknown • 35 downloads

High-quality real one-on-one mental health counseling conversations between individuals and licensed professionals

Size unknown • 2,000 downloads
Showing 1-6 of 67 datasets

Labeled Conversations (1)

nerd-swayam/Mental_Health_Support_ChatBOT_Conversation_000281 label
No scores yet
I just don't see the point in getting out of bed anymore....