NAVIGATOR

Continuous auditing and expert escalation for behavioral-health LLM safety.

Behavioral-health benchmarks contain erroneous labels, and the concepts they encode may drift over time. We use a multi-agent, human-in-the-loop process that improves labeling accuracy by 16% and reduces human review by up to 85×, continuously learning from expert labels to tune the process maps even as concepts drift. Read the paper ↗

Harbor Dataset

0 Total • 0 Compliance Maps • 0 LLM Jury

Content warning: Conversations may include sensitive mental health topics such as suicidal ideation, self-harm, abuse, and eating disorders. Some AI responses are intentionally harmful for research purposes.

No scored conversations yet

Run the scoring script to process conversations

Disagreement:

High = AI disagrees

Mild = Low confidence

Agree = AI agrees
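
The legend above can be read as a three-way rule over the expert label, the AI label, and the AI's confidence. The following is a minimal sketch of that reading, not Navigator's actual implementation: the function name `disagreement_level`, the `ai_confidence` score, and the `low_confidence_threshold` default of 0.7 are all hypothetical, and the sketch assumes that an outright disagreement takes precedence over low confidence.

```python
from enum import Enum


class Disagreement(Enum):
    HIGH = "High"    # AI disagrees with the expert label
    MILD = "Mild"    # AI matches the label, but with low confidence
    AGREE = "Agree"  # AI agrees with the expert label


def disagreement_level(
    human_label: str,
    ai_label: str,
    ai_confidence: float,
    low_confidence_threshold: float = 0.7,  # hypothetical cutoff, not from the source
) -> Disagreement:
    """Map one labeled conversation to a legend bucket (illustrative only)."""
    # Assumption: a label mismatch outranks any confidence consideration.
    if ai_label != human_label:
        return Disagreement.HIGH
    # Labels match, but the AI is unsure of its own answer.
    if ai_confidence < low_confidence_threshold:
        return Disagreement.MILD
    return Disagreement.AGREE


if __name__ == "__main__":
    print(disagreement_level("crisis", "safe", 0.95).value)
    print(disagreement_level("safe", "safe", 0.40).value)
    print(disagreement_level("safe", "safe", 0.90).value)
```

Under this reading, High and Mild items are the natural candidates for expert escalation, while Agree items can skip human review.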

Mental Health Datasets

Harbor Dataset

Harbor Native Data

Active

Conversations created directly in Harbor with human-expert annotations

30 conversations

Mental_Health_Support_ChatBOT_Conversation (🤗 HuggingFace)

Scored

Size unknown • 0 downloads

reddit-mental-health-classification (🤗 HuggingFace)

Scored

Size unknown • 0 downloads

Mental-Health-Text-Dataset (🤗 HuggingFace)

Scored

Size unknown • 0 downloads

mental_health_counseling_conversations_sharegpt (🤗 HuggingFace)

Active

Mental health counseling conversations in ShareGPT format

Size unknown • 208 downloads

mental_health_conversational_dataset (🤗 HuggingFace)

Scored

Mental health chatbot dataset with Q&A pairs from healthcare blogs

Size unknown • 35 downloads

Showing 1-6 of 67 datasets

Labeled Conversations (1)

nerd-swayam/Mental_Health_Support_ChatBOT_Conversation_000281 label

No scores yet

I just don't see the point in getting out of bed anymore....