NAVIGATOR
Continuous auditing and expert escalation for behavioral-health LLM safety.
Behavioral-health benchmarks contain erroneous labels, and the concepts they encode may drift over time. We use a multi-agent, human-in-the-loop process that improves labeling accuracy by 16% and reduces human review by up to 85×. It learns continuously from expert labels, tuning the process maps even as concepts drift. Read the paper ↗
Harbor Dataset
Harbor: 0 Total • 0 Compliance Maps • 0 LLM Jury
Content warning: Conversations may include sensitive mental health topics such as suicidal ideation, self-harm, abuse, and eating disorders. Some AI responses are intentionally harmful for research purposes.
No scored conversations yet
Run the scoring script to process conversations
Disagreement:
High = AI disagrees
Mild = Low confidence
Agree = AI agrees
Mental Health Datasets
Click a dataset to view • ♡ datasets you want scored
Harbor Dataset
Harbor Native Data
Conversations created directly in Harbor with human-expert annotations
30 conversations
Mental_Health_Support_ChatBOT_Conversation ↗ 🤗 HuggingFace
Scored
Size unknown • 0 downloads
reddit-mental-health-classification ↗ 🤗 HuggingFace
Scored
Size unknown • 0 downloads
Mental-Health-Text-Dataset ↗ 🤗 HuggingFace
Scored
Size unknown • 0 downloads
mental_health_counseling_conversations_sharegpt ↗ 🤗 HuggingFace
Active
Mental health counseling conversations in ShareGPT format
Size unknown • 208 downloads
mental_health_conversational_dataset ↗ 🤗 HuggingFace
Scored
Mental health chatbot dataset with Q&A pairs from healthcare blogs
Size unknown • 35 downloads
Showing 1-6 of 67 datasets
Labeled Conversations (1)
nerd-swayam/Mental_Health_Support_ChatBOT_Conversation_000281 label
No scores yet
I just don't see the point in getting out of bed anymore....