NAVIGATOR
Continuous auditing and expert escalation for behavioral-health LLM safety.
Behavioral health benchmarks contain erroneous labels and may drift over time. We use a multi-agent, human-in-the-loop process that improves labeling accuracy by 16% and reduces human review by up to 85×. The process continuously learns from expert labels to tune the process maps, even as concepts drift. Read the paper ↗
Harbor Dataset
Harbor
No scored conversations yet
Run the scoring script to process conversations
Mental Health Datasets
Click a dataset to view • ♡ datasets you want scored
Harbor Dataset
Harbor Native Data
Conversations created directly in Harbor with human-expert annotations
Mental health counseling conversations in ShareGPT format
Mental health chatbot dataset with Q&A pairs from healthcare blogs
High-quality real one-on-one mental health counseling conversations between individuals and licensed professionals