Benchmark dataset for evaluating clinician-facing chat assistants: physician-authored conversations plus rubric items, use-case and difficulty labels, specialty metadata, and a built-in canary to reduce benchmark contamination. Hosted on Hugging Face under an MIT license.
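
As a rough illustration of how a canary is typically used, the sketch below filters a training corpus so that any document carrying the canary marker is dropped before training. It assumes the canary is a fixed string embedded in the benchmark's text. The `CANARY` value and the `contains_canary` and `drop_contaminated` helpers are hypothetical names introduced for this example and are not taken from the dataset itself.

```python
from typing import Iterable, Iterator

# Hypothetical placeholder: the dataset's real canary value is not stated here.
CANARY = "EXAMPLE-CANARY-00000000-0000-0000-0000-000000000000"


def contains_canary(text: str, canary: str = CANARY) -> bool:
    """Return True if the canary marker appears anywhere in the text."""
    return canary in text


def drop_contaminated(docs: Iterable[str], canary: str = CANARY) -> Iterator[str]:
    """Yield only the documents that do not carry the canary marker."""
    for doc in docs:
        if not contains_canary(doc, canary):
            yield doc


# Toy corpus: one unrelated document and one that leaked benchmark text.
corpus = [
    "An unrelated web page about cooking.",
    f"Scraped benchmark row ... {CANARY} ...",
]
clean = list(drop_contaminated(corpus))
```

A plain substring check is the simplest possible filter. A real data pipeline would likely also need to handle reformatted or partially copied benchmark text, which a canary alone cannot catch.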