Healthcare Wayfinding Assistant
A multimodal assistant that reads hospital registration slips, understands spoken questions, and gives natural-language wayfinding directions. Built with OCR, YOLO + OpenCV, RAG, and on-device ASR/TTS.
Project Overview
The Challenge
Hospital layouts are complex, and visitors often arrive with nothing but a printed registration slip and a question such as “How do I get to Clinic B?” Traditional kiosks cannot read slips, understand speech, or adapt their instructions to the visitor.
The Solution
An end-to-end pipeline that detects and parses slips, extracts clinic/location data with OCR, retrieves route info via RAG over a clinic knowledge base, and converses through on-device ASR/TTS.
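The pipeline described above can be sketched as a thin orchestrator that chains four stages. This is a hypothetical structure, not the project's code: `WayfindingPipeline`, the stage signatures, and the fallback prompts are assumptions. Each real component (YOLO detector, OCR, Chroma retrieval, LLaMA-3) is injected as a plain callable so it can be stubbed in tests.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class WayfindingPipeline:
    """Hypothetical orchestrator; each stage is an injected callable."""

    detect_slip: Callable[[bytes], Optional[bytes]]  # image -> cropped slip, or None
    read_clinic: Callable[[bytes], Optional[str]]    # cropped slip -> clinic name, or None
    retrieve: Callable[[str], List[str]]             # query -> knowledge-base passages
    answer: Callable[[str, List[str]], str]          # (question, passages) -> directions

    def handle(self, image: bytes, question: str) -> str:
        slip = self.detect_slip(image)
        if slip is None:
            return "I couldn't find a registration slip. Please hold it up to the camera."
        clinic = self.read_clinic(slip)
        if clinic is None:
            return "I couldn't read the clinic name. Which clinic are you visiting?"
        # Prefix the query with the extracted clinic so retrieval targets the right records.
        passages = self.retrieve(f"{clinic}: {question}")
        return self.answer(question, passages)
```

Keeping stages as callables means an edge build can swap a heavy model for a lighter one without touching the control flow.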
Outcomes
- 95% clinic extraction accuracy on held-out samples.
- ~40% reduction in average navigation time from kiosk to clinic.
- Edge-ready inference path for kiosk deployment.
Key Features
Slip Understanding (CV + OCR)
YOLO detects the slip in the input image, OpenCV post-processes the detected region, and OCR extracts the clinic/department name (95% clinic extraction accuracy on held-out samples).
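The detect-then-crop step can be sketched as follows, assuming the detector reports boxes in the normalized center format used by YOLO label files, `(cx, cy, w, h)` in `[0, 1]`. The function name `yolo_to_crop_box` and the 2% padding default are hypothetical choices; the padding keeps OCR from clipping characters at the slip's edge.

```python
from typing import Tuple


def yolo_to_crop_box(
    box: Tuple[float, float, float, float],
    img_w: int,
    img_h: int,
    pad: float = 0.02,
) -> Tuple[int, int, int, int]:
    """Convert a normalized YOLO box (cx, cy, w, h) to clamped pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    # Grow the box by `pad` of its own size on each side.
    w, h = w * (1 + 2 * pad), h * (1 + 2 * pad)
    x1 = int(round((cx - w / 2) * img_w))
    y1 = int(round((cy - h / 2) * img_h))
    x2 = int(round((cx + w / 2) * img_w))
    y2 = int(round((cy + h / 2) * img_h))
    # Clamp to the image bounds so array slicing never goes out of range.
    return max(0, x1), max(0, y1), min(img_w, x2), min(img_h, y2)
```

OpenCV images are NumPy arrays, so the slip crop is `frame[y1:y2, x1:x2]`; typical post-processing before OCR would then be something like `cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)` followed by thresholding. With the Ultralytics API, `results[0].boxes.xyxy` already gives pixel corners, so only the padding and clamping would apply.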
Conversational Wayfinding
Whisper ASR and Piper TTS, paired with an LLM (LLaMA-3), answer questions and guide users step by step.
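A minimal sketch of the speech I/O around the LLM, assuming the `openai-whisper` Python package and the Piper command-line tool are installed on the kiosk. The helper names (`transcribe`, `piper_command`, `speak`) are hypothetical, and the Piper flags should be checked against the installed Piper version.

```python
import subprocess
from functools import lru_cache
from typing import List


@lru_cache(maxsize=None)
def _load_whisper(model_name: str):
    # Imported lazily; assumes the openai-whisper package is installed locally.
    import whisper
    return whisper.load_model(model_name)


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Speech -> text with a cached, locally loaded Whisper model."""
    result = _load_whisper(model_name).transcribe(audio_path)
    return result["text"].strip()


def piper_command(voice_model: str, out_wav: str, piper_bin: str = "piper") -> List[str]:
    """Build the Piper CLI call; Piper reads the text to speak from stdin."""
    return [piper_bin, "--model", voice_model, "--output_file", out_wav]


def speak(text: str, voice_model: str, out_wav: str) -> None:
    """Text -> WAV file via the Piper binary (raises if Piper exits non-zero)."""
    subprocess.run(piper_command(voice_model, out_wav), input=text, text=True, check=True)
```

For example, `speak("Take lift 3 to level 2.", "en_US-lessac-medium.onnx", "reply.wav")`. Caching the Whisper model avoids reloading weights on every utterance, which matters on kiosk-class hardware.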
RAG
A Chroma vector DB indexes clinic records, maps, and FAQs; retrieved passages are added to the LLM's context so its directions are grounded in that knowledge base.
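Retrieval-augmented prompting over Chroma can be sketched as two small functions. `Collection.query(query_texts=..., n_results=...)` is Chroma's query method, which returns a dict whose `"documents"` entry holds one result list per query text; `retrieve`, `build_prompt`, and the prompt wording are hypothetical.

```python
from typing import Any, List


def retrieve(collection: Any, question: str, k: int = 3) -> List[str]:
    """Top-k passages for one question from a Chroma collection (or any duck-typed stand-in)."""
    result = collection.query(query_texts=[question], n_results=k)
    return result["documents"][0]  # one list of documents per query text


def build_prompt(question: str, passages: List[str]) -> str:
    """Augment the LLM prompt with numbered passages so answers stay grounded."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "You are a hospital wayfinding assistant. Answer using only the context below. "
        "If the context does not contain the route, say so and ask a clarifying question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nDirections:"
    )
```

Numbering the passages makes it easy to log which record an answer came from when a direction turns out to be wrong.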
Edge-Ready Deployment
Dockerized services with lightweight models and offline fallbacks for kiosk or Pi-class devices.
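An offline fallback can be illustrated as a wrapper around the LLM call. This is an assumed design: `answer_with_fallback`, the 8-second default timeout, and the fallback wording are illustrative. When the model errors or is too slow, the kiosk simply reads back the top retrieved passage instead of a generated answer.

```python
import concurrent.futures
from typing import Callable, List


def answer_with_fallback(
    question: str,
    passages: List[str],
    llm: Callable[[str, List[str]], str],
    timeout_s: float = 8.0,
) -> str:
    """Call the LLM; on error or timeout, fall back to the top retrieved passage."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(llm, question, passages)
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers model load failures, runtime errors, and concurrent.futures.TimeoutError.
        if passages:
            return "I can't generate detailed directions right now. From the hospital guide: " + passages[0]
        return "I can't generate directions right now. Please ask at the information desk."
    finally:
        # Don't block on a hung model call; the worker thread finishes in the background.
        pool.shutdown(wait=False)
```

Because retrieval runs before generation, the fallback still has grounded text to offer even when the LLM container is down.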
Technology Stack
- Python
- OpenCV
- OCR (Tesseract or equivalent)
- YOLO
- LLM (LLaMA-3)
- Whisper ASR
- Piper TTS
- RAG
- ChromaDB
- FastAPI
- Docker
Project Timeline
Phase 1: Data & Pipeline
Curated slip samples; trained slip detector; built OCR + post-processing to normalize clinic names.
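The clinic-name normalization step could look like this sketch, which swaps in fuzzy string matching from Python's standard `difflib` (the project's actual normalization method is not specified). `CANONICAL_CLINICS`, `normalize_clinic`, and the 0.75 cutoff are hypothetical.

```python
import difflib
import re
from typing import List, Optional

# Hypothetical canonical names; in practice these would come from the clinic knowledge base.
CANONICAL_CLINICS = ["Clinic A", "Clinic B", "Cardiology", "Dermatology", "Orthopaedics"]


def _clean(text: str) -> str:
    """Lowercase, replace punctuation/OCR noise with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def normalize_clinic(
    ocr_text: str, canonical: List[str] = CANONICAL_CLINICS, cutoff: float = 0.75
) -> Optional[str]:
    """Map raw OCR output to the closest canonical clinic name, or None if nothing is close."""
    by_clean = {_clean(name): name for name in canonical}
    matches = difflib.get_close_matches(_clean(ocr_text), list(by_clean), n=1, cutoff=cutoff)
    return by_clean[matches[0]] if matches else None
```

For example, `normalize_clinic("DERMAT0LOGY")` returns `"Dermatology"` despite the OCR confusion of `O` with `0`. Short names that differ by one character (Clinic A vs Clinic B) are a weak spot for pure string similarity, so a real system would likely combine this with slip layout cues or a spoken confirmation.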
Phase 2: Knowledge Base & RAG
Ingested clinic records, maps, and FAQs into Chroma; wired retrieval to augment LLM responses.
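Ingestion into Chroma might look like the following sketch. `Collection.upsert(ids=..., documents=..., metadatas=...)` is a real Chroma method; the record schema (`code`, `name`, `building`, `level`, `directions`), the id format, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def to_chroma_records(clinics: List[Dict[str, Any]]) -> Tuple[List[str], List[str], List[Dict[str, Any]]]:
    """Turn clinic records into (ids, documents, metadatas) for a Chroma collection."""
    ids, docs, metas = [], [], []
    for c in clinics:
        ids.append(f"clinic-{c['code']}")
        # One short, self-contained passage per clinic keeps retrieval precise.
        docs.append(f"{c['name']} is in {c['building']}, level {c['level']}. {c['directions']}")
        metas.append({"name": c["name"], "level": c["level"], "type": "clinic_record"})
    return ids, docs, metas


def ingest(collection: Any, clinics: List[Dict[str, Any]]) -> None:
    """Write records with upsert so re-running ingestion after KB updates is idempotent."""
    ids, docs, metas = to_chroma_records(clinics)
    collection.upsert(ids=ids, documents=docs, metadatas=metas)
```

With `chromadb` installed, the collection could come from `chromadb.PersistentClient(path="kb").get_or_create_collection("clinic_kb")`; maps and FAQs would get their own `type` metadata so queries can filter by source.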
Phase 3: Conversational Loop
Integrated Whisper ASR and Piper TTS; built dialog flows for clarifications and stepwise guidance.
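The clarification and stepwise-guidance flow can be sketched as a small state machine over ASR text. This is a hypothetical design: `GuidanceDialog`, the substring matching of clinic names, and the `next`/`ok`/`done` advance words are assumptions; a production dialog would more likely let the LLM interpret intent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GuidanceDialog:
    """Ask for the clinic if unknown, then read route steps one at a time."""

    routes: Dict[str, List[str]]  # clinic name -> ordered route steps (hypothetical KB view)
    clinic: Optional[str] = None
    step: int = 0

    def respond(self, utterance: str) -> str:
        text = utterance.strip().lower()
        if self.clinic is None:
            # Clarification turn: look for a known clinic name in what the user said.
            for name in self.routes:
                if name.lower() in text:
                    self.clinic, self.step = name, 0
                    return self._current_step()
            return "Which clinic are you going to? For example: " + ", ".join(self.routes) + "."
        if text in {"next", "ok", "done"}:
            self.step += 1
        # Anything else (e.g. "repeat") re-reads the current step.
        return self._current_step()

    def _current_step(self) -> str:
        steps = self.routes[self.clinic]
        if self.step >= len(steps):
            return f"You have arrived at {self.clinic}."
        return f"Step {self.step + 1} of {len(steps)}: {steps[self.step]}"
```

Reading one step per turn keeps each TTS utterance short, which suits a visitor who is walking while listening.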
Phase 4: Edge Packaging
Dockerized services; reduced model footprints; added offline fallbacks and health checks.
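A health check for the Dockerized services might aggregate per-dependency probes as in this sketch. `aggregate_health` and the probe names are hypothetical; returning HTTP 503 when any probe fails is a common convention that lets container health checks or load balancers spot a degraded kiosk.

```python
from typing import Callable, Dict, Tuple


def aggregate_health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run each dependency probe and return (HTTP status code, JSON-ready body)."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "fail"
        except Exception as exc:  # a crashing probe counts as unhealthy
            results[name] = f"error: {exc}"
    healthy = all(v == "ok" for v in results.values())
    body = {"status": "ok" if healthy else "degraded", "checks": results}
    return (200 if healthy else 503), body
```

Inside a FastAPI app, a `/health` route could return `JSONResponse(content=body, status_code=code)`, and a Docker `HEALTHCHECK` instruction could poll that route.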