Healthcare Wayfinding Assistant

A multimodal assistant that reads hospital registration slips, understands spoken queries, and provides natural-language wayfinding—built with OCR, YOLO + OpenCV, RAG, and on-device ASR/TTS.

Project Overview

The Challenge

Hospitals are complex; visitors often arrive with only a printed registration slip and questions like “How do I get to Clinic B?”. Traditional kiosks don’t parse slips, understand speech, or adapt instructions.

The Solution

An end-to-end pipeline that detects and parses slips, extracts clinic/location data with OCR, retrieves route info via RAG over a clinic knowledge base, and converses through on-device ASR/TTS.

Outcomes

95% clinic extraction accuracy on held-out samples.
~40% reduction in average navigation time from kiosk to clinic.
Edge-ready inference path for kiosk deployment (no images required here).

Key Features

Slip Understanding (CV + OCR)

YOLO for slip detection, OpenCV for post-processing, and OCR to extract clinic/department with high accuracy.

Conversational Wayfinding

Whisper ASR + Piper TTS with an LLM (LLaMA-3) to answer questions and guide users step-by-step.

RAG

Chroma vector DB indexes clinic records, maps, and FAQs; retrieval augments the LLM for precise directions.

Edge-Ready Deployment

Dockerized services with lightweight models and offline fallbacks for kiosk or Pi-class devices.

Technology Stack

Python
OpenCV
OCR (Tesseract/Equivalent)
YOLO
LLM (LLaMA-3)
Whisper ASR
Piper TTS
RAG
ChromaDB
FastAPI
Docker

Project Timeline

Phase 1: Data & Pipeline

Curated slip samples; trained slip detector; built OCR + post-processing to normalize clinic names.

Phase 2: Knowledge Base & RAG

Ingested clinic records, maps, and FAQs into Chroma; wired retrieval to augment LLM responses.

Phase 3: Conversational Loop

Integrated Whisper ASR and Piper TTS; built dialog flows for clarifications and stepwise guidance.

Phase 4: Edge Packaging

Dockerized services; reduced model footprints; added offline fallbacks and health checks.