My 5-9 Technical Thoughts

One thing that LLMs — or any technology, really — still haven’t completely solved is processing PDFs.

Tax returns, medical records, simple ACORD insurance forms: they’re all potential gold mines if you can reliably pull the information out of them. My side venture tackled exactly this by processing medical records. The recipe: prompt the LLM and validate its output against Pydantic models so the fields you care about come back structured, then run RAG (retrieval-augmented generation) over several key topics. Mix and match based on preference and business rules and, tah daaaaa, you’ve got a simple, cheap medical chronology SaaS.
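To make the Pydantic part concrete, here’s a minimal sketch of what a schema for one chronology entry could look like. The `Encounter` model, its field names, and the `parse_encounter` helper are all made up for illustration, and the hard-coded `reply` string stands in for whatever your LLM call returns.

```python
import json
from datetime import date
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Encounter(BaseModel):
    """One dated event in a medical chronology (hypothetical field names)."""

    visit_date: date
    provider: Optional[str] = None
    diagnoses: List[str] = []
    summary: str


def parse_encounter(raw_json: str) -> Optional[Encounter]:
    """Validate an LLM reply against the schema; return None if it doesn't fit."""
    try:
        return Encounter(**json.loads(raw_json))
    except (ValidationError, json.JSONDecodeError, TypeError):
        # Bad JSON, a non-object payload, or missing/invalid fields.
        return None


# Stand-in for an LLM reply; a real one would come from your model call.
reply = (
    '{"visit_date": "2023-04-12", "provider": "Dr. Lee", '
    '"diagnoses": ["lumbar strain"], "summary": "Initial evaluation after a fall."}'
)
encounter = parse_encounter(reply)
```

The useful bit is the failure path: when the reply doesn’t match the schema you get `None` back instead of a half-filled record, so you can retry the prompt or flag the page for review.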

The hard part isn’t building. It’s distribution. Good luck solving that one, lmao.

The simple design

Conceptually, here’s how you’d go about building a similar project.

Start with extraction. Projects worth checking out: Chandra, PaddleOCR, and Docling. Depending on what you’re processing (images, charts, dense layouts), the quality of your VLM (vision-language model) matters a lot for preserving what’s actually on the pages you read.

Text only? Then the setup is easy. Extract the content as Markdown so you preserve the flow of information, then feed that straight into an LLM for data enrichment.
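Here’s a rough sketch of that Markdown-to-LLM handoff, assuming the extraction step already gave you a Markdown string. It splits the text on headings so each section keeps its place in the flow, then wraps each one in a prompt. The function names and the prompt wording are placeholders I made up, and the actual model call is left out.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_markdown_sections(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) pairs, keeping document order.

    Headings with no text under them are skipped.
    """
    sections: List[Tuple[str, str]] = []
    heading = "Preamble"  # label for any text before the first heading
    body: List[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if any(part.strip() for part in body):
                sections.append((heading, "\n".join(body).strip()))
            heading = match.group(1).strip()
            body = []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


def build_enrichment_prompt(heading: str, body: str) -> str:
    """Wrap one section in a (hypothetical) enrichment instruction."""
    return (
        "Extract the date, provider, and diagnoses from this section.\n"
        f"Section: {heading}\n\n{body}"
    )


markdown = "# Visit 1\nPatient reports back pain.\n\n## Plan\nPhysical therapy.\n"
sections = split_markdown_sections(markdown)
prompts = [build_enrichment_prompt(h, b) for h, b in sections]
```

Each prompt would then go to whichever LLM you’re using, one section at a time.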

From there, depending on how your columns are structured, you can pick and choose which ones supply the context for each embedding. With that settled, structuring the augmentation and generation layers (the A and G in RAG) is pretty straightforward.
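As a small illustration of picking columns for the embedding, the sketch below joins only the chosen fields of a row into the text you’d hand to an embedding model. The row contents, column names, and `build_embedding_text` are all hypothetical, and the embedding call itself is left out.

```python
from typing import Dict, Iterable


def build_embedding_text(row: Dict[str, object], columns: Iterable[str]) -> str:
    """Join the selected columns into one labeled text block for embedding."""
    parts = []
    for column in columns:
        value = row.get(column)
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        if value is None or value == "":
            continue  # skip missing or empty columns
        parts.append(f"{column}: {value}")
    return "\n".join(parts)


# Hypothetical enriched row; billing_code is deliberately left out of the embedding.
row = {
    "visit_date": "2023-04-12",
    "provider": "Dr. Lee",
    "diagnoses": ["lumbar strain", "contusion"],
    "summary": "Initial evaluation after a fall.",
    "billing_code": "99203",
}
text = build_embedding_text(row, ["visit_date", "diagnoses", "summary"])
```

Keeping the column labels in the text is a design choice: it gives the embedding a bit of context about what each value means. Drop them if your retrieval works better on raw values.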

Have fun building!