sidmo

2024-11-20 5:00

Commented: "Show HN: Documind – Open-source AI tool to turn documents into structured data"

I'd recommend checking out vision language models. They generate embeddings of the images themselves (as a collection of patches), and you can see query matching displayed as a heatmap over the document. This picks up text that OCR misses. I built a simple API on top of one if you want to try it out: https://github.com/DataFog/vlm-api
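
For anyone wondering how a patch-level heatmap can come out of embeddings, here is a rough sketch in plain Python. It is not code from vlm-api: the function names (`patch_heatmap`, `page_score`), the 2x2 patch grid, and the tiny 2-dimensional vectors are all made up for illustration, and a real model would produce the embeddings. The scoring shown is the "best match per query token" style used by late-interaction retrievers, which is an assumption about what sits behind the heatmap described above.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def patch_heatmap(query_vecs, patch_vecs, rows, cols):
    """Return a rows x cols grid of scores, one per image patch.

    Each cell holds the highest cosine similarity between that patch
    and any query token. Patches are assumed to be in row-major order.
    """
    if len(patch_vecs) != rows * cols:
        raise ValueError("expected rows * cols patch embeddings")
    queries = [normalize(q) for q in query_vecs]
    patches = [normalize(p) for p in patch_vecs]
    scores = [max(cosine(q, p) for q in queries) for p in patches]
    return [scores[r * cols:(r + 1) * cols] for r in range(rows)]


def page_score(query_vecs, patch_vecs):
    """Score a whole page: sum, over query tokens, of the best-matching patch."""
    queries = [normalize(q) for q in query_vecs]
    patches = [normalize(p) for p in patch_vecs]
    return sum(max(cosine(q, p) for p in patches) for q in queries)


# Toy data standing in for model output (hypothetical values).
query = [[1.0, 0.0]]                       # one query token
patches = [[1.0, 0.0], [0.0, 1.0],         # top-left, top-right
           [1.0, 1.0], [-1.0, 0.0]]        # bottom-left, bottom-right

heat = patch_heatmap(query, patches, rows=2, cols=2)
for row in heat:
    print(" ".join(f"{score:+.2f}" for score in row))
```

In this toy grid the top-left patch scores highest, so that is where the heatmap would light up. Overlaying the grid on the page image is then just a matter of upscaling it to the image size.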

2024-11-20 4:59

Commented: "Show HN: Documind – Open-source AI tool to turn documents into structured data"

VLMs are cool - they generate embeddings of the images themselves (as a collection of patches) and you can see query matching displayed as a heatmap over the document. Picks up text that OCR misses. Here's an open-source API demo I built if you want to try it out: https://github.com/DataFog/vlm-api

2024-11-20 4:27

Commented: "Show HN: Documind – Open-source AI tool to turn documents into structured data"

If you are looking for the latest and greatest in file processing, I'd recommend checking out vision language models. They generate embeddings of the images themselves (as a collection of patches), and you can see query matching displayed as a heatmap over the document. This picks up text that OCR misses. My company DataFog has an open-source demo if you want to try it out: https://github.com/DataFog/vlm-api

If you're looking for an all-in-one solution, here's a little plug for our new platform, which does the above and also lets you create custom 'patterns' that get picked up via semantic search. It uses open-source models by default and can be deployed into your internal network: www.datafog.ai. It's in beta now and we're onboarding manually. Shoot me an email if you'd like to learn more!

Hacker News

sidmo

1

2024-11-20
