
> Chunking is generally a one-time process where users aren't latency sensitive.

This is not necessarily true. For example, in our use case we are constantly monitoring websites, blogs, and other sources for changes. When a new page is added, we need to chunk and embed it quickly so it becomes searchable immediately. Chunking speed matters for us.

When you're processing changes constantly, chunking is in the hot path. As LLMs get used in more real-time workflows, I think every part of the stack will start facing latency pressure.
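
To give a sense of why chunking can stay cheap even in a hot path: here's a minimal sketch of a fixed-size chunker with overlap, pure string slicing and no model calls. The window and overlap values are illustrative assumptions, not tuned numbers, and a production pipeline might split on sentence or paragraph boundaries instead.

    def chunk_text(text: str, window: int = 1000, overlap: int = 200) -> list[str]:
        """Split text into fixed-size character windows with overlap.

        Cost is linear in the text length and does no I/O, so this step
        is usually dwarfed by the embedding call that follows it.
        """
        if window <= overlap:
            raise ValueError("window must be larger than overlap")
        step = window - overlap
        chunks = []
        for start in range(0, len(text), step):
            chunk = text[start:start + window]
            if chunk:
                chunks.append(chunk)
        return chunks

    if __name__ == "__main__":
        page = "A freshly crawled page body would go here. " * 50
        chunks = chunk_text(page, window=100, overlap=20)
        print(f"{len(chunks)} chunks")

The point of the sketch is that the chunking step itself is microseconds of string work; the latency pressure the parent comment describes mostly comes from smarter (semantic, model-based) chunkers, which is where the chunking-vs-embedding compute question below gets interesting.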



How much compute do your systems expend on chunking vs. the embedding itself?



