There is nothing there to "try"; it's some very basic HTML displaying information that doesn't mean anything to me. Looks like a status page, not a platform.
Really, it looks like the work of someone new to startups and B2B copy. Welcome to first contact with users; time to iterate or pivot.
I would focus on design, aesthetics, and copy. Don't put any more effort into building until you have a message that resonates
Basic HTML? The core of what we built is at the runtime layer. We’re capturing CUDA graphs and restoring model state directly at the GPU execution level rather than just snapshotting containers. That’s what enables fast restores and higher utilization across multiple models.
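To make the runtime-layer point concrete, here is a rough PyTorch sketch of the general capture-and-replay idea. This is illustrative only, not our actual code; the model, batch size, and shapes are placeholders.

    import torch

    # Illustrative sketch of CUDA graph capture/replay (not InferX's runtime).
    model = torch.nn.Linear(4096, 4096).cuda().eval()
    static_input = torch.randn(8, 4096, device="cuda")

    # Warm up on a side stream so capture sees steady-state allocations.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s), torch.no_grad():
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    # Capture the forward pass once into a CUDA graph.
    g = torch.cuda.CUDAGraph()
    with torch.no_grad(), torch.cuda.graph(g):
        static_output = model(static_input)

    # Serving a request: copy fresh data into the static buffer and replay.
    # Replay skips Python and kernel-launch overhead and reuses the same GPU
    # memory, which is the intuition behind the fast-restore claim.
    static_input.copy_(torch.randn(8, 4096, device="cuda"))
    g.replay()
    print(static_output.shape)

The snapshot/restore piece in our runtime sits a level below this, but the capture-and-replay pattern is the basic intuition.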
If that’s not a problem space you care about, that’s totally fair. But for teams juggling many models with uneven traffic, that’s where the economics start to matter.
Also, for what it’s worth, this can be deployed both on-prem and in the cloud. Different teams have different constraints, so we’re trying to stay flexible on that.
Happy to dig deeper and show exactly how it works under the hood. For context, here’s the main site where the architecture and deployment options are explained: https://inferx.net/
I don't personally have this problem. One of my clients does, so my questions are ones I'd expect the CTO to ask you in a sales call. They already have an in-house system, and I suspect they would not replace it with anything other than an open-source option or a hyperscaler option.
Are you going to make this open source? That's the modus operandi in AI for gaining adoption if you're outside Big AI (where branding is already strong).
It’s an open-core model. The control plane is already open source and can be deployed fairly easily. We’re not trying to replace in-house systems or hyperscalers. This can run on Kubernetes and integrate into existing infrastructure. The runtime layer is where we’re focusing the differentiation.
The demo is live. It’s meant to show how snapshot restore works inside a multi-tenant runtime, not just a prompt playground. You can interact with the deployed models and observe how state is restored and managed across them. The focus is on the runtime behavior rather than a chat UI.
Fair point. I’ll repost as a regular submission instead of Show HN. The goal was to demonstrate the runtime behavior behind multi-model serving rather than a polished end-user app. Appreciate the clarification.