Been running it on a locked down Hetzner server + using Tailscale to interact with it and it's been surprisingly useful even just defaulting to Gemini 3 Flash.
It feels like the general shape of things to come - if agents can code then why can't they make their own harness for the very specific environments they end up in (whether it's a business, or a super personalized agent for a user, etc). How to make it not a security nightmare is probably the biggest open question and why I assume Anthropic/others haven't gone full bore into it.
https://github.com/caesarnine/binsmith
Been running it on a locked down Hetzner server + using Tailscale to interact with it and it's been surprisingly useful even just defaulting to Gemini 3 Flash.
It feels like the general shape of things to come - if agents can code then why can't they make their own harness for the very specific environments they end up in (whether it's a business, or a super personalized agent for a user, etc). How to make it not a security nightmare is probably the biggest open question and why I assume Anthropic/others haven't gone full bore into it.