Show HN: Voxos.ai – An Open-Source Desktop Voice Assistant (gitlab.com/literally-useful)
123 points by Falimonda on Jan 19, 2024 | 50 comments
Voxos is an open-source desktop voice assistant that aims to put Clippy to shame while supporting new desktop workflows powered by LLMs.

Tired of copying and pasting ChatGPT responses between your web browser and your IDE?

Does your copilot not quite do what you need it to do?

I invite you to give Voxos a try and maybe even become a contributor!



> Tired of copy and pasting ChatGPT responses between your web browser and IDE?

How does Voxos help avoid copying & pasting code into your IDE? I had a look around the code base and don't see any indication that it allows GPT to directly edit your source files. But maybe I am missing it?

I'm asking because this is a major focus of my open source AI coding project aider [0]. I always like to see how other projects approach the challenge of letting GPT edit existing code. Most recently, aider adopted unified diffs as the GPT-4 Turbo code editing format [1].

[0] https://github.com/paul-gauthier/aider

[1] https://aider.chat/docs/unified-diffs.html
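For readers curious what the unified diff editing format looks like, here's a stdlib sketch that produces one. The file contents and names are invented for illustration and aren't taken from aider itself:

```python
import difflib

# Hypothetical "before" and "after" versions of a source file.
before = [
    "def greet(name):\n",
    "    print('hello ' + name)\n",
]
after = [
    "def greet(name):\n",
    "    print(f'hello {name}')\n",
]

# unified_diff yields the familiar ---/+++ headers, @@ hunk markers,
# and -/+ prefixed lines that an editing tool can apply to a file.
diff = "".join(
    difflib.unified_diff(before, after, fromfile="a/greet.py", tofile="b/greet.py")
)
print(diff)
```

The appeal of the format is that the model only has to emit the changed hunks, and the tool can locate and apply them to the existing file rather than rewriting it wholesale.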


I just wanted to say thank you, aider (with the new unified diff format) is the first AI tool that has actually changed the way I work.


That’s great to hear. I’m glad you’re finding it useful!


I’d love to see something like aider that just sat in a side window and suggested edits, automatically keeping up to date with my changes as I’m editing. I.e., I want something that adds to my existing workflow, not something that requires my editor or workflow to fundamentally change.


Is that GIF on your GitHub in real time? Is the speed limited by the number of tokens per second that an LLM can produce, or is it because of the pair-programming aspect, where you try to simulate a real interaction between two "agents"?


Hey there. I'd come across aider a few weeks ago - thrilled to have your input on this.

You're correct that Voxos in its current form does not directly work with the user's file system. I'll admit I chose my words carefully in saying that it spares you from copying and pasting between ChatGPT and your IDE - not necessarily that you won't be copying and pasting anymore. Having the text response dump to a text editor speeds up my workflow considerably when contrasted with the ChatGPT UI being "read-only" in this sense.

Anyway, I'd been messing around with function calling in an earlier version of Voxos and plan on bringing all that work into this beta soon. In terms of my approach, I plan on using Docker to host a network-mapped drive on the host machine, then connecting the IDE on the host to that drive. I'm not sure how well that will carry over to the non-beta version of Voxos, which I envision will come with an installer for non-technical users. I haven't put that much work into the idea yet.

An alternative would be to host all of it in the cloud and simply offer a web IDE into a container, then make sure there's a reliable backup and revert system in place if/when things go south. That's heading more towards a hosted solution, though, and I simply don't have time to support paying customers, even once Voxos matures to the point I'd like for a v1.0.

I'll take a closer look at the unified-diffs when I get a chance!


> Supports the following LLMs:
> OpenAI's Completions Models

So not GPT-4 Turbo? That's the chat API, after all.


Looking at the code, it uses the chat completions API, so I'm guessing the description there is wrong.


What's wrong about it? It does indeed support using "gpt-4-1106-preview", which, based on https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb..., appears to be "GPT-4 Turbo".


I mean that the quoted description is wrong, or at least confusing, since it could imply the project only works with the (non-chat) completions API.


thanks!


I made a little proof of concept that used whisper.cpp and ChatGPT to take “command” requests and generate AppleScripts, which I could then run on OS X.

It actually works somewhat well. I think that with some more work and thought, something like this could actually be useful.

Just saw this was for Linux and Windows only.


Will you consider contributing to Voxos? I don't have a Mac on hand, otherwise I would take a look at supporting it myself. I know someone's already opened an issue on the repo about it.


I like the idea and support your project. As a suggested enhancement, you could let the assistant reply back in a voice of the user's choice, like in the ChatGPT app.


Thanks for your helpful feedback!


"Open Source" ... but is built around and tied to "Open"AI... :/

So weird that this isn't local LLM first.


To be clear, this is an open-source frontend for a not-open-source AI provider. It's calling OpenAI behind the scenes.


Fair enough. With that said, support for running local, self-hosted, and non-OpenAI cloud-hosted models is in the works.


"Open source" needs to be banned from the titles of these kinds of submissions. Or at least you should have to call it an "API client".

The post title as written seems intentionally deceptive.


I think not everyone understands why it's very important that the term mean more than "I put something on GitHub".

Especially when that something is promoting the opposite of open source. (That's "open sores", which is different.)


That sounds like a personal preference. The repo is public and welcomes contributors, which by definition is open source. There's nothing deceptive about it, but you're free to get as pedantic over it as you'd like.


Can it call anything self-hosted, or Ollama?


It should be possible using LiteLLM and a patch or a proxy.

https://github.com/BerriAI/litellm


That's on the roadmap


Can one enter their own OpenAI URL and API key (so we can use OpenAI-compatible things like OpenRouter or LM Studio)?


Doesn't look like it: https://gitlab.com/literally-useful/voxos/-/blob/dev/voxos/s...

edit: shouldn't be hard to enable though


Yes, you can define your own key in the .env, in the CLI call in run.sh, or in your environment.

https://gitlab.com/literally-useful/voxos/-/blob/dev/.env?re...


That doesn't let me send requests to my local LiteLLM instance, though. You have to be able to configure the endpoint that requests are sent to as well.
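A minimal sketch of what a configurable endpoint could look like. The `OPENAI_API_BASE` variable name here is an assumption borrowed from common OpenAI-client conventions, not something Voxos currently reads:

```python
import os

def chat_completions_url():
    # Fall back to OpenAI's hosted API when no override is set.
    base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
    return base.rstrip("/") + "/chat/completions"

# Point the client at a local OpenAI-compatible server (e.g. a LiteLLM
# proxy or LM Studio) instead of OpenAI:
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"
print(chat_completions_url())  # http://localhost:8000/v1/chat/completions
```

With the base URL factored out like this, the same request code works against any OpenAI-compatible backend; only the environment changes.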


Nice. LiteLLM was just the thing I've been looking for and hoping to integrate.


Hell yeah. Good luck!


Do you know if there's anything out there like LiteLLM that includes OpenAI's Whisper model? I took a look at the litellm package and it doesn't appear they support the audio module. :/


I'm not sure if it is _fully_ OpenAI-compatible, but whisper.cpp has a server bundled that says it is "OAI-like": https://github.com/ggerganov/whisper.cpp/tree/master/example...

I don't have any direct experience with it... I've only played around with whisper locally, using scripts.
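For the curious, here's a stdlib-only sketch of what talking to such a local server could look like. The `/inference` path and the `file` form field are assumptions based on whisper.cpp's server example; check its README before relying on them:

```python
import io
import urllib.request
import uuid

def build_transcription_request(url, wav_bytes):
    """Build (but don't send) a multipart/form-data POST carrying a WAV file."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n')
    body.write(b"Content-Type: audio/wav\r\n\r\n")
    body.write(wav_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# Hypothetical local server address; the payload here is a stand-in,
# not real audio data.
req = build_transcription_request("http://localhost:8080/inference", b"RIFF...")
# urllib.request.urlopen(req) would return the transcription response.
```

The point is that a local transcription server can slot in wherever the hosted Whisper API is called today, as long as the request shape matches.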



I think anything compatible with either the chat completions or completions API should work.


I don’t mean to be too dismissive, but this would really only be interesting if it ran local voice transcription and a local LLM.


Thanks for your feedback. Local and remote self-hosted transcription and LLM integration is on the roadmap.


Seems to be missing local LLM / offline support, and is tied to OpenAI.


Missing relative to what?


The term "Open Source" when applied to these applications is at best confusing and at worst misleading.

While the license of the project may be FLOSS, if all an application is doing is grabbing locally generated input, sending it off to a proprietary third party black box, and then processing the output, the application is primarily glue and the hard work is being done by the proprietary API.

Thus the application adheres to the license of Free/Open Source Software, but not the spirit or motivation, which in this case is primarily independence from third party access to our private data.

By way of analogy, it would be like advertising an electric car that requires a diesel fuel generator inside to charge the battery. Yes, the car may be electric, but is entirely dependent on a machine that only works via fossil fuels.

I've seen a good half dozen "Open Source" applications which work this way and each time it's a bit of a let down.


Where were the claims that this included "hard work", or any requirement that "hard work" be involved?

"if all an application is doing is grabbing locally generated input" demonstrates how little time you took to look over the repo, but feel free to be as offensive as you'd like.

The application was built to be extensible and will support local LLMs in the near future, when they're anywhere near as good as the black boxes that are readily available. It's a matter of practicality, and that's apparently lost on many.


There's a lot of interest in building fully local chat applications, and the people interested in those (myself included) are disproportionately likely to click on a submission advertising itself as an open source assistant.

You're not wrong in your definition, but neither are we—there's a reason why F-Droid provides warning banners on apps that rely on proprietary APIs. A large portion of the free software movement would rather use an inferior free solution than one that is better but proprietary.

That said, don't take the criticisms too much to heart—I think it's great that people are building things at all points on the free software spectrum.


By hard work, I mean the core computational functionality is being done not only by a black box, but by a black box that is openly hostile to private user-generated information, something a home assistant encourages by knowing the user's preferences for music, schedule, and so on. This is a common term of art.

As for the issue of whether it's "lost on many", I can't speak for others, but I use proprietary LLMs for certain jobs, and I also understand the risk I place on myself and others when I do so. On the other side, the first section of Voxos.ai's documentation is entitled "Voxos [Beta Release] - Voice Your Desires". Voicing my desires would mean giving them to a third party. It's a bad idea.

And instead of simply acknowledging the issues and saying "Yes, we have plans for local LLMs" or "We're looking at a mix of rule based and LLM based systems in the future" or even saying nothing, your response is a mix of ridicule and derision. It's very telling.


[deleted]


Sorry for the seemingly irrelevant comment, but your URL at the top right of the page, under "Project Information", has only two w's ("ww") instead of three ("www").


Hah... thanks for pointing that out!


A demo would be useful.


Agreed. Interested in this and would like to see a demo.

Terms like “Recording” make it seem confusing.

Having the responses open in Notepad is confusing in normal workflows.


Thanks. I'll take that into consideration.


This should be a Show HN:


Whoops, thanks for the nudge.


[flagged]


What's super weird?



