Show HN: Voxos.ai – An Open-Source Desktop Voice Assistant (gitlab.com/literally-useful)
123 points by Falimonda on Jan 19, 2024 | 50 comments
Voxos is an open-source desktop voice assistant that aims to put Clippy to shame while supporting new desktop workflows powered by LLMs.

Tired of copying and pasting ChatGPT responses between your web browser and your IDE?

Does your copilot not quite do what you need it to do?

I invite you to give Voxos a try and maybe even become a contributor!



> Tired of copy and pasting ChatGPT responses between your web browser and IDE?

How does Voxos help avoid copying & pasting code into your IDE? I had a look around the code base and don't see any indication that it allows GPT to directly edit your source files. But maybe I am missing it?

I'm asking because this is a major focus of my open source AI coding project aider [0]. I always like to see how other projects approach the challenge of letting GPT edit existing code. Most recently, aider adopted unified diffs as the GPT-4 Turbo code editing format [1].

[0] https://github.com/paul-gauthier/aider

[1] https://aider.chat/docs/unified-diffs.html
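For readers curious what the unified diff editing format looks like, here's a stdlib sketch that produces one. The file contents and names are invented for illustration and aren't taken from aider itself:

```python
import difflib

# Hypothetical "before" and "after" versions of a source file.
before = [
    "def greet(name):\n",
    "    print('hello ' + name)\n",
]
after = [
    "def greet(name):\n",
    "    print(f'hello {name}')\n",
]

# unified_diff yields the familiar ---/+++ headers, @@ hunk markers,
# and -/+ prefixed lines that an editing tool can apply to a file.
diff = "".join(
    difflib.unified_diff(before, after, fromfile="a/greet.py", tofile="b/greet.py")
)
print(diff)
```

The appeal of the format is that the model only has to emit the changed hunks, and the tool can locate and apply them to the existing file rather than rewriting it wholesale.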


I just wanted to say thank you, aider (with the new unified diff format) is the first AI tool that has actually changed the way I work.


That’s great to hear. I’m glad you’re finding it useful!


I’d love to see something like aider that just sat in a side window and suggested edits, automatically keeping up to date with my changes as I’m editing. I.e., I want something that adds to my existing workflow, not something that requires my editor or workflow to fundamentally change.


Is that GIF on your GitHub in real time? Is the speed limited by the number of tokens per second that an LLM can produce, or is it because of the pair-programming aspect, where you try to simulate a real interaction between two "agents"?


Hey there. I'd come across aider a few weeks ago - thrilled to have your input on this.

You're correct that Voxos in its current form does not directly work with the user's file system. I'll admit I chose my words carefully in saying that it spares you from copying and pasting between ChatGPT and your IDE - not necessarily that you won't be copying and pasting anymore. Having the text response dump to a text editor speeds up my workflow considerably when contrasted with the ChatGPT UI being "read-only" in this sense.

Anyway, I'd been messing around with function calling in an earlier version of Voxos and plan on bringing all that work into this beta soon. In terms of my approach, I plan on using Docker to host a network-mapped drive on the host machine, then connecting the IDE on the host to that drive. I'm not sure how well that will carry over to the non-beta version of Voxos, which I envision will come with an installer for non-technical users. I haven't put that much work into the idea yet.

An alternative would be to host all of it in the cloud and simply offer a web IDE into a container, then make sure there's a reliable backup and revert system in place if/when things go south. That's heading more towards a hosted solution, though, and I simply don't have time to support paying customers, even once Voxos matures to the point I'd like for a v1.0.

I'll take a closer look at the unified-diffs when I get a chance!


> Supports the following LLMs:
> OpenAI's Completions Models

So not GPT-4 Turbo? That's the chat API, after all.


Looking at the code, it uses the chat completions API, so I'm guessing the description there is wrong.


What's wrong about it? It does indeed support using "gpt-4-1106-preview", which, based on https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb..., appears to be "GPT-4 Turbo".


I mean that the quoted description is wrong, or at least confusing, since it could imply the project only works with the (non-chat) completions API.


thanks!


I made a little proof of concept that used whisper.cpp and ChatGPT to take “command” requests and generate AppleScripts, which I could then run on OS X.

It actually works somewhat well. I think that with some more work and thought, something like this could actually be useful.

Just saw this was for Linux and Windows only.


Will you consider contributing to Voxos? I don't have a Mac on hand, otherwise I would take a look at supporting it myself. I know someone's already opened an issue on the repo about it.


I like the idea and support your project. As a suggested enhancement, you could let the assistant reply back in a voice of the user's choice, like in the ChatGPT app.


Thanks for your helpful feedback!


"Open Source" ... but is built around and tied to "Open"AI... :/

So weird that this isn't local LLM first.


To be clear, this is an open-source frontend for a not-open-source AI provider. It's calling OpenAI behind the scenes.


Fair enough. With that said, support for running local, self-hosted, and non-OpenAI cloud-hosted models is in the works.


"Open source" needs to be banned from the titles of these kinds of submissions. Or at least you should have to call it an "API client".

The post title as written seems intentionally deceptive.


I think not everyone understands why it's very important that the term mean more than "I put something on GitHub".

Especially when that something is promoting the opposite of open source. (That's "open sores", which is different.)


That sounds like a personal preference. The repo is public and welcomes contributors, which by definition is open source. There's nothing deceptive about it, but you're free to get as pedantic over it as you'd like.


Can it call anything self-hosted, or Ollama?


It should be possible using LiteLLM and a patch or a proxy.

https://github.com/BerriAI/litellm


That's on the roadmap


Can one enter their own OpenAI URL and API key (so we can use OpenAI-compatible things like OpenRouter or LM Studio)?


Doesn't look like it: https://gitlab.com/literally-useful/voxos/-/blob/dev/voxos/s...

edit: shouldn't be hard to enable though


Yes, you can define your own key in the .env, in the CLI call in run.sh, or in your environment.

https://gitlab.com/literally-useful/voxos/-/blob/dev/.env?re...


That doesn't let me send requests to my local LiteLLM instance, though. You have to be able to configure the endpoint that requests are sent to as well.
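A minimal sketch of what a configurable endpoint could look like. The `OPENAI_API_BASE` variable name here is an assumption borrowed from common OpenAI-client conventions, not something Voxos currently reads:

```python
import os

def chat_completions_url():
    # Fall back to OpenAI's hosted API when no override is set.
    base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
    return base.rstrip("/") + "/chat/completions"

# Point the client at a local OpenAI-compatible server (e.g. a LiteLLM
# proxy or LM Studio) instead of OpenAI:
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"
print(chat_completions_url())  # http://localhost:8000/v1/chat/completions
```

With the base URL factored out like this, the same request code works against any OpenAI-compatible backend; only the environment changes.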


Nice. LiteLLM was just the thing I've been looking for and hoping to integrate.


Hell yeah. Good luck!


Do you know if there's anything out there like LiteLLM that includes OpenAI's Whisper model? I took a look at the litellm package and it doesn't appear they support the audio module. :/


I'm not sure if it is _fully_ OpenAI-compatible, but whisper.cpp has a server bundled that says it is "OAI-like": https://github.com/ggerganov/whisper.cpp/tree/master/example...

I don't have any direct experience with it... I've only played around with whisper locally, using scripts.
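For the curious, here's a stdlib-only sketch of what talking to such a local server could look like. The `/inference` path and the `file` form field are assumptions based on whisper.cpp's server example; check its README before relying on them:

```python
import io
import urllib.request
import uuid

def build_transcription_request(url, wav_bytes):
    """Build (but don't send) a multipart/form-data POST carrying a WAV file."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n')
    body.write(b"Content-Type: audio/wav\r\n\r\n")
    body.write(wav_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# Hypothetical local server address; the payload here is a stand-in,
# not real audio data.
req = build_transcription_request("http://localhost:8080/inference", b"RIFF...")
# urllib.request.urlopen(req) would return the transcription response.
```

The point is that a local transcription server can slot in wherever the hosted Whisper API is called today, as long as the request shape matches.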



I think anything compatible with either the chat completions or completions API should work.


I don’t mean to be too dismissive, but this would really only be interesting if it ran local voice transcription and a local LLM.


Thanks for your feedback. Local and remote self-hosted transcription and LLM integration is on the roadmap.


Seems to be missing local LLM / offline support, and is tied to OpenAI.


Missing relative to what?


The term "Open Source" when applied to these applications is at best confusing and at worst misleading.

While the license of the project may be FLOSS, if all an application is doing is grabbing locally generated input, sending it off to a proprietary third party black box, and then processing the output, the application is primarily glue and the hard work is being done by the proprietary API.

Thus the application adheres to the license of Free/Open Source Software, but not the spirit or motivation, which in this case is primarily independence from third party access to our private data.

By way of analogy, it would be like advertising an electric car that requires a diesel fuel generator inside to charge the battery. Yes, the car may be electric, but is entirely dependent on a machine that only works via fossil fuels.

I've seen a good half dozen "Open Source" applications which work this way and each time it's a bit of a let down.


Where were the claims that this included "hard work", or any requirement that "hard work" be involved?

"if all an application is doing is grabbing locally generated input" demonstrates how little time you took to look over the repo, but feel free to be as offensive as you'd like.

The application was built to be extensible and will support local LLMs in the near future, when they're anywhere near as good as the black boxes that are readily available. It's a matter of practicality, and that's apparently lost on many.


There's a lot of interest in building fully local chat applications, and the people interested in those (myself included) are disproportionately likely to click on a submission advertising itself as an open source assistant.

You're not wrong in your definition, but neither are we—there's a reason why F-Droid provides warning banners on apps that rely on proprietary APIs. A large portion of the free software movement would rather use an inferior free solution than one that is better but proprietary.

That said, don't take the criticisms too much to heart—I think it's great that people are building things at all points on the free software spectrum.


By hard work, I mean the core computational functionality is being done not only by a black box, but by a black box that is openly hostile to private user-generated information, something a home assistant encourages by knowing the user's preferences for music, schedule, and so on. This is a common term of art.

As for the issue of whether it's "lost on many", I can't speak for others, but I use proprietary LLMs for certain jobs, and I also understand the risk I place on myself and others when I do so. On the other side, the first section of Voxos.ai's documentation is entitled "Voxos [Beta Release] - Voice Your Desires". Voicing my desires would mean giving them to a third party. It's a bad idea.

And instead of simply acknowledging the issues and saying "Yes, we have plans for local LLMs" or "We're looking at a mix of rule based and LLM based systems in the future" or even saying nothing, your response is a mix of ridicule and derision. It's very telling.


[deleted]


Sorry for the seemingly irrelevant comment, but your URL at the top right of the page, under "Project Information", has only two w's ("ww") instead of three ("www").


Hah... thanks for pointing that out!


A demo would be useful.


Agreed. Interested in this and would like to see a demo.

Terms like “Recording” make it seem confusing.

Having the responses open in Notepad is confusing in normal workflows.


Thanks. I'll take that into consideration.


This should be a Show HN:


Whoops, thanks for the nudge.


[flagged]


What's super weird?



