Ask HN: Private Alternatives to Alexa?
224 points by spir on Dec 15, 2021 | 156 comments
95% of the value we get from Alexa is automatic turning on/off of lights and simple functions like cooking timers.

We've had Amazon Alexa for five years, since the gen 1 device, and now find it to be an increasing invasion of the sanctity of the home.

We find it particularly annoying that nowadays, when you ask Alexa to do something, several times per week it will suggest some annoying upsell crap you don't care about. It used to suggest things once per quarter tops, that was fine.

HN, what private alternatives to Alexa may exist?

For example, does anyone make a system that's relatively polished and operates entirely in the home, with no audio sent to the cloud? I'd be happy to run a hub/box for the system.



I’m also very interested in an answer. We use Alexa a lot, but this week I had to unplug every Alexa in the house because a distant family member had gained access to the family Amazon account and was trying to use the “drop in” feature to listen to our conversations. In the course of our investigation I found out that Alexa does not log privacy related events at all (there is no record of a drop in stored anywhere), and the UI for locking down which profiles and contacts have access to which feature is unbelievably bad. At this point I can’t prove that this person doesn’t still have access somehow (every single contact taken from your phone has unique permissions to drop in) and I can’t delete the profile this person created without contacting customer support. So the devices are going to stay unplugged until I have time to nuke the Amazon account and create a new one.

To be clear, I’m not concerned about cloud processing or even mining my data. Just something I can have a reasonable amount of control over and that doesn’t constantly enable features I don’t want on its own.


That deserves its own post. This reminds me of when Samsung sent the press release that people shouldn’t discuss personal things near their television, because an unpatchable product of theirs allowed anyone who knew how to tap into the TV mics…


It was truly a bizarre experience. First we noticed a random intermittent ringing noise in the house and couldn’t figure out what it was. We tracked it down to the Alexa but since there are no logs or notifications we didn’t know what had just happened. It seems the person hung up when they heard us looking around so we never saw any visible indication that anything was happening.

Once we figured it out, we then had to spend half a day going through the Alexa documentation (useless) and random threads on Reddit to figure out how the access control for this feature and Alexa in general works. Like I said, certain changes to your account cannot be made without contacting support and being escalated to a certain team. By the end of it I still could not be certain we had locked this person out because of the number of changes they had made to the account and how much of a mess the app is.

The thing I really don’t understand is that this feature has been deployed for over a year and has received negative attention in the press. It’s just unbelievable that they would ship a feature that can turn your Alexa into a listening device and not think through what could go wrong, and never revisit it in a year.


Honestly, I find it amazing that so many people let these things into their homes. The price you pay for such little actual utility isn't worth it. I won't use voice recognition on a single device. Not only do I not trust Google/Amazon, I don't trust that I wouldn't accidentally say something that triggers it to phone a person I'm talking about, or order something expensive on the internet. Home Assistant on my phone and lots of physical Zigbee controllers work just fine for me. I don't want to talk to my devices.

I have an in-law who uses voice recognition to dictate messages into WhatsApp extended family group and some of the shit she's posted is hilariously (unintentionally) spicy. Her kids get into stupid "conversations" with Alexa. They're 8. Just no.


> I don't trust that I wouldn't accidentally say something that triggers it to phone a person I'm talking about

Someone set their name on Xbox live to something like "XBox turn off" then got people to read his name, which would turn off their console. It's a harmless example, but it shows that people and devices are susceptible to this kind of attack.


Just like in the '90s we (silly script kiddies) put "Press Alt-F4 to continue" on our websites.


Does anyone's employer have rules about having company meetings (at home) with such devices nearby? Seems potentially risky.


Yes. Had them since WFH first started. We have different rules depending on the level of the meeting though.

40-50% of the meetings are low risk and they don't require removal or power off of these devices though.

For medium risk or higher we run random tests during meetings to see if they have Google and Alexa around. Nothing fancy or anything. We just have the devices play a song like "Never Gonna Give You Up".


Yes - in fact some firms are recommending they be disabled altogether near work areas, including at home.


We configured endpoint detection tools (ie Very Popular Endpoint Product) to scan local networks for these devices based on open ports and mac addresses. It is fairly straightforward to fingerprint them. Not foolproof (wifi isolation and other network segmentation), but low hanging fruit.
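A minimal sketch of the MAC-address half of that idea, for anyone curious what such fingerprinting could look like. It is not the tooling described above; the OUI prefixes and labels are hypothetical placeholders, and a real watch list would have to come from a vendor database.

```python
import string

# Hypothetical OUI prefixes for illustration only; these are NOT real
# vendor assignments. A real list would come from a vendor database.
ASSISTANT_OUIS = {
    "AA:BB:CC": "ExampleVoiceAssistant",
    "DE:AD:BE": "ExampleSmartSpeaker",
}


def normalize_mac(mac: str) -> str:
    """Return the MAC in upper-case, colon-separated form."""
    hex_digits = "".join(ch for ch in mac if ch in string.hexdigits).upper()
    if len(hex_digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


def flag_assistants(seen_macs):
    """Map each MAC whose first three octets are on the watch list to its label."""
    flagged = {}
    for mac in seen_macs:
        norm = normalize_mac(mac)
        label = ASSISTANT_OUIS.get(norm[:8])  # "AA:BB:CC" is 8 characters
        if label:
            flagged[norm] = label
    return flagged
```

As the comment says, this is low-hanging fruit: a device behind Wi-Fi isolation never shows up in the list of seen addresses in the first place.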


Most people believe that they have nothing to hide. To them, there is no cost in lost privacy because they do not value privacy.

The sad part is, for most people who say this, it's true. Their lives indeed contain nothing interesting enough to be worth concealing.


When I'm feeling spicy, my response to that line of thinking is to ask that person for their house keys so I can come by sometime and look around. Or I'll ask them to unlock their phone so I can rummage around in their messages and photos.


I know a fair number of people who would not object.

The majority of people on Earth have pretty boring lives.


Then I hold out my hand expectantly awaiting their house keys and phone.


Saddest thought of the day.


> The price you pay for such little actual utility isn't worth it.

It is for us. The use case that sold us is my partner cooks a lot and often needs to add reminders or update a shopping list, which is difficult with both hands dripping with raw chicken juice or dough or whatever. They were for sale a while ago for $30 a device and they are decent speakers to boot, better even than a $1k phone.


My wife and I cook every day, and at the moment we're both using Nutracheck (so we're scanning barcodes and entering weights a lot as we cook). While I occasionally have to wipe my hands on a paper towel, I don't find myself incapacitated by sauce much of the time.


I’m not worried about my Google Home devices

They’d have to work reliably first


Depending on your jurisdiction you might be able to successfully sue people who listen in on your private conversations without you being aware.


I have always assumed my Samsung TV didn't have a mic. How can I check if it does? I don't really want any mic-using devices that I can't control, although I suppose that ship already sailed when smartphones were introduced.


Why does a TV even have mics. Crazy world.


Because all of the high end TVs contain alexa/google home integrations.


Or even just their own search-by-voice functionality, which is useful to many people, especially those with disabilities.


That's a fair point. Samsung and LG both have their own smart integrations too in the form of iQ and Bixby.


> This reminds me of when Samsung sent the press release that people shouldn’t discuss personal things near their television, because an unpatchable product of theirs allowed anyone who knew how to tap into the TV mics…

Do you have a link for that one?



Well you installed an always on microphone in your house, attached to a processing unit connected to internet all the time.

People said again and again it's going to be abused, and you had the technical knowledge to evaluate the risk.

Yet you went with it.

If it was not your scenario, it would have been another one. There will be other ones. Hell, with things like PRISM, that's inviting three-letter agencies into your bedrooms.

Since Pandora's box is open and people are going to do it anyway, we should at least push for hardware switches on every sensor of every device.


The Alexas were for my elderly mother in law. If she fell or was having some sort of problem, she could call us with just her voice.

Also she and my partner both like the convenience, so who am I to say that they shouldn’t have such a device? The device does have hardware switches for the camera (it was off) and the microphone but you can’t switch off the microphone without effectively disabling the device.

When she bought these devices they didn’t have this feature. None of us knew that the feature was deployed and enabled by default. To your point, it is a huge issue that we have no control over these devices anymore since the software can be changed at any time and can lie to you (lies of omission usually).

The Alexa goes above and beyond imo because when this person gained access to my mother in law’s other accounts, it took about 30 minutes to permanently lock him out, despite him bypassing text based tfa by having compromised her phone. Alexa wouldn’t even tell me this problem existed, and I can’t document what happened unlike with literally every other service.


>Well you installed an always on microphone in your house, attached to a processing unit connected to internet all the time.

How is it different from a smartphone or notebook?


Intent.

I don't have my phone expecting it to listen to all the room, all the time.

It could, but if it is, it's not a deliberate choice.

The microphone is not made for this, I regularly put it in airplane mode, and I don't have an account linked to the phone or any smart assistant active.

Listening to my phone all the time would require targeting, since it's not, unlike Alexa, doing so by design: https://eu.usatoday.com/story/tech/conferences/2020/02/25/go...


You can put them in the freezer like Edward Snowden.


What a bizarre situation.

For what it's worth, Google Homes don't have this drop-in feature. You can make calls and make announcements, but both require the person on the other end to actively answer or respond.


Contact customer support about deleting the profile, these things usually are taken very seriously, especially if it still is happening. If you don't hear anything back, call CS back and complain, shit will hit the fan then. I used to work in Alexa, and if we ignored a CS complaint like this, it would get escalated quickly and highly. Some of your assumptions about the back end are wrong too. As much as you hear bad news on how Amazon treats their employees, they really do care a lot about their customers.


I had no idea this was even a "feature" of Alexa. Super glad I never got one.


It's unfortunate that another comment that links to Rhasspy has been downvoted (I assume because it lacked any other context) so I wanted to mention the project with some additional context: https://rhasspy.readthedocs.io/

While I've not used the entire Rhasspy project myself (but trying it out is on the long list of things to do :) ) I have used the offline Text-To-Speech sub-project Larynx...

...and it is amazing!

Larynx is significantly ahead in terms of quality of output & variety of voices (fifty--across multiple languages, accents & genders) of any other FLOSS Text-To-Speech project I've tried.

I think the relative new-ness of the project is part of the reason Larynx (https://github.com/rhasspy/larynx/) currently flies under the radar.

If the rest of Rhasspy is as good as Larynx I'd imagine it's worth trying out.

Larynx demo video: https://www.youtube.com/watch?v=hBmhDf8cl0k

Samples of pre-trained voices: https://rhasspy.github.io/larynx/#en-us


I can vouch for Rhasspy, it's an amazing and flexible piece of software, though it does require some setup and tech knowledge (albeit with a usable web GUI); and it's very DIY on defining the actual voice commands. I recommend pairing it with Node-RED [0] for routing commands to devices, it has plugins for most things.

The only thing I struggled with was getting the wake-word config right: I could never find the right balance point where it responded every time, without also having annoying false positives, so I ended up turning it off. It does support multiple wake-word engines; I'm gonna have another go with Picovoice Porcupine now that they've opened up custom wake-word training for free.

I'm most heavily experienced with Rhasspy's sister project, voice2json [1], which I used to build a voice-controlled car jukebox [2], and it's been working fantastically. (It triggers from a Bluetooth remote, so no wake-word issues.) The two projects share the same core engine.

For hardware, Raspberry Pi 3/4 perform quite well, and I strongly recommend ReSpeaker [3] for audio (either USB or 4-mic HAT).

[0] https://nodered.org/

[1] http://voice2json.org/

[2] https://github.com/lukifer/voicetunes

[3] https://www.seeedstudio.com/category/Speech-Recognition-c-44...
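For readers wondering what "routing commands to devices" amounts to, here is a minimal sketch of the idea in plain Python rather than Node-RED. The event shape, the intent names and the handler functions are all assumptions made up for illustration, not a documented schema of either project.

```python
import json

# Hypothetical intent event; the field names are an assumption for this
# sketch, not a documented schema.
EXAMPLE_EVENT = (
    '{"intent": {"name": "ChangeLightState"},'
    ' "slots": {"name": "kitchen", "state": "on"}}'
)


def set_light(room, state):
    """Stand-in for a real device call; returns a description of the action."""
    return f"light/{room} -> {state}"


def start_timer(minutes):
    """Stand-in for a real timer service."""
    return f"timer -> {minutes} min"


# One explicit handler per intent: the assistant only does what is listed here.
HANDLERS = {
    "ChangeLightState": lambda slots: set_light(slots["name"], slots["state"]),
    "StartTimer": lambda slots: start_timer(int(slots["minutes"])),
}


def route(event_json):
    """Dispatch a recognized intent to its handler, or return None if unknown."""
    event = json.loads(event_json)
    name = event.get("intent", {}).get("name")
    handler = HANDLERS.get(name)
    if handler is None:
        return None
    return handler(event.get("slots", {}))
```

The explicit table is the "very DIY" part: nothing is guessed, so an unrecognized intent simply does nothing.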


https://ai-service-demos.go-aws.com/polly

check the neural then british, Amy. the "smoothness" is uncanny. the samples are "almost" there with larynx you linked. good but take kathleen (glow_tts), there is "still" some robotic in there. is this something that can be improved by tweaking the training ? this sounds really cool to be used at home


Oh, yeah, I'm definitely aware there's still a quality gap when compared to proprietary online TTS options--but I'm specifically interested in FLOSS+offline for my purposes.

And Larynx is game-changingly ahead of the other FLOSS+offline options.

(Probably the highest quality voice is the one that's used for the demo video narration--which when I first heard it I had to skip to the end of the video to confirm it wasn't a live human. :) )

My (mostly uninformed) impression is that there's room for training tweaking/improvement given how young the project is. And there's also multiple stages to the generation process so presumably there's opportunities at each stage.


yeah, https://www.youtube.com/watch?v=hBmhDf8cl0k at the end says southern female english https://rhasspy.github.io/larynx/#en-us_southern_english_fem... but the sample is NOT like the video, maybe the samples are old.

the video is a great example


This is still buried too but someone just released a promising HA integration of Rhasspy

https://news.ycombinator.com/item?id=29565983

https://homeintent.io/


Software has been mostly solved for a while. My issue is hardware. The way I read it, to get proper voice recognition in many circumstances, you need a microphone array (for every DIY Alexa and Dot). Now you have a Pi, the array installed on it and… it just stands around looking ugly and accumulating dust?

You want a case, but then from my research, cases can easily interfere with those arrays. So you need one that’s custom-made for the array you are getting. But no array I’ve seen does actually come with such a case.

Back when I asked (relevant subreddits and on tildes before I deleted my account), no one could tell me that any of my research had been wrong, but no one had a solution either. I posted the threads almost 2 years ago, so maybe things changed? I’m currently still using Alexa, but besides privacy reasons I’d also love custom software that can take the idiocy out of my assistant (mainly by using pre-configured commands that do what I want instead of sometimes guessing what I want; also for on-the-fly language switching, Alexa is atrocious when you want to request a band that’s not in your primary language)

I could probably get away with the kitchen and office assistant using a normal microphone, but both bedroom and living room need to recognize voices from most directions (and in the case of the living room, also have decent recognition through music playing).

If anyone has any solutions, I’d love to hear them.


When I researched this, ReSpeaker was one of the better solutions; they provide various mic-array Pi HATs, as well as professional-looking cases for the device. They also sell pre-built devices I think, if you're not into DIY. https://www.seeedstudio.com/category/Speech-Recognition-c-44...

(Just to make it clear, I'm not affiliated with them in any way)


Yeah, ReSpeaker looks like they have the best hardware for this. But the issue was that they only sell one case, for their old array, which is barely available any more (neither case nor array).

But! Now they sell the new array as USB array in a proper case! That’s not perfect as I’ll have 2 devices instead of one to place/hide, but it’s still a lot better than last time I checked their site.

Thank you, now I have plans for 2022 :)

Hah, and of course there is only one left in stock :/

edit: And I found a company in Germany that not only resells them for almost the same price, they even have a lot of stock.


Would you mind sharing said company? If you'd rather not do so publicly, you can reach me at <myusername>@outlook.com


https://www.distrelec.de/de/usb-mikrofonarray-respeaker-seee...

Note that I did not look into them, just found them when searching and bookmarked the site for next year ;)


How about putting a string of microphones around the room, say at the top walls? You can get I2S microphones (here's a breakout for one [1]), so you'd just need a bundle of 6 wires running around the room.

That breakout is $6.95 with price breaks to $6.26 at 10 and $5.56 at 100 which is probably a bit pricey for this application, but really you only need the breakout to make it easy to play around with it on a breadboard. The other parts on the breakout besides the microphone module itself are just a couple of resistors. The module itself looks like it would be reasonable to solder wires directly onto, so you should be able to build a microphone string with just the microphone modules, the wire, and a couple of axial resistors at each microphone.

Here's that particular microphone module at Mouser [2]. $2.64 for one, price breaks to $2.19 at 10, $2.02 at 25, $1.90 at 50, and $1.78 at 100.
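As a quick worked example of those price breaks, here is a small sketch that prices a string of bare modules. It only uses the Mouser figures quoted above and ignores the resistors, the wire and shipping.

```python
# Price breaks quoted above for the bare microphone module:
# (minimum quantity, unit price in USD).
PRICE_BREAKS = [(1, 2.64), (10, 2.19), (25, 2.02), (50, 1.90), (100, 1.78)]


def unit_price(quantity):
    """Return the unit price that applies when ordering `quantity` modules."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    price = PRICE_BREAKS[0][1]
    for min_qty, break_price in PRICE_BREAKS:
        if quantity >= min_qty:
            price = break_price  # the highest break reached wins
    return price


def string_cost(quantity):
    """Total module cost for a string, rounded to cents."""
    return round(quantity * unit_price(quantity), 2)
```

For instance, a 12-microphone string lands in the 10+ bracket, so the modules come to 12 × $2.19 = $26.28.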

There are other I2S and I2C and SPI microphones. The SPH0645 just happened to be the first one I found to use as an example.

I'm not sure though if any of these interfaces have enough bandwidth to support enough microphones simultaneously for this to work. You might need multiple parallel strings, or maybe a hierarchical system, like having some ESP32 modules on the walls too, with each ESP32 handling a smaller string of microphones, and the ESP32s reporting what they hear along with timing data wirelessly to whatever is doing the processing.

[1] https://www.adafruit.com/product/3421

[2] https://www.mouser.com/ProductDetail/Knowles/SPH0645LM4H-B?q...
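On the bandwidth question, a rough back-of-the-envelope sketch may help. Every number below is an assumption for illustration (16 kHz sampling, 32-bit slots, 16-bit samples forwarded over the air), not a figure from a datasheet; the one structural point is that plain I2S carries a left and a right channel per data line.

```python
import math

SAMPLE_RATE_HZ = 16_000  # assumption: a common rate for speech recognition
SLOT_BITS = 32           # assumption: one 32-bit slot per channel per frame
MICS_PER_BUS = 2         # plain I2S carries a left and a right channel
FORWARD_BITS = 16        # assumption: samples truncated before forwarding


def bus_clock_hz():
    """Bit clock needed on one I2S bus under the assumptions above."""
    return SAMPLE_RATE_HZ * SLOT_BITS * MICS_PER_BUS


def buses_needed(num_mics):
    """Number of separate I2S data lines for `num_mics` microphones."""
    return math.ceil(num_mics / MICS_PER_BUS)


def forwarded_kbit_per_s(num_mics):
    """Raw audio payload if every microphone is forwarded uncompressed."""
    return num_mics * SAMPLE_RATE_HZ * FORWARD_BITS / 1000
```

Under these assumptions, eight microphones would need four data lines and about 2 Mbit/s of raw payload, which is one argument for the hierarchical layout suggested above.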


I love the idea. Somewhat impractical for renting, and it both requires more skills and even more effort. But it sounds like a cool project I’d love reading about in a blog :D


you mean like those string led lights? how about adding the microphones to the same strip for power and then an attached esp32 for data to a central one? that sounds nice


Can you use an Alexa and gut it while replacing the controller? I don’t know how it mixes the mics but you can have your case and array, hopefully the hardware isn’t dependent on the processor.

What about a controller you carry around with you that you can speak into, like phone software, a watch, or if you’re into the hacker chic look, a mounted mic you wear?

I don’t think speaking is even the best choice, it’s limiting, a phone can be a web interface, gestures over a sensor seem reasonable for basic controls and speech to text can be piped into it as commands through the phone you carry with you anyway.

If you’re controlling lights it’s less hassle through a GUI, and limiting it to voice will not make your life easier.


The Rhasspy video overview is pretty great too, I'm impressed.

https://www.youtube.com/watch?v=IsAlz76PXJQ


I've heard of Mycroft before but have no personal experience with it: https://mycroft.ai

It seems like they have an open source client that you can run on your own hardware but it's still dependent on the backend services that Mycroft-the-company provides. Perhaps their privacy stance is more palatable to you, though?

I think there used to be a project that was fully on-device but it got bought and consumed by Sonos


I'm in the same situation as OP and spent a weekend chasing this particular rabbit hole a few months ago. Mycroft was the best option I found, but:

* changing the wake word was a pain

* having a Home account is essentially mandatory (kind of defeating the point)

* speech synthesis was really bad

* It needed a lot of rhetorical help to get useful responses.

Just getting to that point involved several hours of fairly hardcore debugging and even then basic issues like reliable mic input still existed.

I also couldn't find reasonably priced (<$100) speakers/mics in an Alexa form factor, but it wouldn't be fair to blame that on the mycroft team.


I played around with their open source code a few years ago, and found it, well, operable, if not particularly useful.

On the flip side, I do not, I repeat, DO NOT recommend giving their team money for pretty much any reason. They've been struggling to put together any working hardware (and promising the opposite) for years through a genuine comedy of errors. I think the most recent design iteration of MycroftV2 is something like a Raspberry Pi on a daughterboard in a 3D-printed case. I still check in on their blog and subreddit every few months to see what's changed, and every time I'm rewarded with additional details on what seems to be one of the most incompetent engineering organizations I've ever heard of.


It is a small team, maybe a few people? On the other hand, Amazon has 10k people on Alexa[1]. One wonders how much the info they are sucking out of households is worth....

[1]https://www.wsj.com/articles/amazon-says-it-has-over-10-000-...


That could be it, I suppose. Sorry, I'm not trying to make this a personal attack on the team (I don't know any of them), and I'm not personally invested in their success or failure (I haven't purchased any of their stuff or invested in them). I just think that it would be cool to have an easy-to-use, hackable home assistant that does most of its computation locally, and that's what they're (supposed to be) building.

In a lot of ways, it doesn't really matter why they're not producing results, so much as the fact that they're not producing them. I mean, check out the list of remaining action items in their last "Production Update"[1]. This came over 8 months since shipping what were supposed to be the Mark II devkits, and almost four years(!) after they announced work on the Mark II in early 2018 [2], which has since gone through about a zillion different design changes. I won't claim to be an expert, but having had even a bit of experience with product development of this kind, this is not the happy path to getting consumer electronics shipped.

[1] https://mycroft.ai/blog/mycroft-manufacturing-and-product-up...

[2] https://mycroft.ai/blog/specs-new-voice-assistant-device-see...


Technically you should be able to run your own back-end, but I ended up giving up on this endeavor when I tried a few months ago. It should be doable if you give it a few good days, but it wasn't very well documented at least when I tried, and there's a lot of different parts you have to get working.

After you get it to work, I'm not sure if your server would have a worse data set for speech recognition etc, maybe, maybe not. I'm guessing it should be all right because they are using a Mozilla free data set (which you contribute to by default I think, if you use their server).


I use MyCroft/PiCroft (Pi4) + PS2 mic array ($15), it is cheap and relatively easy to hack in Python/Git. I believe it uses Mozilla's voice service. Has no ads.

Commands which work nicely which we use all the time: "Hi Mycroft, set timer to X minutes." "play news", "set alarm to 6 am", "what's the temperature?", "will it rain?", "what's the time in Paris?".

I wrote METAR and TAF module for it to get more detailed weather (and learn some Python).


I have a Mycroft Mark 1. It is based on the Raspberry Pi, and as they are wont to do the SD card gave out (or got corrupted in a power outage?) and I haven't tried to fix it after a couple years. It works, but it is tricky to get it to wake up, you have to be close to it and speak up, no using it across the room. It is a neat trick, but I never found a use for it that was worth making it work.

I backed the mark2 on kickstarter which they have been promising is better and will ship anytime now - for a few years...


I tried it on a RPI, but at that time it was limited by not being able to use microphone from bluetooth.


I've been wanting pretty bad to help my Google-Home-dependent family back home with this. They mostly need voice-controlled music and lights.

First off, unless you want to really DIY the glue-code, you want to use HomeAssistant (huge community) or NodeRED.

The only part I'm uncertain about and haven't explored properly myself is the voice-to-text part. Any solution should be pluggable into HA or NR.

Relevant threads in HA voice-assistant sub-forum:

* https://community.home-assistant.io/t/replacing-alexa-or-goo... (Ada, Rhasspy and other FOSS alternatives)

* https://community.home-assistant.io/t/local-voice-control/29... (you can apparently use Alexa for voice control even if it does not have any internet access)

* https://community.home-assistant.io/t/best-option-for-local-... (local TTS)


Maybe this is an obvious answer, and therefore not one you're interested in, but there's Siri.

Although audio is sent to the cloud, you can choose whether it is stored or not, and whatever you pick, Apple's privacy policy is very strict.

HomePods are mostly advertised as music playback devices, but I mostly use mine as a HomeKit control device.

Homebridge allows you to control non-HomeKit devices via Siri: https://homebridge.io


This is a terrible suggestion. "Taking Apple's word on it" is not a valid privacy strategy. Companies have shown time and again that they lie about what is happening behind the scenes, or even just change their policies for the worse. Then there are rogue employees, such as the Ubiquiti fiasco, and secret government data collection, such as PRISM, of which Apple was a participant.

Apple has broken trust multiple times, including the FBI back door, and the CSAM fiasco. They are a public company beholden to shareholders, not your trusted friend from college. Even China has strong leverage over them due to their manufacturing and market there. They are fundamentally no different from Amazon.

The poster is asking for a private solution; this is emphatically not it.


> "Taking Apple's word on it" is not a valid privacy strategy.

I respectfully disagree.

If Apple says "we don't do X", and then is shown to do X, that would be a massive PR disaster.

Whereas Amazon makes no privacy promises at all. In fact, they promise the opposite.



This was changed as far as I know. Do you have evidence otherwise?


Claiming that whatever has not been proven false must be true is the exact opposite of how it works.

We have evidence and even admittance of abuse / improper handling in the past, but I'm unable to find any evidence that things have changed.

https://en.wikipedia.org/wiki/Argument_from_ignorance


It seems like it was changed.

https://www.cnbc.com/amp/2019/08/28/apple-apologizes-for-lis...

I was just asking a question, I wasn’t attempting some sort of gotcha. I really don’t care that much.


Do you have evidence that they changed it?


https://www.cnbc.com/amp/2019/08/28/apple-apologizes-for-lis...

This. Is there any reason I should think this is not accurate?


Granted that Amazon doesn't pretend too much to not be spying on you. Carefully read Apple's policies and statements to see where they give themselves plenty of wiggle room. Unfortunately most consumers' knowledge is no match for their PR and legal teams'. Apple markets their products on privacy, but countless studies and analyses find this to be a lot of theater.


What’s the strongest evidence you have that proves this?


Unless you design your own hardware and build your own fab to manufacture it, you're always taking the word of at least one company.


This is like saying that you might as well be an alcoholic since you smoke cigarettes.

That's why you use VLANs on your home network and open source operating systems on your phone and computers. It reduces attack surface area. It's on your property and you can analyze where data is being sent.

With cloud services, you have no idea what's going on with the data once it is on someone else's servers.


Yeah the same logic applies to any network connected device you buy that has microphones on them.


"The CSAM fiasco" was just geeks on the internet being geeks on the internet. And some really bad marketing on Apple's part.

The feature itself was perfectly fine, very few people even bothered to read up on it and focused on "OMG APPLE IS SCANNING MY PHOTOS!" without any nuance.


No it was because there were fears that oppressive regimes would pressure apple to scan and flag anti-government images.


Yeah, they absolutely were not scanning photos and anyone with basic knowledge of how hashing works should know that. I was really disappointed with some of the reactions I saw coming from this community, which totally lacked any nuance or acknowledgment of how serious the issue of child abuse is.


How is running a detection algorithm on one’s photos not “scanning” them?


It’s a bit of a stretch to say generating a hash from a file is scanning. Most computer science terms like this are metaphors, but when I create a scan with a ‘scanner’ I can create an exact copy of what was scanned.

Since creating a hash from a file is a one way process that you cannot reproduce the original file from the hash output, I would not call this a scan. All you can do is match to known child porn files.

But yes, go ahead, make more pedantic excuses on why we should protect people that hold child porn.


It’s not the creation of the hash of a single file that’s the “scanning” here, the scanning is the act of doing that to all of your files and looking for a match. Scanning is just looking over something for something specific. Scanning a hard drive isn’t copying it, nor is scanning the horizon. In the case of a scanner, I suppose it’s probably called that because it moves over a page with a bright light and camera to find the dark bits.

There are types of hashes that aren’t for finding exact matches. My thesis in school way back when was building something for image near-duplicate detection using a sort of hash made out of the image features detected. The output of a neural net run on an image is also a sort of hash in a (maybe) semantically meaningful vector space. And then you can look for similar or nearby vectors. This isn’t just useful for finding one type of image.
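To make the distinction concrete, here is a minimal sketch contrasting an exact (cryptographic) hash with a very simple near-duplicate hash. It uses the textbook "average hash" on tiny hand-written grayscale grids; it is not the thesis method mentioned above and not the algorithm any vendor ships, and the sample images are made up.

```python
import hashlib


def exact_hash(pixels):
    """SHA-256 over the raw pixel bytes: any change gives a different digest."""
    data = bytes(value for row in pixels for value in row)
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Made-up 2x2 grayscale images (values 0..255).
original = [[10, 200], [220, 30]]
brightened = [[15, 205], [225, 35]]   # same picture, slightly brighter
different = [[200, 10], [30, 220]]    # inverted layout
```

Here the exact hashes of `original` and `brightened` differ, while their average hashes are identical (Hamming distance 0) and `different` sits at distance 4. Matching on small distances rather than equality is what makes this family of hashes useful for finding similar images.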

So, in case you didn’t realize, and you’re actually arguing in good faith, no one is arguing against this because they want to defend sex offenders, so you can leave that out in the future, unless you want to seem disingenuous. It’s because it makes it easy to create a population-wide dragnet for other things. It’s not hard to apply something similar to find photos taken at a protest, for example. Or pictures of a specific person you happened to meet.

Anyway, have a good one.


> There are types of hashes that aren’t for finding exact matches. My thesis in school way back when was building something for image near-duplicate detection using a sort of hash made out of the image features detected. The output of a neural net run on an image is also a sort of hash in a (maybe) semantically meaningful vector space. And then you can look for similar or nearby vectors. This isn’t just useful for finding one type of image.

What's the point you're trying to get at here? I think it's well understood that there can be false positives, but can you tell me at what rate of false positives this type of child abuse detection becomes unjustified? Also, can you tell me at what rate Apple's system would cause a false positive? If for every one billion people (hypothetically of course) that are caught with child abuse images, one innocent person is asked to unlock their phone to prove to a court that it's a false positive, I think that's a fine trade off to make. I understand it's unpleasant for that one person, but I don't think you're correctly weighing the suffering to children this could help stop.

> It’s because it makes it easy to create a population-wide dragnet for other things. It’s not hard to apply something similar to find photos taken at a protest, for example.

This is, in my opinion, where the thinking of people on your side goes wrong: it deals with these hard and nuanced issues on the wrong conceptual level. If the government wants to create a population-wide dragnet, there are a lot of other ways to do that. If, for example, an iPhone were a literal perfect privacy machine, it would simply be forbidden from being sold in totalitarian countries that wanted that type of dragnet.

The thing that you're fundamentally missing, in my opinion, is that the thing that protects me is not the technology, it's that I live in a democratic country that has laws to stop such things as being prosecuted for just attending a protest. The police can listen to my phone calls and read my mail _right now_ but what keeps that kind of thing from being used against me in an unfair way are the courts. Fundamentally I don't think any technological solution can save any of us if we live under a totalitarian government.


Boy, there's a number of misconceptions here. I was considering not responding, but you're so wrong here that I just can't walk away and let you continue to spread this kind of blatantly wrong handwaving.

> If for every one billion people(hypothetically of course) that are caught with child abuse images, one innocent person is asked to unlock their phone to prove that it's a false positive to a court, I think that's a fine trade off to make.

How do you know the false positive rate yet? Not only has Apple not implemented it yet, but people have shown that

a. NeuralHash can correlate two completely unrelated, undoctored images

b. Hash collisions can be manually designed with minimal effort and changes to the source image

You're pulling this hypothetical figure out of thin air, it may as well be one in a hundred or one in a trillion.

> I don't think you're correctly weighing the suffering to children this could help stop.

That's not the problem here. If people want to continue to store photos of child exploitation on their iPhone, they can do it with the flip of a switch. Once you disable iCloud, Apple stops client-side scanning altogether. That argument is a whole lot of bark with no bite, and unless Apple did the unthinkable (e.g. store hashes for every photo you ever took regardless of its status on iCloud), it's never going to hold water.

> If the government wants to create a population-wide dragnet there are a lot of other ways to also do that.

Oh, like they already[0] do[1]? iMessage has been wiretapped since 2013, it's an open secret at this point that Apple is deep in bed with domestic intelligence agencies. Denying this is just refusing to acknowledge a truth that has reared its ugly head again and again and again.

> If for example an iPhone was a literal perfect privacy machine it would simply be forbidden from being sold in totalitarian countries that wanted that type of dragnet.

Countries like China, who demanded that Apple give them complete control over their domestic data[2], a request that Apple happily complied with? If they give foreign markets complete control over iPhone data in their respective countries, imagine for a moment how trustworthy their behavior is in the United States, the same country that decides if they're a monopoly or not and sets their tax rates.

> the thing that protects me is not the technology, it's that I live in a democratic country that has laws to stop such thing as being prosecuted for just attending a protest

If you think that's what they care about then you're missing the point entirely too. People aren't paranoid that they're going to be thrown in jail for legal action, that's frankly silly. What the United States wants is two things: your trust and your data. They could care less if you're cooking drugs on your property, selling weapons illegally or, in many cases, exploiting children. If they actually cared about those things, they'd divert some of the national budget to addressing those issues. Instead, they take that money and invest it in surveillance. Billions of dollars get poured into black budgets every year, with margins that constantly expand. The NSA regularly backdoors encryption standards, the FBI regularly designs exploits and incorporates them into US infrastructure. If any of our surveillance programs want to backdoor an iPhone or introduce vulnerabilities into iPhones, Apple has no recourse.

> The police can listen to my phone calls and read my mail _right now_ but what keeps that kind of thing from being used against me in an unfair way are the courts.

What keeps them from doing that is that it's a pointless, fruitless endeavor to try and criminalize their entire nation. It wouldn't be hard to arrest the majority of US Citizens, it would just be expensive and time-consuming. A much more appealing effort is to keep tabs on everything you do, everywhere you go and everyone you talk to. That way, if you become a legitimate threat to the state they can zero you before you cause any real damage. Make no mistake, every government is only out to save their own skin. You may disagree with it, or find it silly, but that's realpolitik for you.

> Fundamentally I don't think any technological solution can save any of us if we live under a totalitarian government.

Finally, your first reasonable argument (and it's a concession to the point the other commenter was making in the first place). There is no way that we're living in a world where our government lacks the competence or resources to effectively monitor us at every corner. The reason why this concerns people is that it allows governments to collect data on what kinds of photos their population has saved/synced with the cloud. Apple is given a black-box list of irreversible image signatures that the government themselves consider harmful, and there is literally zero transparency that goes into this process. We have quite literally no idea how any of this works, and have no way to verify that it isn't being abused. Your trust in Apple and the US Government has to be unreasonably high to confidently say that this won't be abused in some way.

[0] https://reason.com/2021/12/07/secret-documents-show-which-me...

[1] https://arstechnica.com/gadgets/2012/04/apple-holds-the-mast...

[2] https://www.reuters.com/article/us-china-apple-icloud-insigh...


>You're pulling this hypothetical figure out of thin air, it may as well be one in a hundred or one in a trillion.

I agree, but it’s just as hand-wavy when I say there’s maybe a low chance of false positives as it is when you’re freaking out about a hypothetically high number of false positives.

> That's not the problem here. If people want to continue to store photos of child exploitation on their iPhone, they can do it with the flip of a switch. Once you disable iCloud, Apple stops client-side scanning altogether.

If it’s so much under the users control why do you care or feel like this is an invasion of privacy then? I’ll say I feel like it’s still useful because criminals make dumb mistakes that incriminate themselves all the time — and this is a fine balance between being overly invasive and catching some people that store abuse images.

> Oh, like they already[0] do[1]? iMessage has been wiretapped since 2013, it's an open secret at this point that Apple is deep in bed with domestic intelligence agencies. Denying this is just refusing to acknowledge a truth that has reared its ugly head again and again and again.

Yes, everyone knows this stuff, but just because it’s happening does not mean it can be used to incriminate someone in court unless there is justified prior suspicion which a judge issues a warrant for. That’s where the transparency is, at the court level.

> Countries like China…

This is a silly example. It makes me sad to say, but people in China are going to have their personal freedoms violated in very harsh ways regardless of Apple operating there or not. Unfortunately that’s the reality of living under totalitarianism. I don’t know why you’d even bring this up when I am clearly stating that what really protects people is living in a free democratic country with a well functioning court system.

> If they actually cared about those things, they'd divert some of the national budget to addressing those issues. Instead, they take that money and invest it in surveillance. Billions of dollars get poured into black budgets every year, with margins that constantly expand. The NSA regularly backdoors encryption standards, the FBI regularly designs exploits and incorporates them into US infrastructure.

I don’t wholly disagree, but I don’t see the connection with the original point. Also, you’re talking about governments as a monolith, which is a mistake. Any government is made of many departments and people; when you say ‘if they actually cared’, who’s the ‘they’? I think it’s really silly to say no-one in any part of the FBI cares about stopping child abuse, for example. Sure, I get that the CIA does not care, but that’s just not their job either.

> If any of our surveillance programs want to backdoor an iPhone or introduce vulnerabilities into iPhones, Apple has no recourse.

Maybe, maybe not. What is your hypothetical scenario for this? Can you explain how you think they’d do this where Apple has no recourse? Also who is the ‘they’? When you just say ‘they’ in a handwaving general way you sound conspiratorial. Are you saying they’d do it and apple could not detect it? Or that they’d coerce apple and apple would have no legal way to stop them?

> A much more appealing effort is to keep tabs on everything you do, everywhere you go and everyone you talk to. That way, if you become a legitimate threat to the state they can zero you before you cause any real damage. Make no mistake, every government is only out to save their own skin. You may disagree with it, or find it silly, but that's realpolitik for you.

This is where we depart the most. I’m European. In my opinion, America, with its two-party non-proportional system, is a very poor example of democracy that is quite susceptible to corruption because of its concentration of powers. Just because you guys don’t live in a well functioning democracy does not mean that’s the case for the whole world; you’re projecting a bit here. I guess we agree in our mistrust of the US. Hopefully you guys don’t slide any further along that totalitarian scale.

> There is no way that we're living in a world where our government lacks the competence or resources to effectively monitor us at every corner.

I guess you don’t know about off grid log cabins that have no cell signal ;-)

> Apple is given a black-box list of irreversible image signatures that the government themselves consider harmful, and there is literally zero transparency that goes into this process. We have quite literally no idea how any of this works,

I think this is the point where you’re wrong. If there is a false positive, I believe that even in the US you’ll be able to simply exhibit the source image that caused the false positive in court and prove innocence, if it even gets that far, which I think it probably won’t. The hash match is not the proof that you have child abuse images; it just starts a process of review. I think in practice it’ll start a process where a manual review will happen, where someone will look at your source image and say, oh yeah, that’s clearly a false positive, and then it’s done. I’m sure we’ll know more when the first people get caught with this mechanism, because it will still have to be prosecuted in a public court, and at that time we won’t have to speculate so much.


They are not scanning the photos, but if there is a positive match on the hash of a photo, the photo will be sent to a human to check. With any nonzero error rate, it is certain that humans working at Apple will see private photos that don’t have child pornography in them.


What are we talking here, 1 in 1 billion? I just can’t care that much unless there’s real evidence that the error rate is really high.


I don't know the error rate of image recognition, but it is certainly more than one in a million. More like 1-10%. But I don't want to talk Apple's idea down. I am okay with them scanning THEIR servers for child porn, as long as they keep away from MY phone.


Three positive matches to known child pornography.

The chance of that happening in the real world is astronomically small.
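
Whether it's "astronomically small" depends entirely on the per-image false match rate, which nobody in this thread actually knows. Here's a rough back-of-the-envelope sketch, assuming each photo is an independent trial (almost certainly too generous, since similar photos are correlated) and a hypothetical library of 10,000 photos. Both rates below are made-up inputs, one for each side of the argument.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k false matches among n photos."""
    # Sum the probabilities of 0..k-1 matches and subtract from 1.
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - below

photos = 10_000      # hypothetical library size
threshold = 3        # the "three positive matches" from the comment above

for rate in (1e-6, 0.01):  # hypothetical per-image false match rates
    print(f"rate={rate:g}: P(>= {threshold} false matches) = "
          f"{prob_at_least(threshold, photos, rate):.3g}")
```

At one in a million per image, three false matches in one library is on the order of one in several million. At 1% per image it's a near certainty. So the threshold only helps if the underlying rate is already tiny.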


+1. I have Siri, Alexa, and Google home devices in my house, and Siri is the best in terms of competence and privacy. Alexa feels like being bugged by a creepo who keeps trying to sell me stuff. (We got rid of the Alexa.)


They all spy. Alexa is just the most obvious. You should get rid of them all.


I hate writing "citation needed", but, citation needed. That's a big claim.


Apple owns up to sending a certain amount of telemetry[0] back home, but we have no way of verifying what it is since the traffic is encrypted. It's impossible to know what data it's collecting and sending back, or even how anonymized that information is. On top of that, Siri invocations are recorded and saved for 6 months, and then retained for an additional 2 years after being dis-associated with your account. Once again, there is no way for us to verify Apple's claims here; we could definitely have another iMessage scenario on our hands where they just keep all that data anyways, and when they're called out on it they simply blame the USA and try to look as powerless as possible. Like the other commenter suggested, it hinges solely on how much you trust a single company. This single company in question is located in the domestic United States, is known to comply with intelligence agencies, and has every reason in the world to kiss up to American authority. I'll let you do the math here.

[0] https://www.cnet.com/home/smart-home/homepod-echo-google-hom...


UFOs, cold fusion, and reptilian overlords are big claims. A company driven by profit with a history of questionable practices doing shady things with data is in no way a big claim. It's happened countless times.


I find it pretty funny how the business model is pretty much exactly that of BonziBuddy.


??

I've had Alexa through Sonos for years. It has never tried to sell me anything. What’s the context of these upsells?


> you can choose whether it is stored or not

You choose to tell them to store it or not. Due to third party doctrine you have very little legal protection for material you voluntarily provide to a third party. Even if apple behaves honestly and has no vulnerabilities or compromises they may be forced to hand the data over.

And the fact that you don't know this says to me that Apple is in practice unethically exaggerating the level of privacy they're able to provide.


The third party doctrine has changed though, since US v. Warshak and Carpenter v. US, right?

https://en.wikipedia.org/wiki/United_States_v._Warshak

https://en.wikipedia.org/wiki/Carpenter_v._United_States


I've never used Alexa and sort of assumed it was the same thing as Siri, but I'd permanently abandon the iOS ecosystem the first time Siri tried an upsell.


Would it work to turn off the microphones on your HomePods, turn off listening for "Hey Siri" on your phone, iPad, and Macs, and set your Apple watch to the "raise to speak" setting, and then do all your voice control through the watch?

If that works, that might be a solution for people who want the convenience of voice control but do not want an always-listening microphone in their house.


If you use Home Assistant and want something that is easy to setup, there is Home Intent: https://homeintent.io

It uses Rhasspy under the covers, and automatically imports lights, switches, fans, etc and sets up sentences and intents for you. After initial setup, it can be used without an active internet connection.

All you need is some container knowledge, an extra Pi, and a good speakerphone (like a Jabra Speak 410) to get going.

Disclosure: I am the main author of Home Intent.


Wow, this is exactly what I'm looking for, thanks so much for developing it. I'm going to give it a shot over the holidays, have a pretty big home assistant install I'd love to control via voice.

I'd primarily like to be able to turn on/off a few specific switches, set a few specific scenes and ideally play a few playlists (squeezebox integrated with home assistant). Would creating scripts in HA and then calling them from Hi be the way to go for customising without building custom components?

If this works well I'll be keen to add a few cheap pis plus ps3 eye cameras around the house.


I have considered integrating Home Assistant scripts at some point, so people could trigger whatever they wanted in HA. I'll get it added to the issue tracker and try to work on it for a future release!

As far as multiple instances, I've been working on satellite images. So you could have one base on a pi 3/4 (or potentially even a server) and then use some pi 0's connected to mics placed wherever else.

So stay tuned!


Wow, this looks really promising! Thanks for doing this.

My family back home is hooked on Google Home but sister agreed to switch off for the kids - but her requirement is to be able to control Spotify via voice.

Given I set aside enough of my time to help them out and figure out appropriate hardware, do you think it's worth a shot with HI?


I've been meaning to take a look at Spotify through Home Assistant and what that is capable of (and potentially voice controlling it)

But, if you wanted to write some python to directly play some pre-defined playlists (it currently can't do open speech transcription), you could definitely make that happen with Home Intent!

It supports custom components, so any integration is technically possible: https://homeintent.io/reference/developer/example-component/

I tried to design the interface to be easy to understand what sentences trigger what functions, and not have to write a lot of boiler code.

Hope this helps!


Cheers!

The Spotify part I'm assuming I will figure out - was more curious as to how your experience with voice control has been, if it performs well enough to be practical.


I think we're just about there. The wake word could still be a tad better, which might just be some settings on my part. Most of the time the intent recognition works as expected. At this point, it's how we primarily control lights and timers at home. So pretty practical!



Somehow had never heard about Rhasspy till today, seems pretty awesome.


I've toyed around with mycroft + raspberry + respeaker array. Works well enough that I could get it to figure out wolfram style questions fairly well.

Most things seem to assume a raspberry as the base hardware which somewhat implies cloud usage.

There is also hive mind which seems to get around the above issue (basically mycroft except with a more satellite mentality to mic placement)

Also been wondering whether it is possible to just use the cloud TTS since their voices are quite good. That should fall under GCP/AWS etc terms which seems a little more privacy friendly than straight alexa & friends.

Planning on having another go at this, but first tackling presence detection which is a minefield of note too


Although not exactly what you're looking for, I've heard really interesting things about https://knocki.com/

Essentially you can program your house to coded knocks. Lights/music/door/... all by sequences of knocks.


Actually not a bad idea. Seems pretty DIY-able too.
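
Here's roughly how I'd sketch the decoding side, assuming you already have knock timestamps from some vibration sensor (the hard part, not shown). This is not how Knocki does it; the gap threshold, the pattern table, and the action names are all made up.

```python
def encode_knocks(timestamps, long_gap=0.6):
    """Turn knock times (in seconds) into a string of short/long pauses."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return "".join("L" if gap >= long_gap else "S" for gap in gaps)

# Hypothetical mapping from pause patterns to actions.
ACTIONS = {
    "SS": "toggle_lights",  # three quick knocks
    "SL": "play_music",     # two quick knocks, pause, one more
    "LS": "unlock_door",    # one knock, pause, two quick knocks
}

def action_for(timestamps):
    """Look up the action for a knock sequence, ignoring unknown patterns."""
    return ACTIONS.get(encode_knocks(timestamps), "ignore")

print(action_for([0.0, 0.2, 0.4]))  # toggle_lights
print(action_for([0.0, 0.2, 1.1]))  # play_music
print(action_for([0.0]))            # ignore
```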


Never thought of or seen this, but really like the idea!


The harder problem isn't the software - that's solvable to a degree by most of us on this thread - it's the hardware. Alexa et al. are packed full of hardware tweaks such as array based far-field microphones which can determine where the voice commands are coming from, and isolate them from background noise and their own audio output. Then there's the built-in proprietary Alexa/Google-home support in a lot of consumer goods as well. There's hardware widgets that you can get to effectively rebuild Alexa type hardware with an off the shelf SBC, but I think I'm on the ball here when I say most people don't want to do that, and/or don't have the hardware skills either.

Personally, I live fine without a voice controlled home assistant. I am however able bodied, and I can press buttons and flick switches without much thought. If that were to change, then yeah, I can see a need for these things, but they really need to be able to work offline with no internet connection should you wish have them configured like that.


This is more of a "DIY" approach but all the tools are there for FOSS and OSHWA solutions.

Mozilla has DeepSpeech [0] and, while not as advanced as the stuff from Google or Amazon, my experimentation left me feeling pretty hopeful that it could reliably recognize at least keywords.

The Raspberry Pi is quite capable though you'll probably need some dedicated microphone to reliably catch voice data. I know ReSpeaker [1] but maybe some off the shelf conference USB microphones would work as well.

[0] https://github.com/mozilla/DeepSpeech

[1] https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspb...


I'm not aware of any private alternatives, but I've been annoyed by the same issue, and since switched to using Siri.

So if you've got a few Apple devices such as a HomePod / Apple Watch / iPhone, consider the HomeKit ecosystem; it's been working well for me.


I have set up a Rhasspy instance on a Pi 3A (using a Seeed Studio microphone HAT) and it works decently, although it is by no means Siri/Alexa grade.

The good thing about it is that it takes only a couple of minutes to install via Docker and has all the base niceties out of the box (trigger word, intent detection, speech synthesis).

I would say it is good enough to tinker with, although clearly not yet up to par (mind you, I am also trying it in English).


Consider Project Alice. OSS runs on Raspberry Pi or AMD container. https://github.com/project-alice-assistant/ProjectAlice


Dumb question: Why does Alexa need to connect to the internet to turn lights on/off or run timers. What happens when it cannot connect to the Amazon mothership.


It does all voice recognition in the cloud, only the trigger word detection runs locally.


This reminds me of my friend who had his whole house wired with Google. I've watched him ask Google to turn lights on while standing next to a light switch. I've seen him repeat himself 3 times when all he had to do was lift his arm. Seeing that happen a few times (along with privacy concerns) convinced me to never bother with any home assistants.


According to this[1] post, that might not always be the case. I also have this assumption about alexa devices, and I haven't seen any other evidence to say that voice recognition can happen locally. Personally, I find the claim in the thread a bit hard to believe, but...

[1] https://community.home-assistant.io/t/local-voice-control/29...


Polished? No

There are open source speech to text engines, text to speech engines and assistant software with APIs, you could probably build something with a raspberry pi, I looked into it a while back but I don't really mind light switches.
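
The glue between the speech-to-text output and the actual action is the easy part. A minimal sketch, assuming the engine hands you a plain text transcript; the sentence patterns and intent names here are invented and don't correspond to any particular assistant's API:

```python
import re

# Hypothetical sentence patterns, each paired with an intent name.
INTENTS = [
    (re.compile(r"turn (on|off) the (\w+) lights?"), "light"),
    (re.compile(r"set a timer for (\d+) (seconds?|minutes?)"), "timer"),
]

def parse(transcript):
    """Return (intent_name, captured_slots) or None if nothing matches."""
    text = transcript.lower().strip()
    for pattern, name in INTENTS:
        match = pattern.fullmatch(text)
        if match:
            return name, match.groups()
    return None

print(parse("Turn on the kitchen light"))   # ('light', ('on', 'kitchen'))
print(parse("set a timer for 10 minutes"))  # ('timer', ('10', 'minutes'))
print(parse("what's the weather"))          # None
```

A fixed list of sentences like this is far less flexible than open-ended speech, but for lights and timers it covers most of what gets said.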

"Open Assistant" should get the ball rolling for you, search that and dive down the rabbit hole of open source home automation.


I don't want to sound like a smart ass, but I feel like a good alternative to Alexa is just doing all that yourself.

I have never had an Alexa but surely there isn't that much time saved by having it turn off lights. Siri on your iPhone can set alarms and shopping lists and presumably your phone is always around you.


The most important function my Alexa serves is acting as the alarm that isn't my phone/requires a separate skill set to disable. Simultaneously talking and manipulating a phone is a magnitude harder to sleep through than just the phone.


> I have never had an Alexa but surely there isn't that much time saved by having it turn off lights

You should try.

It’s a trade off: You choose when to spend more time to set up the assistant, in order to save time setting an alarm while you have your hands in the dough.


But the point of the alarm is to be able to leave and do something else. If Alexa is in the kitchen and you're in the garden it doesn't work?

I use the timer on my watch; then I can go away and completely forget about it, and be reminded in time.


It makes sense, but I never realized how much of my brain's resources were wasted on finding the remotes for the lights and the air conditioner before I tried Alexa.


If you can get comfortable with Siri (which does send audio to the cloud, though with iOS 15 there's more on-device capability), you should experiment with controlling Google Assistant and Amazon Alexa via Siri.

I set up my phone so I can say "hey Siri ok google" and then it asks me what I want to tell google. I then say any command supported by the Google Assistant, and it passes the command through.

Technically I think it's supposed to work all in one utterance, but I have found that it never works that way. I always have to split it up. Even still, this is a pretty handy way to be able to ask for information (Siri's knowledge seems quite limited compared to Alexa/Google) without having any always-listening devices in my home.


Can you share any info on how you set this up? I also have more faith in Apple's privacy policies than I do in Google / Amazon, so this could be an interesting option. Any idea if it works on an apple watch?



You could also keep using Alexa and add a do-it-yourself privacy enhancement called Alias: https://www.instructables.com/Project-Alias/


This is freakin EPIC. A hat that captures audio and then decides whether or not to relay it to the physical Alexa / Google / etc speaker. I struggle to believe it'd be super reliable, but the concept is ultra cool!


Haven’t seen it mentioned - I think https://mycroft.ai/ is exactly the sort of thing the original poster as well as the general HN crowd is looking for


mycroft.ai appears to be one option. I heard of them at MakerFaire (2018) and they appear to still be around. I believe they supported installation on a raspberrypi/similar around that timeframe.


I was working on a solution for that a few years ago with an API for raspberry pi

https://github.com/victorqribeiro/rpiapi

Here you can see I'm using voice commands to drive a car

https://github.com/victorqribeiro/raspberryCar

I mean, with a little bit of code and some 3v relays you can achieve what you want


The only viable option that I found that could reliably infer commands from speech is https://github.com/Picovoice/rhino

Unfortunately it is not open source (the GitHub just has binary blobs) and requires an account to log in to generate and download model files, but the accuracy is great and you can use it to send commands to Home Assistant to turn lights on/off etc.
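
For the Home Assistant side, the call is just an HTTP POST to its REST API with a long-lived access token. A minimal sketch using only the standard library; the host name, token, and entity ID below are placeholders, so swap in your own:

```python
import json
import urllib.request

def build_service_call(base_url, token, domain, service, entity_id):
    """Build a POST request for Home Assistant's /api/services/<domain>/<service>."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

def call_service(request, timeout=5):
    """Send the request and return the HTTP status code (needs a reachable HA instance)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status

# Placeholder values: replace with your own host, token and entity.
req = build_service_call(
    "http://homeassistant.local:8123",
    "YOUR_LONG_LIVED_TOKEN",
    "light",
    "turn_on",
    "light.kitchen",
)
print(req.full_url)
```

Pass `req` to `call_service` once the placeholders are real. Everything stays on the local network, so the only thing that ever sees the command is your own box.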


One of my back-burner projects is putting some esp32s or similar around the house with microphones, streaming whatever audio they hear to Dragon NaturallySpeaking like in this video https://www.youtube.com/watch?v=8SkdfdXWYaI#t=9m5s , but hooked up to home assistant scenes / automations instead of emacs commands.


I was wondering the same thing yesterday. It sure would be nice to have something that worked completely on-prem. Would be cool if it could run on something power efficient like an old smartphone. Then you could take N old devices, install your app, add them to wifi, and duct tape them to the wall near a power outlet.

Would probably be easier with Android as I imagine you could push apk files OTA if you wanted, which would come in handy.


Someone should make an all-offline voice recognition system where you record any catch phrase or “intents” you want the dumb device to recognize as a wake word, no internet needed.

This is similar to a feature I saw in waze, you can record your own turn by turn audio directions by recording your voice saying things like “turn left” or “police reported ahead”, it’s how they offer all the fun celebrity and cartoon voices.

Thoughts?



I've heard of https://getleon.ai before, not sure if it fits your needs. A quick browse through their docs suggests that you'd have to write a package for the lights, but it doesn't seem that hard at first glance.

(It might be harder than it looks especially if your lights' API isn't documented well.)


During school, our AI class had a guest lecture from Monica Lam talking about Almond [1]. I personally don’t have much interest in virtual assistants, so I haven’t actually tried to use it, but I'm curious to see if anyone else has heard of it or worked on the project.

[1] https://almond.stanford.edu/



Consider Project Alice. Runs on Raspberry Pi or AMD docker. https://github.com/project-alice-assistant/ProjectAlice



Does it need to be voice? You can get physical switches that work with “smart” lighting systems, and physical kitchen timers have been around long enough that they are bug-free and have a well-tested UX.


I don't have an answer to your question, but have you considered scrapping voice control altogether and instead using motion detectors or other rules to govern whether the lights are on?


Does Alexa respond to: "Alexa, stop trying to upsell crap"?


"I'm sorry, Dave. I'm afraid I can't do that."


clap on

clap off

the clapper

clap clap


Awww the days of attaching the clapper to talking fish. What an important milestone for humanity.


Something like Mycroft or Almond (if I recall correctly).

Lots of plugins and functionality, especially Mycroft. However, Almond being open source is very good for hacking around.


Do you really need a home assistant?


Hue lights and an egg timer for each room.


Please elaborate? Egg timer?



[flagged]


Between your name, the overuse of ellipsis to suggest conspiracy, and the vague usage of "they", I do think you are probably not engaging in good faith.

You're also super wrong.

https://patents.google.com/patent/US1773980A/en is the first TV patent, from 1927. Your patent, whatever meaningless thing it might be, is from 2001.



