What environment are you using that:
- Has access to Youtube
- Can run Python code
- Can’t run JS code
If the concern is security, it sounds like the team went to great lengths to ensure the JS was sandboxed (as long as you’re using Deno).
If you’re using some sort of weird OS or architecture that Deno/Node doesn’t support, you might consider QuickJS, which is written in pure C and should work on anything. (It will be a lot slower, though I’m not clear just how slow.) Admittedly, you then lose the sandboxing, although IMO it seems like it should be safe to trust code being served by Google on the official YouTube domain. (You don’t have to trust Google in general to trust that they won’t serve you actual malware.)
> What environment are you using that: - Has access to Youtube - Can run Python code - Can’t run JS code
Nothing specific, I just tend to run tools in restricted VMs where things are whitelisted and it's pretty much as locked down as it can be. It can run whatever I want it to run, including JS, and as the logs in my previous comment show, it is in fact running both Python and JS, and has access to YouTube, otherwise it wouldn't have worked :)
I tend to follow the rule of "least possible privileges", so most stuff I run like that has to be "prepped", basically, especially things that sometimes make network requests (updating the solver in this case). It's just a matter of packaging it before I run it, so it's not the end of the world.
No weird OS or architecture here, just good ol' Linux.
> IMO it seems like it should be safe to trust code being served by Google on the official YouTube domain
> > IMO it seems like it should be safe to trust code being served by Google on the official YouTube domain
That came from a misunderstanding about where the downloadable solver script comes from: it doesn't come from youtube.com, it comes from github.com (yt-dlp org). I was just correcting that misunderstanding.
> You can’t very well run yt-dlp without trusting yt-dlp code.
That makes a ton of sense and I agree! I'm not sure how that is related to anything though? I download yt-dlp from Arch repositories, so yes I'm trusting Arch maintainers and of course yt-dlp developers. Then I'm adding a manifest which controls what this application can actually access, which is basically a VM config, where I define that it can access youtube.com (and a bunch of other sites I mirror/archive). This is the part that shouldn't have github.com/* access.
Again as mentioned, not a big issue, plenty of workarounds, so not the end of the world.
> Which came from a misunderstanding about where the downloadable solver script comes from, as it doesn't come from youtube.com, it comes from github.com (yt-dlp org), I was just correcting that misunderstanding.
But that script is ultimately running a JS challenge from Youtube, right? That’s why we actually needed a JS runtime in the first place.
Restricting or sandboxing software is something I've been looking into recently. Would you mind sharing what you use and possibly an example as well? Perhaps an example for yt-dlp?
> What environment are you using that: - Has access to Youtube - Can run Python code - Can’t run JS code
They didn't say “can't run JS code”, only that the solver currently could not be downloaded from that location. It could be an IPv6-only environment (IIRC YouTube supports IPv6 but GitHub does not), or just that all external sites must be assessed before being whitelisted (I'm not sure why YouTube would be but not GitHub, but it is certainly possible).
It's just me being paranoid after seeing npm/pypi supply chain attacks, and since then I basically run most software touching the internet in a VM one way or another.
I think in this case my own laziness is what makes it worse than it has to be. Currently I'm whitelisting by domain, so youtube.com for the yt-dlp runner is obviously OK, and I'd want to avoid whitelisting github.com for that, since it's just downloading one JS file.
For now manually copying the config file into my SCM or just whitelisting GitHub for initial download does the trick. I guess I just had to squeeze in one complaint in my previous comment so I could get the HN stamp of approval, can't be too positive.
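To make the per-domain whitelisting concrete, here is a minimal Python sketch of the kind of check such a setup performs. It is only an illustration: `ALLOWED_DOMAINS` and `is_allowed` are hypothetical names, and the actual VM manifest format isn't shown anywhere in this thread.

```python
from urllib.parse import urlsplit

# Hypothetical per-tool allowlist; the real manifest format is not shown in the thread.
ALLOWED_DOMAINS = {"youtube.com"}


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # Match the domain itself or any subdomain, but not lookalikes such as
    # "notyoutube.com" (hence the leading dot in the suffix check).
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

With only youtube.com listed, a request for the solver hosted on github.com is rejected, which is the situation described above.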
You could serve the files yourself from a server that you populate from GitHub after review. You'd need to either issue a certificate for the domain from your own CA that the host running yt-dlp trusts, or patch yt-dlp to use a different server name, but neither of those steps should be too onerous.
I've just hit the IPv6 problem. I routinely use yt-dlp -6 to cycle through my (basically infinite) set of IPv6 addresses. However when you do this, it tries the github EJS download over IPv6, which fails as github doesn't support IPv6 (because it's still the year 2000 over there).
Actually I think this is kind of a yt-dlp bug, since it doesn't need to use IPv6 for the github download.
You can set up a 'socat' process to listen on a certain IPv6 address and relay traffic to GitHub, then point the GitHub hostname at that address in your hosts file. You don't even need to break TLS, since it forwards the traffic unchanged.
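If socat isn't available, roughly the same relay can be sketched in Python with asyncio. This is an illustration under assumptions, not a tested replacement: `start_relay` and `_pipe` are made-up names, and the listen address and target host in the usage comment are placeholders. Which hostnames yt-dlp actually contacts for the download isn't stated in this thread.

```python
import asyncio


async def _pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away; nothing more to forward
    finally:
        writer.close()  # no half-close: ending one direction ends the connection


async def start_relay(listen_host, listen_port, target_host, target_port):
    """Accept TCP connections on listen_host and forward the bytes unchanged."""

    async def handle(client_reader, client_writer):
        try:
            up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
        except OSError:
            client_writer.close()
            return
        # Bytes are relayed as-is, so TLS stays end-to-end between client and target.
        await asyncio.gather(
            _pipe(client_reader, up_writer),
            _pipe(up_reader, client_writer),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)


# Usage sketch (placeholder address; binding to port 443 usually needs privileges):
#
#   async def main():
#       server = await start_relay("2001:db8::1", 443, "github.com", 443)
#       await server.serve_forever()
#
#   asyncio.run(main())
```

Because the relay works at the TCP level, the client still validates the real server certificate, which matches the point about not having to break TLS.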
A solver running at 50ms instead of 1ms I would say is practically imperceptible to most users, but I don't know what time span you are measuring with those numbers.
$ time ./v8 /bench/yt-dlp.js | md5sum -
a730e32029941bf1f60f9587a6d9554f -
real 0m0.252s
user 0m0.386s
sys 0m0.074s
$ time ./quickjs /bench/yt-dlp.js | md5sum -
a730e32029941bf1f60f9587a6d9554f -
real 0m2.280s
user 0m2.507s
sys 0m0.031s
So about 10x slower for the current flavor of YouTube challenges: 0.2s -> 2.2s.
A few more results on the same input:
spidermonkey 0.334s
v8_jitless 1.096s => about the limit for JIT-less interpreters like quickjs
graaljs 2.396s
escargot 3.344s
libjs 4.501s
brimstone 6.328s
modernc-quickjs 12.767s (pure Go port of quickjs)
fastschema-qjs 1m22.801s (Wasm port of quickjs)
boa 1m28.070s
quickjs-ng 2m49.202s
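For reference, the slowdown factors relative to v8 can be recomputed from the timings above. A small Python sketch: `parse_time` and `TIMINGS` are illustrative names, the numbers are copied from the results posted here, and it assumes the table entries are wall-clock times comparable to the `real` lines.

```python
import re


def parse_time(text):
    """Convert strings like '0m2.280s', '1m22.801s' or '0.334s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if m is None:
        raise ValueError(f"unrecognized time: {text!r}")
    return int(m.group(1) or 0) * 60 + float(m.group(2))


# Timings copied from the benchmark output and table above.
TIMINGS = {
    "v8": "0m0.252s",
    "quickjs": "0m2.280s",
    "spidermonkey": "0.334s",
    "v8_jitless": "1.096s",
    "graaljs": "2.396s",
    "escargot": "3.344s",
    "libjs": "4.501s",
    "brimstone": "6.328s",
    "modernc-quickjs": "12.767s",
    "fastschema-qjs": "1m22.801s",
    "boa": "1m28.070s",
    "quickjs-ng": "2m49.202s",
}

baseline = parse_time(TIMINGS["v8"])
slowdowns = {name: parse_time(t) / baseline for name, t in TIMINGS.items()}

# Print engines from fastest to slowest with their factor relative to v8.
for name, factor in sorted(slowdowns.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} {factor:7.1f}x")
```

On these numbers quickjs comes out at roughly 9x v8, which is where the "about 10x" figure comes from.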
node(v8) : 1.25s user 0.12s system 154% cpu 0.892 total
quickjs : 6.54s user 0.11s system 99% cpu 6.671 total
quickjs-ng: 545.55s user 202.67s system 99% cpu 12:32.28 total
A 5x slowdown for an interpreted C JS engine is pretty good I think, compared to all the time, code and effort put into v8 over the years!
I've found having yt-dlp available on my iPhone useful, and used Pythonista to achieve that, but haven't figured out how to get the new requirements to work yet. Would love any ideas people have!