

This article is a little confusing. I think this is a roundabout way to invent the blocked Bloom filter with k=2 bits inserted per element.

It seems like the authors wanted to use a single hash for performance (?). Maybe they correctly determined that naive Bloom filters have poor cache locality and reinvented blocked Bloom filters from there.

Overall, I think blocked Bloom filters should be the default most people reach for. They completely solve the cache locality issue (a single cache miss per element lookup), and they sacrifice only about a 10–15% space increase to do it. I had a simple implementation running at something like 20ns per query with maybe k=9. It would be about 9x that for naive Bloom filters.

There’s some discussion in the article about using a single hash to come up with various indexing locations, but it’s simpler to just think of blocked Bloom filters as:

1. Hash-0 gets you the block index

2. Hash-1 through hash-k get you the bits inside the block

If your implementation slices up a single hash to divide it into multiple smaller hashes, that’s fine.
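To make that two-level scheme concrete, here's a toy Python sketch (my own, not the article's implementation; the block count, block size, and k are all made-up illustrative numbers), slicing one wide hash into a block index plus k in-block bit positions:

```python
import hashlib

NUM_BLOCKS = 1024    # illustrative sizing, not tuned
BLOCK_BITS = 512     # one 64-byte cache line per block
K = 8                # bits set per element within its block

def _block_and_bits(item: bytes):
    # Slice one wide hash into k+1 smaller hashes: the first
    # picks the block, the rest pick bits inside that block.
    h = int.from_bytes(hashlib.blake2b(item, digest_size=32).digest(), "little")
    block = h % NUM_BLOCKS
    h >>= 32
    bits = []
    for _ in range(K):
        bits.append(h % BLOCK_BITS)
        h >>= 16
    return block, bits

class BlockedBloom:
    def __init__(self):
        # Each block is one 512-bit integer standing in for a cache line.
        self.blocks = [0] * NUM_BLOCKS

    def insert(self, item: bytes):
        block, bits = _block_and_bits(item)
        for b in bits:
            self.blocks[block] |= 1 << b

    def contains(self, item: bytes):
        # All k probes land in a single block: one cache miss in a real layout.
        block, bits = _block_and_bits(item)
        return all(self.blocks[block] >> b & 1 for b in bits)
```

A real implementation would use a flat bit array and SIMD within the block, but the indexing structure is the same.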


Yeah, I kind of think the authors didn't conduct a thorough-enough literature review here. There are well-known relations between the number of hash functions you use and the FPR, cache-blocking and register-blocking are classic techniques ("Cache-, Hash-, and Space-Efficient Bloom Filters" by Putze et al.), and there are even ways of generating patterns from only a single hash function that work well (shamelessly shilling my own blogpost on the topic: https://save-buffer.github.io/bloom_filter.html)

I also find the use of atomics to build the filter confusing here. If you're doing a join, you're presumably doing a batch of hashes, so it'd be much more efficient to partition your Bloom filter, lock the partitions, and do a bulk insertion.
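Roughly what I have in mind, as a toy Python sketch (one bit per key and made-up sizes, just to show the partitioning; a real join build would do this per thread with real bit arrays):

```python
from collections import defaultdict
from threading import Lock

NUM_WORDS = 1024                 # filter: 1024 * 64 bits (illustrative size)
NUM_PARTITIONS = 8
WORDS_PER_PART = NUM_WORDS // NUM_PARTITIONS

filter_words = [0] * NUM_WORDS
locks = [Lock() for _ in range(NUM_PARTITIONS)]

def bulk_insert(hashes):
    # Group the batch's bit positions by filter partition...
    parts = defaultdict(list)
    for h in hashes:
        bit = h % (NUM_WORDS * 64)
        parts[bit // (WORDS_PER_PART * 64)].append(bit)
    # ...then take each partition's lock once for its whole sub-batch,
    # instead of paying for one atomic RMW per inserted bit.
    for p, bits in parts.items():
        with locks[p]:
            for bit in bits:
                filter_words[bit // 64] |= 1 << (bit % 64)
```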


Your blogpost is great! Except for one detail: you used modulo n. If n is not known at compile time, multiply+shift is much faster [1]. Division and modulo (remainder) are slow, except on Apple silicon (I don't know what they did there). BTW, for blocked Bloom filters, there are some SIMD variants that seem to be simpler than yours [2] (maybe I'm wrong, I didn't look at the details; it just seems yours uses more code). I implemented a register-blocked one in Java here [3].
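The multiply+shift reduction from [1] is a one-liner (sketch; it assumes a well-mixed 32-bit hash):

```python
def fast_range(hash32: int, n: int) -> int:
    """Map a 32-bit hash to [0, n) with a multiply and a shift,
    avoiding the division hidden in `hash32 % n`."""
    return (hash32 * n) >> 32
```

It treats hash32 / 2^32 as a fraction in [0, 1) and scales it by n; like modulo, it is slightly biased when n is not a power of two, but the bias is negligible for filter-sized n.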

Bulk insertion: yes, if there are many keys, bulk insertion is faster. For xor filters, I used radix sort before insertion [4] (I should have documented the code better), but for fuse filters and blocked Bloom filters it might not be worth it, unless the filter is huge.

[1] https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-... [2] https://github.com/FastFilter/fastfilter_cpp/blob/master/src... [3] https://github.com/FastFilter/fastfilter_java/blob/master/fa... [4] https://github.com/FastFilter/fastfilter_cpp/blob/master/src...


Very interesting blog post. I’d never seen that method for quickly computing the patterns. I thought I had done a lot of research on Bloom filters, too!

Post author here. Yes, you are correct. I made this code change 3 years ago when I was a junior dev, and I wasn’t familiar with blocked Bloom filters at the time. Looking back, it’s cool to see that I accidentally reinvented a basic blocked Bloom filter.

I was also limited by the constraints of legacy code. This project was not a complete rewrite, just an idea: "can we use more information from that 32-bit hash we receive, without regressing any perf?" We didn't have time for deep research or a rewrite, so I just wanted to show the result of this small exercise in how we can make things run better without rewriting the world.


> Overall, I think blocked Bloom filters should be the default most people reach for.

I think this depends on how big your filters are. Most people think of Bloom filters as having to have hundreds of thousands of elements, but I frequently find them useful all the way down to 32 bits (!). (E.g., there are papers showing chained hash tables where each bucket has a co-sited tiny Bloom filter to check if it's worth probing the chain.) In the “no man's land” in-between with a couple ten thousand buckets, the blocking seems to be mostly negative; it only makes sense as long as you actually keep missing the cache.
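As a concrete (toy) illustration of the tiny-filter trick, not the paper's exact scheme: a 32-bit Bloom filter sitting next to each bucket, consulted before walking the chain. All names and parameters here are made up for the sketch:

```python
class GuardedBucket:
    """A chained hash-table bucket guarded by a tiny 32-bit Bloom filter,
    using 2 bits per key."""

    def __init__(self):
        self.filter = 0   # 32-bit Bloom filter co-sited with the bucket
        self.chain = []   # list of (key, value) pairs

    @staticmethod
    def _mask(key_hash: int) -> int:
        # Derive two bit positions in [0, 32) from high hash bits,
        # which the bucket index (low bits) didn't consume.
        b1 = (key_hash >> 20) & 31
        b2 = (key_hash >> 25) & 31
        return (1 << b1) | (1 << b2)

    def insert(self, key_hash: int, key, value):
        self.filter |= self._mask(key_hash)
        self.chain.append((key, value))

    def lookup(self, key_hash: int, key):
        m = self._mask(key_hash)
        if self.filter & m != m:
            return None            # definitely absent: skip the chain walk
        for k, v in self.chain:    # maybe present: probe the chain
            if k == key:
                return v
        return None
```

Most negative lookups are answered by the one-word filter test without touching the chain's cache lines at all.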


Are you talking about Cuckoo++ tables, perhaps? If not can you point me to the hash table you had in mind? Always fun to learn of a new approach.

https://github.com/technicolor-research/cuckoopp


IIRC, it's this paper: https://db.in.tum.de/~birler/papers/hashtable.pdf

I never implemented their hash table, but it opened my eyes to the technique of a tiny Bloom filter, which I've used now a couple of times to fairly good (if small) effect. :-)


Thanks! This'll be a fun read :)

Yeah, I agree with this. I think there are open-addressing hash tables like Swiss Tables that do something similar. IIRC, they have buckets with a portion at the beginning holding lossy “fingerprints” of the items, which serve a similar purpose to a Bloom filter.

Bloom filters are useful for sharding, so it stands to reason that a hash table implemented with shards would benefit.

The problem is that Bloom filters aren’t close to the theoretical space lower bound for the problem they solve, and if you add another 15% on top, it starts to become attractive to switch to a structure with a tighter space bound.

'Cause it was written by AI. The entire mid-section is classic AI slop writing: repeating the same points and numbers over and over, repackaging the same idea with "key takeaway" and shit. The voice of the author there is heavily AI-coded.

I wonder if the “spawn” API is ever preferable over “fork”. Do we really want to remove context if we can help it? There will certainly be situations where we have to, but then what you want is good compaction for the subagent. “Clean-slate” compaction seems like it would always be suboptimal.

This is my question also, but a bit different.

Is there any reason to explicitly have this binary decision?

Why not a single primitive where the parent dynamically defines the child's context, naturally resulting in either spawn, fork, or anything in between?


That actually sounds even better than the binary. Thanks for the suggestion!

Context rot and bias removal would be two good reasons to start a freshly spawned agent.

AmIUnique.org has a good collection of non-cookie tracking techniques.

https://amiunique.org/fingerprint


I wonder how this compares to similar initiatives by e.g. Sony [0] and Leica [1].

[0]: https://authenticity.sony.net/camera/en-us/

[1]: https://petapixel.com/2023/10/26/leica-m11-p-review-as-authe...


Canon gave users the option to sign their photographs with "add original decision data". It got cracked.

* https://petapixel.com/2010/12/01/russian-software-firm-break...

* https://www.elcomsoft.com/presentations/Forging_Canon_Origin...


And you think this rushed product won't be?


I would broadly expect software made by most camera brands to be shit, while I would expect a developer who creates their own hardware projects (generally, not talking specifically about cameras) to range from idiots who have no idea what they're doing (like me, though to be fair I also wouldn't release it believing it to be good) to highly skilled coders who would get it right despite being on their own.

So I wouldn't automatically assume that a product like this would be better designed, but I would think there's a chance it might have been!


Probably won't be cracked, as there will be little to no interest: such a device will have no use in any professional setting.


I knew that Leica is generally expensive, but the model in the review is insanely expensive (over 10K USD?). It is not even comparable.


It's not the camera that is important though, but the technology:

https://spec.c2pa.org/specifications/specifications/2.2/inde...


Compared to those, this is like a weekend project that a high school student could accomplish.



It’s vibrating at a microscopic level. You won’t be able to feel it at all.


What are your thoughts on how Bevy's developing UI toolkit compares (in terms of goals and use cases) to some of the other Rust efforts in the space (egui, xilem, iced, etc.)? Do you expect it will be specialized/limited to scene development for games?


About 7% of people who have ever lived are alive today. Still pretty lucky, but not quite winning the lottery.


Much luckier if you consider everyone who ever will live, assuming we don’t destroy ourselves.


To kill a mosquito, you need "a few tens of millijoules, delivered within a few milliseconds" [0], so let's say 10W. To destroy the Earth (so that it turns into scattered dust and never reforms) you need about 10^32 J [1]; if we assume this is applied over maybe 100s, the laser would be 10^30W.

So the log10 scale goes from 1–30, where mosquitos die at 1 and the Earth dies at 30. The 2 PW in the article is about a 15.3. The Vulcan 20-20 project (set to complete in 2029) will register at about 20 PW, or a 16.3 on the mosquito–Death Star scale [2].

So on a log scale, we're over halfway to building the Death Star.

[0]: https://spectrum.ieee.org/backyard-star-wars

[1]: https://www.scientificamerican.com/article/how-much-energy-w...

[2]: https://news.sky.com/story/worlds-most-powerful-laser-to-be-...
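For anyone who wants to check the arithmetic, the scale is just a rescaled log10 of power (all constants are the rough estimates above):

```python
import math

MOSQUITO_W = 10.0      # tens of mJ over a few ms [0]
DEATH_STAR_W = 1e30    # ~1e32 J applied over ~100 s [1]

def mds_power_scale(watts: float) -> float:
    """Position on the mosquito-Death Star power scale:
    log10 of power, rescaled so mosquito = 1 and Death Star = 30."""
    lo, hi = math.log10(MOSQUITO_W), math.log10(DEATH_STAR_W)
    return 1 + (math.log10(watts) - lo) / (hi - lo) * 29

print(round(mds_power_scale(2e15), 1))   # this laser, 2 PW -> 15.3
print(round(mds_power_scale(2e16), 1))   # Vulcan 20-20, 20 PW -> 16.3
```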


If we're comparing lasers that go for seconds and lasers that go for femtoseconds, I think measuring watts is way too misleading.

Measured in simple joules, the mosquito is 0.04, the Earth is 10^32, and this laser is 50.

If we make a joules version of the 1-30 scale, the laser in the article would only score a 4.
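The same rescaling in code, using the rough joule figures above:

```python
import math

MOSQUITO_J = 0.04    # ~40 mJ to kill a mosquito
DEATH_STAR_J = 1e32  # energy to disperse the Earth

def mds_energy_scale(joules: float) -> float:
    """Same 1-to-30 rescaled log10 scale, but over delivered energy."""
    lo, hi = math.log10(MOSQUITO_J), math.log10(DEATH_STAR_J)
    return 1 + (math.log10(joules) - lo) / (hi - lo) * 29

print(round(mds_energy_scale(50)))   # this laser's ~50 J pulse -> 4
```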


For sure; using total energy delivered makes a lot more sense. But then I think it would be better to use whatever tool humanity has that delivers the max total energy; let’s say Tsar Bomba.

Let’s say the mosquito is 1 again, so the Death Star is 30 again. Tsar Bomba (~2.1×10^17 J) comes in at about 17.3. Over halfway again!

It’s kind of surprising that our max power output and max energy output are about the same on these scales.


So I could kill all the mosquitoes in my yard in one pulse with this laser.


If they get close enough together for that pulse to hit them all.


Just send me out there and aim the pulse at the one spot in the middle of my back I just can't quite reach.


> if we assume this is applied over maybe 100s

I think this is the crux of the assumption right here. It sounds like this is applied for well under a nanosecond.

I think we're closer to maybe killing a mosquito than "halfway to building a Death Star on a log scale" (which, I guess, is already much closer to a mosquito than a planet).


Every step is a doubling of the previous step, no?

So true "half way to a death star" is step 29/30?


They used log10, so each step is 10x the previous, so in a linear sense, it would double when going from about 29.7 to 30. But humans tend to improve tech at exponential rates, constantly making improvements here and there that stack up, at least for things that are actually in a developmental stage.

Say your "endstage" goal is GPU with 200 billion transistors. Using linear scale, the current biggest GPU is only halfway there, and it took all of human civilization to get this far, and it will take another civilization to get to 200b. In reality, we'll have that in a couple years with our current civilization.


> humans tend to improve tech at exponential rates

A hypothetical "death star" project like this would require improvements in energy generation/storage capacity/etc., which haven't improved in nearly the way transistor production has (and are also much more limited by physical realities, such as the specific heat, enthalpy of combustion etc. of materials).


Yes, extremely high sustained-power lasers still have a hard time competing with hypersonic projectiles in energy delivered. The difference is being able to throw nuclei at the problem.


This is awesome! Thank you!!

I’m still not sure what 15.3 on the MDS scale can destroy but I am sure the Emperor will be pleased to hear that we are half-way to building the Death Star.


The "on a log scale" is doing a fair amount of lifting here, I think.


Next question: on what planet/moon should the mosquito be to be _just_ safe from the laser?


Well, mass scales as the cube of radius, and we have 15 orders of magnitude to work with, so I guess it should be an object on the order of hundreds of meters in radius. But as noted, the duration of firing matters as well. Given https://news.ycombinator.com/item?id=44054239, the actual laser can only vaporize much smaller things.


Thank you for this, I will be using this for the rest of my life!


I don’t like log scales, they’re not intuitive.

