
The first thing I'd recommend doing is to find the unique entries, THEN sort. Unless uniq has vastly better performance on sorted files...
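Something like this rough Python sketch of the idea (assumes one entry per line; entries.txt is just a placeholder file name):

    def unique_then_sort(path):
        # Hash-set dedup is roughly O(n); only the unique entries get sorted.
        with open(path) as f:
            unique = {line.rstrip("\n") for line in f}
        return sorted(unique)

    for entry in unique_then_sort("entries.txt"):
        print(entry)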


Finding duplicates in unsorted data is pretty time-consuming.


You should see the people at my company do it by hand with pieces of paper! With n approaching 20,000 sometimes. (Yes, we're working on automating this even as we speak.)


Would you mind giving more details on this? I don't understand how it would be possible for a human to do this with n > 1kish, and even then I would imagine it being horribly slow.


I'm pretty sure actually that they don't do it when it gets to 20,000 docs (and thus we pay a bit extra for postage/materials than we might otherwise have to), exactly for the same reason you think so. But that's the adamant claim of those from whom requirements are gathered. It'll take me 5 minutes to implement the filter function to make this happen -- far less time than it'd take for me to sit down with them and have them prove that they actually do this operation. So I haven't pressed the issue.

I know for a fact that they do do it on smaller batches, though. It takes a lot of room, as you can imagine!


uniq only removes adjacent duplicate lines, so it effectively requires sorted input to deduplicate a whole file.
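A rough Python sketch of the difference between uniq-style adjacent collapsing and an order-insensitive hash dedup (illustrative only, not how uniq is implemented):

    from itertools import groupby

    def uniq_like(lines):
        # Collapse runs of identical adjacent lines, as the uniq tool does.
        return [key for key, _ in groupby(lines)]

    def dedup_unsorted(lines):
        # Remove all duplicates regardless of order, keeping first occurrences.
        seen = set()
        out = []
        for x in lines:
            if x not in seen:
                seen.add(x)
                out.append(x)
        return out

    lines = ["b", "a", "b", "a", "a"]
    print(uniq_like(lines))       # ['b', 'a', 'b', 'a'] -- the repeated 'b' survives
    print(dedup_unsorted(lines))  # ['b', 'a']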



