
The first thing I'd recommend doing is to find the unique entries, THEN sort. Unless uniq has vastly better performance on sorted files...
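Something like this rough Python sketch of the idea (assumes one entry per line; entries.txt is just a placeholder file name):

    def unique_then_sort(path):
        # Hash-set dedup is roughly O(n); only the unique entries get sorted.
        with open(path) as f:
            unique = {line.rstrip("\n") for line in f}
        return sorted(unique)

    for entry in unique_then_sort("entries.txt"):
        print(entry)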


Finding duplicates in unsorted data is pretty time-consuming.


You should see the people at my company do it by hand with pieces of paper! With n approaching 20,000 sometimes. (Yes, we're working on automating this even as we speak.)


Would you mind giving more details on this? I don't understand how it would be possible for a human to do this with n > 1kish, and even then I would imagine it being horribly slow.


I'm pretty sure actually that they don't do it when it gets to 20,000 docs (and thus we pay a bit extra for postage/materials than we might otherwise have to), exactly for the same reason you think so. But that's the adamant claim of those from whom requirements are gathered. It'll take me 5 minutes to implement the filter function to make this happen -- far less time than it'd take for me to sit down with them and have them prove that they actually do this operation. So I haven't pressed the issue.

I know for a fact that they do do it on smaller batches, though. It takes a lot of room, as you can imagine!


uniq only removes adjacent duplicate lines, so it effectively requires sorted input to deduplicate a whole file.
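A rough Python sketch of the difference between uniq-style adjacent collapsing and an order-insensitive hash dedup (illustrative only, not how uniq is implemented):

    from itertools import groupby

    def uniq_like(lines):
        # Collapse runs of identical adjacent lines, as the uniq tool does.
        return [key for key, _ in groupby(lines)]

    def dedup_unsorted(lines):
        # Remove all duplicates regardless of order, keeping first occurrences.
        seen = set()
        out = []
        for x in lines:
            if x not in seen:
                seen.add(x)
                out.append(x)
        return out

    lines = ["b", "a", "b", "a", "a"]
    print(uniq_like(lines))       # ['b', 'a', 'b', 'a'] -- the repeated 'b' survives
    print(dedup_unsorted(lines))  # ['b', 'a']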



