English is the best candidate because it has the second-largest user base (1.2 billion vs. 1.3 billion for Mandarin), http://en.wikipedia.org/wiki/List_of_languages_by_total_numb...
and has twice as many speakers as the third most popular language, Spanish (0.55 billion).
If I got to pick the universal language, it would be Lojban (a few hundred speakers), but that is not a realistic goal; teaching the other 6 billion people a language already spoken by 1/7th of the population is at least plausible.
> Why would you want that...
Why would you not want that?! Many popular programming languages are built around array indexing through pointer arithmetic, and a variable-width encoding is a horrible idea there, because you have to iterate through the text just to reach an index.
Length is the number of characters, which is just the number of bytes in ASCII, but has to be calculated by looking at every character in UTF-8.
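A minimal sketch of that point (the string literal is just an illustrative example): in UTF-8 the byte length and the character (codepoint) count diverge, so the count has to be obtained by decoding, i.e. scanning the bytes.

```python
# Byte length vs. codepoint count in UTF-8.
s = "naïve"                 # 5 codepoints
data = s.encode("utf-8")    # 'ï' (U+00EF) encodes as two bytes

print(len(data))  # 6 -- raw byte length, O(1) to read off
print(len(s))     # 5 -- codepoint count, obtained by decoding the bytes
```

In ASCII the two numbers are always equal, which is why byte-indexed string code works there without further thought.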
Even if 1.2 billion seems like a lot, that's still a small fraction of the world's population. Any choice of universal language would force the majority of the world to learn a new one. That's why I think winning a popularity contest is a poor argument; we shouldn't look at that, and should instead focus on things like simplicity (which I don't find in English), speed of learning, consistency, expressiveness, etc. I'd be happy to use Lojban (it's easier for machines too, I guess) or any other invented language. If I had to pick one of the popular ones, I'd prefer Spanish to English.
I was asking what your specific use cases are that prevent you from treating a UTF-8 string as a black-box blob of bytes. If you're dealing with international text, you'd rather use predefined functions anyway. If you want to limit yourself to ASCII, just do it and simply don't touch bytes >= 0x80.
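A quick sketch of why the black-box treatment works (the key/value format is a made-up example): UTF-8 was designed so that every byte of a multi-byte sequence is >= 0x80, so it can never collide with an ASCII delimiter. You can split, search, and parse on ASCII bytes without ever decoding.

```python
# Splitting UTF-8 bytes on an ASCII delimiter is safe without decoding,
# because continuation and lead bytes of multi-byte sequences are all >= 0x80.
blob = "key=héllo wörld".encode("utf-8")

k, _, v = blob.partition(b"=")  # operate on the opaque byte blob

print(k)                  # b'key'
print(v.decode("utf-8"))  # héllo wörld -- decode only at the boundary
```

This is the usual argument for why most parsing code never needs character indexing at all.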
And what is a character? Do you mean graphemes or codepoints? Or something else? A few years ago I thought like you – that calculating length is a useful feature. But usually, when you think about your use case, you realise either that you don't need a length at all, or that you need some other kind of length: monospace width, rendered width, or some kind of entropy-based amount of information. Twitter is the only case I know of where you really want to count "characters". And I find it really silly: e.g. a Japanese tweet vs. an English tweet.