Hacker News

I would like to add that text, as a thousands-of-years-old technology, has always included illustration too. Even in recent/digital text, you have this bird, which doesn't take 4k bytes: [edit: unfortunately hackernews doesn't display the unicode character U+1F426]

Text is very powerful, but it doesn't always need to work by itself (arguably it works best together with other media).
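To make the size point concrete, here is a quick Python check (my own illustration, not from the original comment) showing that the bird, U+1F426, really is just a handful of bytes in UTF-8:

```python
# The character the parent comment tried to post is U+1F426 (a bird).
bird = "\U0001F426"
encoded = bird.encode("utf-8")
print(len(encoded))    # 4 -- four bytes, not four kilobytes
print(encoded.hex())   # f09f90a6
```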



> [edit: unfortunately hackernews doesn't display the unicode character U+1F426]

That got a good chuckle out of me. It actually shows how there's not just "text".


It is just text (using Unicode nomenclature). The makers of the HN software deliberately broke the functionality in an obnoxious fashion.


They weren't being obnoxious, they just subscribe to a more restricted interpretation of "text" than yours, one with which many people seem to agree.

It's clearly a continuum from 3D objects, to pictures, to icons and ideographs, to highly abstract signs representing words and ultimately sounds. The fact that some designer of character sets decided to put the limit somewhere and include a bird icon but not an Obama icon does not invalidate other interpretations.

By the way, there is a thing called Emojicon where a character for 'Albert Einstein' was proposed as valid 'text'.


While I totally agree with the OP, the point to remember is that text is really terse storage only as long as you already have the Unicode mapping stored, to transform the compressed binary form into actual pictures of text (as opposed to abstract forms or numeric codepoints when it is simply stored as "textual" bytes).

This does not diminish the value and expressiveness of text, but it needs to be said that in 5000 years' time we'll need both the Unicode specification and those 2-4000 bytes to decipher the author's post.

It's just a cost of digital media.

The bigger issue is how we ensure digital media endures for so long.


I don't think you would. If English is still understood, decoding ASCII/UTF-8 is trivial. Just think of it as a Caesar shift cipher with an offset of 64. It's very easy to decode with frequency analysis if you know it's English. Either you understand that 65 represents the abstract concept 'A', or you have no idea what the letter A is. Either way, a picture of the letter A is not going to be helpful.

Having English still understood after 5000 years seems much harder. But hey, Latin is pretty old and we still understand that.
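As a rough sketch of that frequency analysis (my own toy example in Python; the sample sentence is mine, and it assumes 'e' is the most common letter in the plaintext, as in typical English):

```python
from collections import Counter

# Toy demonstration: recover the "offset" of an unknown letters-to-numbers
# mapping purely from letter frequencies, assuming the plaintext is English.
message = "the quick brown fox jumps over the lazy dog and then some more english text"
mystery_bytes = message.encode("ascii")  # all a future decoder sees: numbers

letters = [b for b in mystery_bytes if b != 32]  # drop the word separator (space)
most_common_code = Counter(letters).most_common(1)[0][0]

# Hypothesis: the most frequent code stands for 'e', letter number 5.
offset = most_common_code - 5
print(offset)  # 96 here: lowercase letters sit at offset 96, uppercase at 64
```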


> in 5000 years time we'll need both the Unicode specification and those 2-4000 bytes to decipher the author's post.

Nitpick: In all likelihood, an English dictionary would be enough. Even if the Unicode spec is lost, the text can probably be deciphered by using frequency analysis plus the dictionary to associate codepoints with characters.


Sure, but the English dictionary is a bit longer than the Unicode spec :)

My point was that there is a lot of implied data behind this "compression" of information into "simply bytes" today: I am pointing out what is implicit in the OP's assumption.

Maybe I did not choose the right addendum, but a pretty comprehensive one is needed (in all likelihood, an "ancient English dictionary" would be needed anyway).




