Document Liberation Project

archagon · on May 26, 2018

Tangential, but this is one of the many things that excite me about CRDTs. By using CvRDTs for data fields, you can define completely open and fairly arbitrary document formats that support real-time collaboration out of the box. The same document could be simultaneously edited by multiple apps and devices (online or offline) without ever having to ask the user to manually pick a revision or merge changes. This means that instead of relying on bloated, all-in-one programs for document editing that invariably centralize data and satisfy no one, you could run a suite of precisely-targeted micro-apps (drawing palettes, text editors, color pickers, typesetters) that all collaborate on the same document.

I know it's been tried before (OpenDoc, if my understanding is correct?), but CRDTs weren't around back then. This could be the one technical advancement to finally make the system work!

(I've written a long article about this recently, but I'm working on a revision before making another post on HN: http://archagon.net/blog/2018/03/24/data-laced-with-history/)
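For anyone who wants to see the CvRDT idea above in concrete terms, here is a minimal sketch, not the design from the article. It assumes each document field is a last-writer-wins register and that a replica never reuses a timestamp. The names `LWWRegister` and `merge_documents` are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class LWWRegister:
    """A state-based (convergent) register: the newest write wins."""
    value: Any
    timestamp: int      # logical clock value; assumed unique per replica
    replica_id: str     # breaks ties between replicas with equal timestamps

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep whichever state has the larger (timestamp, replica_id) pair.
        # Taking a maximum is commutative, associative, and idempotent,
        # which is what lets replicas converge without coordination.
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            return other
        return self


def merge_documents(a: Dict[str, LWWRegister],
                    b: Dict[str, LWWRegister]) -> Dict[str, LWWRegister]:
    """Merge two document states field by field."""
    merged = dict(a)
    for key, register in b.items():
        merged[key] = merged[key].merge(register) if key in merged else register
    return merged


# Two hypothetical micro-apps edit the same base document while offline.
base = {"title": LWWRegister("Draft", 1, "a")}
text_editor = {**base, "title": LWWRegister("Final", 2, "a")}
color_picker = {**base, "color": LWWRegister("#ff0000", 2, "b")}

merged = merge_documents(text_editor, color_picker)
```

Merging in either order gives the same document, and merging a document with itself changes nothing, so no app ever has to ask the user to pick a revision. A real format would need richer types than a register (sequences for text, for example).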

teddyh · on May 26, 2018

I had to scroll quite a few paragraphs into your article before you even explain what CRDT stands for. For the curious, it’s “Conflict-Free Replicated Data Types”¹.

1. https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...

archagon · on May 26, 2018

Yes, it's explained in this section[1], but I do recommend skimming the full Wikipedia article. (To clarify, I don't get into CRDTs right off the bat. It simply turns out to be the best solution to my sync problem, which is described in the first section.)

[1]: http://archagon.net/blog/2018/03/24/data-laced-with-history/...

teddyh · on May 26, 2018

Yes, but since you use the abbreviation “CRDT” in the title, you’d think you could explain it a bit sooner.

nine_k · on May 27, 2018

If I see a word unknown to me used prominently in an article, I assume that the article is for those already knowing its meaning, and go google it.

archagon · on May 27, 2018

I think I'd find it difficult to wedge into that first section, but perhaps I'll add a link in the first parenthetical paragraph.

jl6 · on May 26, 2018

Does anybody know of a project to develop a converter for OneNote files? Tricky to say what it should convert to, but PDF or HTML would be a good start.

Even OneNote’s own export routines don’t faithfully preserve the content (e.g. embedded files).

ATsch · on May 26, 2018

Perhaps try their GDPR takeout?

jl6 · on May 30, 2018

Does such a thing exist?

_phaq · on May 27, 2018

I don't think this preserves embedded files either, but to convert to HTML, you can export as a webpage, which gives you an .htm file, then open that in Internet Explorer, right-click -> View Source, and copy the HTML from there.

Not actually readable HTML either, though...

yorwba · on May 27, 2018

What's the significance of opening it in Internet Explorer? Doesn't the .htm file already contain HTML?

jancsika · on May 26, 2018

> The Document Liberation Project was created to empower individuals, organizations, and governments to recover their data from proprietary formats and provide a mechanism to transition that data into open and standardised file formats, returning effective control over the content from computer companies to the actual authors. To achieve this, The Document Liberation Project develops software libraries that applications can use to read data in proprietary formats.

Ok, so serious question-- what is DLP's official position on Sci-Hub, a project that was created to empower individuals, organizations, and governments to recover their data from proprietary databases, returning effective control over the content from companies to the actual citizenry?

(To achieve this, sci-hub has a web service and a document store that users can use to read data from proprietary databases.)

_pfxa · on May 26, 2018

The two are completely unrelated. DLP is about document format lock-in, i.e. what happens to your .docx when Word dies out. Sci-Hub is something completely different.

mirimir · on May 27, 2018

As much as I love Sci-Hub, I wouldn't claim that some paper I want is "my data".

jonathanoliver · on May 26, 2018

Ah yes, good old WordStar 2000 and Lotus Ami Pro...

g105b · on May 26, 2018

Hey, the peasants need software too.

qrbLPHiKpiux · on May 27, 2018

This is why I’m a big fan of plaintext for all documentation.

The old days of 80W.

jwilk · on May 27, 2018

What's 80W?

rad_gruchalski · on May 28, 2018

Assumption: https://softwareengineering.stackexchange.com/questions/1486...