I think we could use a small piece of technology to make this sort of thing easier to manage.
The thing about formats is that if only one person uses a format, the means of processing it can be lost. But if you can get any sizable number of people to use it, maybe as few as 10,000, with sufficient information encoded in it, then there will be sufficient incentive to decode it.
I have been working on a Web Service for sharing information stores that I call 'catalogs'. Think 'passwords', 'bookmarks', 'contacts': the type of data that users enter on multiple devices but want to be able to retrieve from any of them as if they were one device.
At the moment, I have mostly been looking at the problem from the point of view of a set of devices that have an unreliable connection to a Web Service. So I have been considering cases like 'Alice adds Bob to her contacts on one device while it is offline and then on another that is online; how are the two synchronized?' The type of problem that OneDrive etc. face.
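To make the convergence property concrete, here is a toy sketch in Python. The names `entry_id` and `merge_catalogs` are mine, and keying entries by a content-derived identifier is an assumption for illustration, not how the actual service works: if the same contact is added on two devices, merging the two device-local catalogs yields a single entry.

```python
import hashlib
import json

def entry_id(entry: dict) -> str:
    """Derive a stable identifier from the entry content, so the same
    contact added on two devices maps to the same catalog entry."""
    canonical = json.dumps(entry, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def merge_catalogs(local: dict, remote: dict) -> dict:
    """Union of two device-local catalogs keyed by entry ID. Adding Bob
    offline on one device and online on another converges to a single
    entry once both catalogs reach the service."""
    merged = dict(local)
    merged.update(remote)
    return merged

# Alice adds Bob on two devices; the merged catalog holds him once.
bob = {"name": "Bob", "mail": "bob@example.com"}
device_a = {entry_id(bob): bob}
device_b = {entry_id(bob): bob}
assert len(merge_catalogs(device_a, device_b)) == 1
```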
There is also the converse problem, in which a repository is maintained by multiple Web Services with some sort of voting mechanism to maintain consistency. Folk who remember DECnet clusters will remember the quorum scheme they used.
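For a flavour of what the voting looks like, here is a generic majority-rule sketch; `quorum_commit` is a name I made up, and the actual DECnet/VMS quorum calculation differed in its details.

```python
def quorum_commit(acks: int, total_services: int) -> bool:
    """A write to the shared repository is committed only once a strict
    majority of the maintaining services have acknowledged it, so two
    network partitions can never both commit conflicting values."""
    return acks > total_services // 2

assert quorum_commit(3, 5)       # 3 of 5 services agree: committed
assert not quorum_commit(2, 5)   # minority or split vote: not committed
```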
I developed a prototype of a container format using Merkle Trees for integrity checking and random access:
The scheme is purposely designed to separate integrity checking on the data content, integrity checking on the content metadata, and unchecked data. It also supports fast random access to arbitrary records in an append-only log.
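The prototype format itself is not reproduced here, but the general shape is easy to sketch. The Python below is a minimal illustration under my own assumptions (`MerkleLog`, `proof` and `verify` are my names, and the separation of content, metadata and unchecked data is omitted): per-record digests give random access into the append-only log, the tree folds them into one output digest, and an inclusion path proves a single record against that digest.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleLog:
    """Append-only log whose leaves are digests of the records, so an
    inclusion proof commits to one record without shipping the rest."""

    def __init__(self):
        self.records = []            # raw payloads, random access by index
        self.leaves = []             # per-record digests

    def append(self, payload: bytes) -> int:
        self.records.append(payload)
        self.leaves.append(h(payload))
        return len(self.records) - 1

    def root(self) -> bytes:
        """Fold the current leaves into a single output digest."""
        level = list(self.leaves)
        while len(level) > 1:
            if len(level) % 2:       # duplicate the last node on odd levels
                level.append(level[-1])
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    def proof(self, index: int) -> list:
        """Sibling digests from leaf to root: the 'best proof of
        integrity' for the record at `index` against the current root."""
        level, path = list(self.leaves), []
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])
            sibling = index ^ 1
            path.append((level[sibling], sibling < index))
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            index //= 2
        return path

def verify(payload: bytes, path: list, root: bytes) -> bool:
    """Recompute the path from the record up to the root digest."""
    digest = h(payload)
    for sibling, sibling_is_left in path:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root
```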
We could wrap a simple Web Service around it with operations such as the following (a rough sketch follows the list):
* Add item
* Retrieve the item with key X, returning its value Y
* Obtain best proof of integrity for record Z
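Continuing the sketch above, a thin wrapper might look like this. The names `add_item`, `retrieve_item` and `prove_item` are stand-ins for the listed operations, not an actual service interface, and the `log` argument is assumed to behave like the `MerkleLog` sketch earlier (`append`, `records`, `proof`, `root`).

```python
class CatalogService:
    """Thin service layer over an append-only, integrity-checked log,
    plus a key index for random access to records."""

    def __init__(self, log):
        self.log = log
        self.index = {}              # key -> record number in the log

    def add_item(self, key: str, value: bytes) -> int:
        record = self.log.append(value)
        self.index[key] = record
        return record

    def retrieve_item(self, key: str) -> bytes:
        return self.log.records[self.index[key]]

    def prove_item(self, key: str):
        """Best available proof of integrity for the record behind `key`:
        its inclusion path plus the current root digest."""
        record = self.index[key]
        return self.log.proof(record), self.log.root()
```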
As with all these block-chainy sorts of schemes, it is easy enough to check prior data values if you have an output digest that you can rely on. Establishing trust in that output digest is rather more tricky.
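Continuing the same sketch, the easy half looks like this: given a root digest you already trust, checking a prior record is a few hash operations.

```python
log = MerkleLog()
svc = CatalogService(log)
svc.add_item("bob", b'{"name": "Bob"}')
svc.add_item("carol", b'{"name": "Carol"}')

path, root = svc.prove_item("bob")
# Anyone holding a trusted copy of `root` can confirm Bob's record
# without having to trust the service that served it up.
assert verify(b'{"name": "Bob"}', path, root)
```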