I created a file format designed for sharing data between language learning apps (JSON-based + signature). It's not released yet, but if anyone is interested, please reply below. It would be great to have software other than mine using it.
I'm interested, though primarily because i'm duplicating the work you're describing - hah.
I'm interested in writing two main components:
1. Information storage, with retention etc. My initial feature scope is likely going to focus on retention (spaced repetition) and storage (archiving sources); long term i want to dogfood a general-purpose knowledge base.
2. Data storage. Unfortunately, every time i start working on the schemas for the information storage i end up reinventing so many features of general-purpose data storage, like versioning, data deduplication, etc - i end up with a clone of Git :(
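For illustration, versioning and deduplication both tend to fall out of content addressing, which is roughly the idea Git is built on. Below is a minimal sketch, assuming SHA-256 hashes as keys and an in-memory dict as the backing store. The names `BlobStore` and `commit` and the sample card data are hypothetical, made up for this example, and are not taken from either project discussed here.

```python
import hashlib
import json


class BlobStore:
    """In-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}  # hex digest -> raw bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


def commit(store, parent, items):
    """Store a snapshot mapping names to blob digests, linked to its parent."""
    tree = {name: store.put(content) for name, content in items.items()}
    record = json.dumps({"parent": parent, "tree": tree}, sort_keys=True)
    return store.put(record.encode("utf-8"))


store = BlobStore()
v1 = commit(store, None, {"card-1": b"hola = hello"})
v2 = commit(store, v1, {"card-1": b"hola = hello", "card-2": b"adios = goodbye"})
```

The second snapshot reuses the unchanged `card-1` blob instead of storing it again, and each snapshot points at its parent, so history is just a chain of digests. A real store would still need persistence, garbage collection, and merging, which is where the "clone of Git" feeling comes from.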
So i'd love to hear some words of wisdom on this subject. I imagine you - sanely - focus more on #1, but that's not an area i've obsessed over much yet.
This whole topic does interest me though - part of the reason i'm writing the data layer is that i want git-like behavior, to be able to slice up other sources of information with the ease and safety of git. So the idea of information source interop is close to my heart - but IO with external formats has not been big on my radar.
As i make progress with the information side of things, i'll have to keep that in mind. My model embraces self-hosting and i really want to avoid holding information hostage. So this convo has already been valuable :)