Michael, just checking in to see what might be a good time to chat. We're excited to connect!
Aditya
On Fri, Dec 13, 2019 at 2:22 PM Aditya Parameswaran <adityagp@xxxxxxxxxxxx> wrote:
Michael,

We'd love to meet and discuss! Unfortunately, a lot of us are off for break starting next week, so it might be best to sync up early next year.

Would the week of the 6th work for you? 8am PT/10am CT/4pm GMT any day should work!

> We started by having the relational database be a simple persistent
> storage layer, when coupled with an index to retrieve data by position,
> can allow us to scroll through large datasets of billions of rows at
> ease. We developed a new positional index to handle insertions and
> deletions in O(log(n)) -- https://arxiv.org/pdf/1708.06712.pdf. I agree
> that pushing the computation to the relational database does have
> overheads; but at the same time, it allows for scaling to arbitrarily
> large datasets.
Ooh - nice paper. Your crawled data-set looks quite interesting too, we
run wide-scale crash-testing on the LibreOffice code-base across ~100k
files and enlarging our corpus there: or better, getting some
statistical view of which OOXML attributes (and thus features) are most
used out there would be extremely useful to us as we develop the core.
I like the data on spreadsheet and formula shape - that is very useful.
Do you have data on the geometry of formulae - as in rows vs. columns ?
[ we switched to columnar storage based mostly on experience rather than
hard data ;-].
It is also interesting to have access to very large (1.3m row)
data-sets that can have useful analysis done on them - would love to see
the source data there.

Again, this is something that we'd be happy to share; this might just take a bit more work since it's an older codebase.

I believe we did use the geometry of the formulae to determine the best storage representation, so it's there somewhere :-)

Sounds good, cf. above - if we can't make that - early in the new year
would be great.
I look forward to talking,

Likewise!

Aditya
_______________________________________________
LibreOffice mailing list
LibreOffice@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/libreoffice