Hi Jacek,

On 06.10.2022 05:54, Jacek Pliszka wrote:
> I found an old thread about adding it to Orcus library instead. Is it the best approach?
It is an approach, but I wouldn't say it's the best approach. The orcus library has traditionally been geared more toward supporting text-based file formats, such as csv, xlsx, ods, gnumeric, etc., whereas my understanding of the parquet file format is that it is a binary file format.
> If Orcus could use arrow library then it should be relatively easy, similar to .csv files.
Yes, I believe that's doable. Having said that, it's my understanding that the arrow library provides a nice abstraction optimized for columnar in-memory formats. So, if we were to use it in orcus, which is not necessarily optimized for columnar in-memory formats, we may lose some efficiency by potentially having to go through two layers of abstraction, each with a different focus. Someone would need to take a closer look at the design of the arrow library and decide which approach makes more sense: using it in orcus or using it directly in the libreoffice codebase.
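To illustrate the concern about stacking two abstractions, here is a rough C++ sketch. It uses neither the arrow API nor the orcus API: `ColumnarBatch`, `CellSink` and `push_cells` are hypothetical names made up for this example. It only shows the general shape of the problem, assuming the consumer accepts one cell at a time: data that sits contiguously per column gets handed over value by value, one callback per cell.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical columnar batch (NOT an arrow type): each column keeps its
// values contiguously, which is what makes bulk access cheap.
struct ColumnarBatch {
    std::vector<std::string> names;
    std::vector<std::vector<double>> columns; // indexed as columns[col][row]
};

// Hypothetical cell-at-a-time consumer (NOT an orcus interface).
using CellSink = std::function<void(std::size_t row, std::size_t col, double value)>;

// Pushes every value of the batch through the sink, one call per cell,
// and returns the number of calls made.
std::size_t push_cells(const ColumnarBatch& batch, const CellSink& sink) {
    std::size_t calls = 0;
    for (std::size_t col = 0; col < batch.columns.size(); ++col) {
        const std::vector<double>& column = batch.columns[col];
        for (std::size_t row = 0; row < column.size(); ++row) {
            sink(row, col, column[row]); // contiguous column handed over cell by cell
            ++calls;
        }
    }
    return calls;
}
```

A consumer that could take a whole column at once would avoid the per-cell call, which is roughly the efficiency that might be lost by going through both layers.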
I would have been very happy to take a closer look at the arrow library, but right now I'm trying to finish up all the features that need to go into the next release of orcus, so unfortunately I won't be able to do that anytime soon.
Kohei