> I would like to import (lots of) Apache parquet files to a PostgreSQL 11

you might be interested in the spark-postgres library. Basically the library
allows you to bulk load parquet files in one spark command:

spark
.read.format("parquet")
.load(parquetFilesPath)              // read the parquet files
.write.format("postgres")
.option("host", "yourHost")
.option("partitions", 4)             // 4 threads
.option("table", "theTable")
.option("user", "theUser")
.option("database", "thePgDatabase")
.option("schema", "thePgSchema")
.save                                // bulk load into postgres

more details at https://github.com/EDS-APHP/spark-etl/tree/master/spark-postgres

On Tue, Nov 05, 2019 at 03:56:26PM +0100, Softwarelimits wrote:
> Hi, I need to come and ask here, I did not find enough information, so I hope I
> am just having a bad day or somebody is censoring my search results for fun...
> :)
>
> I would like to import (lots of) Apache parquet files to a PostgreSQL 11
> cluster - yes, I believe it should be done with the Python pyarrow module, but
> before digging into the possible traps I would like to ask here if there is
> some common, well-understood and documented tool that may be helpful with that
> process?
>
> It seems that the COPY command can import binary data, but I am not able to
> allocate enough resources to understand how to implement a parquet file import
> with that.
>
> I would really like to follow a person with much more knowledge than me about
> either PostgreSQL or the Apache parquet format instead of inventing a bad wheel.
>
> Any hints very welcome,
> thank you very much for your attention!
> John

--
nicolas
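
For the pyarrow + COPY route mentioned in the question, a rough, untested
sketch could look like the one below. The file name, table name and
connection string are placeholders, and the target table is assumed to
already exist with columns matching the parquet schema:

import io

import psycopg2
import pyarrow.parquet as pq

def copy_parquet(path, table, dsn):
    # Read the parquet file into an Arrow table, then serialize it as CSV
    # in memory. For very large files, iterating over row groups instead of
    # reading the whole file at once would keep memory bounded.
    arrow_table = pq.read_table(path)
    buf = io.StringIO()
    arrow_table.to_pandas().to_csv(buf, index=False, header=False)
    buf.seek(0)

    # COPY ... FROM STDIN streams the CSV data to the server in one pass,
    # which is much faster than row-by-row INSERTs.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.copy_expert(
            "COPY {} FROM STDIN WITH (FORMAT csv)".format(table),
            buf,
        )

copy_parquet(
    "part-00000.parquet",
    "the_table",
    "dbname=thePgDatabase user=theUser host=yourHost",
)

This avoids the Spark dependency entirely, at the cost of doing the
parallelism (one call per file) and any type-mapping corner cases yourself.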