On 5/31/23 13:57, Lian Jiang wrote:
> The command is: psql $db_url -c "copy (select row_to_json(x_tmp_uniq)
> from public.mytable x_tmp_uniq) to stdout"
> postgres version: 14.7
> Does this mean COPY and the Java CopyManager may not help, since my
> psql command already uses COPY?
I don't think the issue is COPY itself but row_to_json(x_tmp_uniq).
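If you do end up driving the export from Java, CopyManager handles COPY ... TO STDOUT just fine; the transport is not the bottleneck, the per-row row_to_json() call is. A minimal sketch, assuming the pgJDBC driver and with hypothetical connection details:

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class ExportCsv {
    public static void main(String[] args) throws Exception {
        // Hypothetical URL and credentials -- substitute your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "me", "secret")) {
            CopyManager copier = new CopyManager((BaseConnection) conn);
            try (OutputStream out = new FileOutputStream("mytable.csv")) {
                // COPY ... TO STDOUT streams rows straight to the client,
                // with no per-row JSON conversion on the server side.
                copier.copyOut(
                    "COPY (SELECT * FROM public.mytable) TO STDOUT"
                        + " WITH (FORMAT csv, HEADER)",
                    out);
            }
        }
    }
}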
This:
https://towardsdatascience.com/spark-essentials-how-to-read-and-write-data-with-pyspark-5c45e29227cd
indicates Spark can use CSV as an input source.
Given that, I would just COPY the data out as CSV.
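Something along these lines (untested; assumes you want a header row, adjust the options as needed):

psql $db_url -c "copy (select * from public.mytable) to stdout with (format csv, header)" > mytable.csv

Spark can then read mytable.csv directly.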
> Regarding pg_dump: it does not support a JSON output format, which
> means extra work is needed to convert one of the supported formats to
> JSONL (or Parquet) so the data can be imported into Snowflake. Still
> exploring, but I want to call this out early. Maybe the 'custom'
> format can be Parquet?
> Thanks
> Lian
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx