Re: Loading 500m json files to database

On Mar 23, 2020, at 7:11 PM, David G. Johnston <david.g.johnston@xxxxxxxxx> wrote:

On Mon, Mar 23, 2020 at 3:24 AM pinker <pinker@xxxxxxx> wrote:
time for i in datafiles/*; do
  psql -c "\copy json_parts(json_data) FROM $i"&
done

Don't know whether this is faster but it does avoid spinning up a connection multiple times.

#bash, linux
    function append_each_split_file_to_etl_load_script() {
        for filetoload in ./*; do
            # quote the path so file names containing spaces survive word splitting
            ronumber="$(basename "$filetoload")"
            # only process files since subdirs can be present
            if [[ -f "$filetoload" ]]; then
                echo ""
                echo "\set invoice"' `cat '"'""$filetoload""'"'`'
                echo ", ('$ronumber',:'invoice')"
            fi >> "$PSQLSCRIPT"
        done

        echo ""  >> "$PSQLSCRIPT"
        echo ";" >> "$PSQLSCRIPT"
        echo ""  >> "$PSQLSCRIPT"
    }

There is a bit of other related code that is needed (for my specific usage), but this is the core of it.  Use psql variables to capture the contents of each file and then just perform a normal insert (specifically, a VALUES (...), (...) variant).  Since you can intermix psql meta-commands and SQL, you basically output a bloody long script.  That has memory issues at scale, but you can divide and conquer and then run "psql --file bloody_long_script_part_1_of_100000.psql".
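The divide-and-conquer step described above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code actually used: the function name generate_chunked_scripts, the part_N.psql naming, and the chunk size are hypothetical, the target table json_parts(json_data) is borrowed from the quoted \copy command, and file names are assumed not to contain single quotes.

```shell
#!/usr/bin/env bash
# Sketch only: writes one psql script per batch of input files.
# Hypothetical names: generate_chunked_scripts, part_N.psql.
# Assumed table: json_parts(json_data), taken from the quoted \copy command.
# Assumes file names contain no single quotes.
generate_chunked_scripts() {
    local srcdir="$1" outdir="$2" chunk_size="$3"
    local count=0 part=0 script="" sep f

    mkdir -p "$outdir"
    for f in "$srcdir"/*; do
        # only process regular files since subdirs can be present
        [ -f "$f" ] || continue

        if [ $(( count % chunk_size )) -eq 0 ]; then
            # terminate the previous INSERT, if any, and start a new script
            if [ -n "$script" ]; then printf ';\n' >> "$script"; fi
            part=$(( part + 1 ))
            script="$outdir/part_${part}.psql"
            printf 'INSERT INTO json_parts (json_data) VALUES\n' > "$script"
            sep=" "   # first row of a statement takes no leading comma
        else
            sep=","
        fi

        # psql runs the backquoted cat and stores the file contents in :invoice;
        # :'invoice' then interpolates it as a quoted SQL literal
        printf '%s\n' "\\set invoice \`cat '$f'\`" >> "$script"
        printf '%s\n' "$sep (:'invoice')" >> "$script"
        count=$(( count + 1 ))
    done

    # terminate the last INSERT
    if [ -n "$script" ]; then printf ';\n' >> "$script"; fi
}

# Example usage (not run here):
#   generate_chunked_scripts datafiles scripts 1000
#   for s in scripts/part_*.psql; do psql --file "$s"; done
```

Each generated script is a single multi-row INSERT, so the batch size trades memory per statement against the number of psql invocations; the right value would have to be found by experiment.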

David J.

Can one put 550M files in a single directory?  I thought it topped out at 16M or so.



