On Thu, Dec 29, 2016 at 8:41 PM, rajmhn <rajmhn.ram@xxxxxxxxx> wrote:
> Thanks Francis. That seems to be a good solution.

Yep, but not for your problem as ...

> Thought to use pg_bulkload, a third party library instead of copy, where
> reject handling can be done in an efficient way.

Mine was just an idea to do the part of the load you described assuming
pg_bulkload usage was optional. As it is not, it will not work. MAYBE you
can use the technique to preprocess the files for pg_bulkload ( if possible
this is nice, as the good thing about preprocessing them is you can repeat
until you get them right, without touching the DB ).

> Transformation (FILTER) functions can be implemented in any language in
> pg_bulkload before the data is loaded to the table. SQL, C and PLs are ok,
> but you should write the functions to be as fast as possible because they
> are called many times.
> In this case, the function would be written in Perl and called inside a
> PostgreSQL function. Do you think that will work out? pg_bulkload prefers
> a C function over an SQL function for performance.

I'm not familiar with pg_bulkload usage. I've read about it, but all my
loading problems have been solved better by using copy ( especially when
factoring in total time: I already know how to use copy, and a couple dozen
languages in which to write filters to preclean data for copy. In the time
it would take me to learn enough of pg_bulkload I can write the filter and
load a lot of data ).

Regarding C vs perl, it seems pg_bulkload does server side processing. In
the server the function calling overhead is HUGE, especially when
transitioning between different languages. IMO the time spent doing the
data processing in perl would be negligible compared with the time needed
to pass the data around to perl. C will be faster because the calling
barrier is smaller inside the server.

For data processing of things like yours I've normally found filters like
the one I described can easily saturate an SSD array, and the difference in
processing time is dwarfed by the difference in time needed to develop the
filter. In fact, on any modern OS with write-through and readahead disk
management, the normal difference between filtering in perl or C is that
perl may use 10% of one core and C 1%; the perl filter is developed in 15
minutes, the C one in an hour, and the perl filter takes a few extra
milliseconds to start. AND, if you are not familiar with processing data in
C you can easily code a slower solution than in perl ( as perl was designed
for this kind of thing ).

> I will try this option as you suggested.

Just remember my option is not using pg_bulkload with perl stored
procedures. I cannot recommend anything if you use pg_bulkload. I suggested
using copy and perl to preclean the data ( the sketch at the end of this
mail shows the kind of filter I mean ). It just seemed to me from the
description of your problem that you were using too complex a tool. Now
that you are introducing new terms, like reject handling, I'll step out
until I can make a suggestion ( don't bother to define it for me, it seems
a bulkload-related term and I'm not able to study that tool ).

Francisco Olarte.
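
A minimal sketch of the kind of perl pre-cleaning filter described above.
Everything concrete in it is an assumption for illustration, not something
from this thread: tab-separated input with three columns, "reject" meaning
a wrong column count or an empty first field, and the names clean.pl,
rejects.txt, raw_data.txt and mytable. Adjust to the real layout and rules.

    #!/usr/bin/perl
    # Read raw rows on stdin, write clean rows to stdout for COPY, and
    # divert anything that does not fit the expected layout to a reject
    # file ( the "reject handling" happens here, outside the database ).
    use strict;
    use warnings;

    open my $rej, '>', 'rejects.txt' or die "cannot open rejects.txt: $!";

    while (my $line = <STDIN>) {
        chomp $line;
        my @f = split /\t/, $line, -1;     # -1 keeps trailing empty fields
        if (@f != 3 or $f[0] eq '') {      # assumed reject rule
            print {$rej} "$line\n";
            next;
        }
        s/^\s+|\s+$//g for @f;             # example transformation: trim fields
        print join("\t", @f), "\n";        # clean rows go to stdout
    }
    close $rej or die "cannot close rejects.txt: $!";

It would then be used along the lines of:

    perl clean.pl < raw_data.txt | psql -c "\copy mytable from stdin"

since the default COPY text format is tab-separated, matching the filter's
output; the rejected rows end up in rejects.txt for later inspection.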