Thanks all, I will be looking into it.

Kind regards,
Henk

On 16 Jun 2012, at 18:23, Edson Richter <edsonrichter@xxxxxxxxxxx> wrote:

> On 16/06/2012 12:59, hb@xxxxxxxxxxxxxx wrote:
>> Thanks. I thought about splitting the file, but that did not work out well.
>>
>> We receive 2 files every 30 seconds and need to import them as fast as possible.
>>
>> We do not run Java currently, but maybe it's an option.
>> Are you willing to share your code?
>>
>> Also, I was thinking of using Perl for it.
>>
>> Henk
>>
>> On 16 Jun 2012, at 17:37, Edson Richter <edsonrichter@xxxxxxxxxxx> wrote:
>>
>>> On 16/06/2012 12:04, hb@xxxxxxxxxxxxxx wrote:
>>>> Hi there,
>>>>
>>>> I am trying to import large data files into pg.
>>>> For now I use the Linux xargs command to feed the file line by line into the maximum number of available connections.
>>>>
>>>> We use pgpool as the connection pool to the database, and so try to maximize the concurrent data import of the file.
>>>>
>>>> The problem for now is that it seems to work well, but we miss a line once in a while, and that is not acceptable. It also creates zombies ;(.
>>>>
>>>> Does anybody have any other tricks that will do the job?
>>>>
>>>> Thanks,
>>>>
>>>> Henk
>>> I've used a custom Java application with connection pooling (limited to 1000 connections, meaning 1000 concurrent file imports).
>>>
>>> I'm able to import more than 64000 XML files (about 13 KB each) in 5 minutes, without memory leaks or zombies, and (of course) no missing records.
>>>
>>> Besides the case where each thread imports a separate file, I have another situation where separate threads import different lines of the same file. No problems at all. Do not forget to check your OS "open files" limits (this was a big issue for me in the past because of the Lucene indexes generated during import).
>>>
>>> Server: 8 core Xeon, 16 GB, SAS 15000 rpm disks, PgSQL 9.1.3, Linux CentOS 5, Sun Java 1.6.27.
>>>
>>> Regards,
>>>
>>> Edson Richter
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
> I'm not allowed to publish my company's code, but the logic is very easy to understand (you will have to "invent" your own solution; the code below is bare bones):
>
> import java.io.File;
> import java.io.FileFilter;
>
> class MainThread implements Runnable {
>     // volatile so that a stop request from another thread is seen by run()
>     private volatile boolean keepRunning = true;
>
>     public void run() {
>         while (keepRunning) {
>             try {
>                 executeFiles();
>                 Thread.sleep(30000); // sleep 30 seconds
>             } catch (InterruptedException ex) {
>                 Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
>                 return;
>             } catch (Exception ex) {
>                 ex.printStackTrace();
>             }
>         }
>     }
>
>     private void executeFiles() {
>         File monitorDir = new File("/var/mydatafolder/");
>         File processingDir = new File("/var/myprocessingfolder/");
>
>         // I'll import only files with names like "data20120621.csv":
>         FileFilter fileFilter = new FileFilter() {
>             public boolean accept(File file) {
>                 boolean isFile = file.isFile() && !file.isHidden();
>                 if (!isFile) return false;
>                 String fname = file.getName();
>                 return fname.startsWith("data") && fname.endsWith("csv");
>             }
>         };
>
>         // listFiles() returns an array, or null if the directory cannot be read
>         File[] forProcessing = monitorDir.listFiles(fileFilter);
>         if (forProcessing == null) return;
>
>         for (File fileFound : forProcessing) {
>             // FileUtil is a utility class, you will have to create your own... your move method will vary according to your operating system
>             FileUtil.move(fileFound, processingDir);
>             // ProcessFile is a class that implements Runnable; do your stuff there...
>             Thread t = new Thread(new ProcessFile(processingDir, fileFound.getName()));
>             t.start();
>         }
>     }
>
>     /** Use this method to stop the thread from another place in your complex system! */
>     public void stopWorker() {
>         keepRunning = false;
>     }
>
>     public static void main(String[] args) {
>         Thread t = new Thread(new MainThread());
>         t.start();
>     }
> }
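
The message above leaves the helper classes FileUtil and ProcessFile to the reader. As a rough illustration only, the sketch below assumes Java 8 or later and uses hypothetical names (ImportSketch, claim, processFile, importAll) that do not come from the thread. It claims each file with an atomic move from java.nio.file, hands it to a fixed-size thread pool rather than starting one thread per file, and counts non-empty lines as a stand-in for the database insert, so the total can be compared with the input to spot missed lines. The actual database step is deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: all names here are illustrative, not taken from the thread.
final class ImportSketch {

    // Stand-in for FileUtil.move: claim a file by moving it out of the watched folder.
    // ATOMIC_MOVE only works when both folders are on the same file system.
    static Path claim(Path file, Path processingDir) throws IOException {
        return Files.move(file, processingDir.resolve(file.getFileName()),
                StandardCopyOption.ATOMIC_MOVE);
    }

    // Stand-in for ProcessFile: a real worker would send each line to the database.
    // Here it only counts non-empty lines so the total can be checked afterwards.
    static long processFile(Path file) throws IOException {
        long count = 0;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Claims every regular file matching data*csv and processes the files concurrently.
    // Returns the total number of non-empty lines seen across all files.
    static long importAll(Path monitorDir, Path processingDir, int poolSize) throws Exception {
        // Collect the candidates first, so files are not moved while the listing is open.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(monitorDir, "data*csv")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    candidates.add(p);
                }
            }
        }

        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path found : candidates) {
                final Path claimed = claim(found, processingDir);
                Callable<Long> task = () -> processFile(claimed);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // rethrows a worker failure instead of losing it silently
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo on throwaway folders; real paths would replace these.
        Path root = Files.createTempDirectory("import-sketch");
        Path monitorDir = Files.createDirectory(root.resolve("incoming"));
        Path processingDir = Files.createDirectory(root.resolve("processing"));
        Files.write(monitorDir.resolve("data20120621.csv"),
                Arrays.asList("a;1", "b;2", "c;3"), StandardCharsets.UTF_8);
        Files.write(monitorDir.resolve("data20120622.csv"),
                Arrays.asList("d;4", "e;5"), StandardCharsets.UTF_8);
        System.out.println("lines processed: " + importAll(monitorDir, processingDir, 4));
    }
}
```

The pool size caps the number of concurrent workers, and with it the number of database connections in use at once. Reading each worker's result through its Future means a failed file surfaces as an exception rather than as a silently missing record. If the two folders are on different file systems, the atomic move fails with an AtomicMoveNotSupportedException and a different claiming strategy would be needed.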