Re: any solution for doing a data file import spawning it on multiple processes

Thanks all, I will look into it.

Kind regards,

Henk 

On 16 Jun 2012, at 18:23, Edson Richter <edsonrichter@xxxxxxxxxxx> wrote:

> On 16/06/2012 12:59, hb@xxxxxxxxxxxxxx wrote:
>> Thanks. I thought about splitting the file, but that did not work out well.
>> 
>> We receive 2 files every 30 seconds and need to import them as fast as possible.
>> 
>> We do not run Java currently, but maybe it's an option.
>> Are you willing to share your code?
>> 
>> I was also thinking of using Perl for it.
>> 
>> 
>> henk
>> 
>> On 16 Jun 2012, at 17:37, Edson Richter <edsonrichter@xxxxxxxxxxx> wrote:
>> 
>>> On 16/06/2012 12:04, hb@xxxxxxxxxxxxxx wrote:
>>>> Hi there,
>>>> 
>>>> I am trying to import large data files into PostgreSQL.
>>>> For now I use the Linux xargs command to feed the file line by line to parallel processes, using the maximum number of available connections.
>>>> 
>>>> We use pgpool as the connection pool to the database, and so try to maximize the concurrency of the file import.
>>>> 
>>>> The problem is that it seems to work well, but we miss a line once in a while, and that is not acceptable. It also creates zombie processes ;(.
>>>> 
>>>> Does anybody have any other tricks that will do the job?
>>>> 
>>>> thanks,
>>>> 
>>>> Henk
>>> I've used a custom Java application with connection pooling (limited to 1000 connections, meaning up to 1000 concurrent file imports).
>>> 
>>> I'm able to import more than 64000 XML files (about 13 KB each) in 5 minutes, with no memory leaks, no zombies, and (of course) no missing records.
>>> 
>>> Besides the case where each thread imports a separate file, I have another setup where separate threads import different lines of the same file. No problems at all. Do not forget to check your OS open-file limits (they were a big issue for me in the past because of the Lucene indexes generated during import).
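
A minimal sketch of the second setup described above (several threads importing different lines of the same file), written under assumptions that are not in the original message: the class name LineBatchImporter, the batch size, and the worker count are all hypothetical, and importBatch is only a placeholder where a real worker would insert its batch over a pooled connection. The point it illustrates is counting what each worker handled and comparing that with the number of lines read, so a missing line is noticed instead of silently lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LineBatchImporter {

    /** Splits the lines into consecutive batches of at most batchSize lines. */
    static List<List<String>> partition(List<String> lines, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += batchSize) {
            int end = Math.min(i + batchSize, lines.size());
            batches.add(new ArrayList<>(lines.subList(i, end)));
        }
        return batches;
    }

    /** Hypothetical placeholder: a real worker would insert the batch into the database here. */
    static int importBatch(List<String> batch) {
        return batch.size();
    }

    /** Runs every batch on a fixed-size pool and returns the number of lines handled. */
    static int importAll(List<String> lines, int batchSize, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : partition(lines, batchSize)) {
                Callable<Integer> task = () -> importBatch(batch);
                results.add(pool.submit(task));
            }
            int imported = 0;
            for (Future<Integer> result : results) {
                imported += result.get(); // rethrows any failure raised inside a worker
            }
            return imported;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 1050; i++) lines.add("row-" + i);
        int imported = importAll(lines, 100, 4);
        // Compare the totals so a dropped line is detected rather than ignored
        System.out.println(imported + " of " + lines.size() + " lines imported");
    }
}
```

A fixed-size pool is used here so the number of concurrent workers stays below whatever connection limit the pooler allows; the right values depend on the server and are not something this sketch can decide.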
>>> 
>>> Server: 8-core Xeon, 16 GB RAM, SAS 15000 rpm disks, PostgreSQL 9.1.3, CentOS 5 Linux, Sun Java 1.6.27.
>>> 
>>> Regards,
>>> 
>>> Edson Richter
>>> 
>>> 
>>> -- 
>>> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
> I'm not allowed to publish my company's code, but the logic is very easy to understand (you will have to "invent" your own solution; the code below is bare-bones):
> 
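> import java.io.File;
> import java.io.FileFilter;
> 
> class MainThread implements Runnable {
>    // volatile so a change made by another thread is visible to the polling loop
>    private volatile boolean keepRunning = true;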
> 
>    public void run() {
>        while(keepRunning) {
>            try {
>                executeFiles();
>                Thread.sleep(30000); // sleep 30 seconds
>            } catch(Exception ex) {
>                ex.printStackTrace();
>            }
>        }
>    }
> 
>    private void executeFiles() {
>        File monitorDir = new File("/var/mydatafolder/");
>        File processingDir = new File("/var/myprocessingfolder/");
> 
>        // Import only files with names like "data20120621.csv":
>        FileFilter fileFilter = new FileFilter() {
>            public boolean accept(File file) {
>                // isFile() is already false for directories
>                if(!file.isFile() || file.isHidden()) return false;
>                String fname = file.getName();
>                return fname.startsWith("data") && fname.endsWith(".csv");
>            }
>        };
> 
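>        // listFiles() returns an array, or null if the directory cannot be read
>        File[] forProcessing = monitorDir.listFiles(fileFilter);
>        if(forProcessing == null) return;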
> 
>        for(File fileFound : forProcessing) {
>            // FileUtil is a utility class you will have to write yourself;
>            // its move method will vary according to your operating system.
>            FileUtil.move(fileFound, processingDir);
>            // ProcessFile is a class that implements Runnable; do your import work there.
>            Thread t = new Thread(new ProcessFile(processingDir, fileFound.getName()));
>            t.start();
>        }
>    }
> 
>    /** Use this method to stop the thread from another place in your complex system! */
>    public synchronized void stopWorker() {
>        keepRunning = false;
>    }
> 
>    public static void main(String [] args) {
>        Thread t = new Thread(new MainThread());
>        t.start();
>    }
> }
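
The FileUtil helper is left to the reader in the code above. One possible shape for it is sketched below, assuming Java 7 or later (java.nio.file) is available; this is not the original author's implementation. It tries an atomic rename first, so the importer never sees a half-moved file, and falls back to an ordinary move when the two folders are on different file systems.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class FileUtil {

    /** Moves source into targetDir under the same file name and returns the moved file. */
    static File move(File source, File targetDir) throws IOException {
        Files.createDirectories(targetDir.toPath());
        Path target = targetDir.toPath().resolve(source.getName());
        try {
            // Preferred: a single atomic rename
            Files.move(source.toPath(), target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback, for example when the folders are on different file systems
            Files.move(source.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target.toFile();
    }

    public static void main(String[] args) throws IOException {
        // Temporary folders stand in for /var/mydatafolder/ and /var/myprocessingfolder/
        File in = Files.createTempDirectory("mydatafolder").toFile();
        File out = Files.createTempDirectory("myprocessingfolder").toFile();
        File csv = new File(in, "data20120621.csv");
        Files.write(csv.toPath(), "1,example\n".getBytes(StandardCharsets.UTF_8));

        File moved = move(csv, out);
        System.out.println(moved.getName() + " moved: " + moved.exists());
    }
}
```

Because this move declares IOException, the call inside executeFiles() would need its own try/catch. Moving the file before starting the worker also keeps the next 30-second poll from picking up the same file a second time.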
