On Wednesday 06 February 2008 18:37:41, Dimitri Fontaine wrote:
> On Wednesday 06 February 2008, Greg Smith wrote:
> > If I'm loading a TB file, odds are good I can split that into 4 or more
> > vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4
> > loaders at once, and get way more than 1 disk worth of throughput
> > reading.
>
> pgloader already supports starting at any input file line number, and limit
> itself to any number of reads:

In fact, the -F option works by having pgloader read the given number of lines but skip processing them, which is not at all what Greg is talking about here, I think. Also, I think it would be easier for me to code some stat(), lseek() and read() calls into the pgloader reader machinery than to change the code architecture to support a separate thread for the file reader.

Greg, what would you think of a pgloader which splits the input file by size, as given by stat (os.stat(file)[ST_SIZE]), and by number of threads: we split into as many pieces as the section_threads section config value.

This behaviour won't be available for sections where type = text and field_count(*) is given, because in that case I don't see how pgloader could reliably recognize the beginning of a new logical line and start processing there. In all other cases a logical line is a physical line, so each reader starts after the first newline found from its given lseek start position, and continues reading past its lseek end position until it reaches a newline (see the sketch after my signature).

*: http://pgloader.projects.postgresql.org/#_text_format_configuration_parameters

Comments?
--
dim
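
PS: here is a minimal Python sketch of the splitting idea, so we're all
talking about the same thing. The helper names (compute_chunks,
read_chunk) are hypothetical, not existing pgloader code; and it computes
the newline-aligned boundaries up front rather than having each reader
skip to the next newline itself, which amounts to the same thing.

    import os
    from stat import ST_SIZE

    def compute_chunks(filename, n_threads):
        """Split filename into n_threads byte ranges aligned on newlines.

        Each (start, end) range begins right after a newline and ends
        just after the first newline at or past its nominal boundary,
        so a reader working on [start, end) only sees whole physical
        lines.
        """
        size = os.stat(filename)[ST_SIZE]
        piece = size // n_threads
        chunks = []
        with open(filename, 'rb') as f:
            start = 0
            for i in range(n_threads):
                if i == n_threads - 1:
                    # last piece takes whatever remains
                    end = size
                else:
                    # seek to the nominal boundary, then extend the
                    # piece up to and including the next newline
                    f.seek((i + 1) * piece)
                    f.readline()
                    end = f.tell()
                chunks.append((start, end))
                start = end
        return chunks

    def read_chunk(filename, start, end):
        """Yield the physical lines in the [start, end) byte range."""
        with open(filename, 'rb') as f:
            f.seek(start)
            while f.tell() < end:
                yield f.readline()

Each (start, end) pair would then be handed to one of the
section_threads workers, which reads and processes its lines
independently of the others.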