On Thursday 07 February 2008, Greg Smith wrote:
> On Wednesday 06 February 2008, Dimitri Fontaine wrote:
>> In other cases, a logical line is a physical line, so we start after first
>> newline met from given lseek start position, and continue reading after the
>> last lseek position until a newline.
>
> Now you're talking. Find a couple of split points that way, fine-tune the
> boundaries a bit so they rest on line termination points, and off you go.

I was thinking of not even reading the file content from the controller thread: just decide the splitting points in bytes (0..ST_SIZE/4, ST_SIZE/4+1..2*ST_SIZE/4, etc.) and let each reading thread fine-tune its own boundaries by starting to process input only after having read the first newline, etc.

And while we're still at the design board, I'm also thinking of adding a per-section parameter (with a global default value possible), split_file_reading, which defaults to False and which you'll have to set to True for pgloader to behave the way we're talking about.

When split_file_reading = False and section_threads != 1, pgloader will have to manage several processing threads per section but only one file reading thread, handing the read input to the processing threads in a round-robin fashion. In the future, the choice of processing thread will possibly (another knob) be smarter than that, as soon as we get CE support into pgloader.

When split_file_reading = True and section_threads != 1, pgloader will have to manage several processing threads per section, each one responsible for reading its own part of the file, with the processing boundaries discovered at reading time. Adding CE support in this case means managing two separate thread pools per section, one responsible for split file reading and another responsible for data buffering and routing (COPY to partition instead of to parent table).

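To make the byte-range idea above concrete, here is a minimal Python sketch of how it could work. It is not pgloader code: the names split_points, plan_chunks and read_chunk are hypothetical, and the rule that a line belongs to the chunk containing its first byte (implemented by seeking to start - 1 before skipping to the next newline) is an assumption chosen so that no line is read twice or lost.

```python
import os


def split_points(size, n):
    """Cut [0, size) into n contiguous (start, end) byte ranges."""
    step = size // n
    bounds = [i * step for i in range(n)] + [size]
    return list(zip(bounds[:-1], bounds[1:]))


def plan_chunks(path, n_threads):
    """Controller side: only stat() the file, never read its content."""
    return split_points(os.stat(path).st_size, n_threads)


def read_chunk(path, start, end):
    """Reader side: yield every line whose first byte lies in [start, end).

    Assumes one logical line per physical line (no embedded newlines).
    """
    with open(path, "rb") as f:
        if start > 0:
            # Seek one byte early: if that byte is a newline, a line begins
            # exactly at `start` and is ours; otherwise we skip the tail of
            # a line owned by the previous chunk.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            # The last line of a chunk may run past `end`; we still read it
            # whole, and the next reader skips it.
            yield line
```

Each reading thread would be handed one (start, end) pair from plan_chunks(path, N) and would iterate over read_chunk(path, start, end) independently of the others.
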
In both cases, maybe pgloader would also need a separate thread for COPYing the buffer to the server, allowing it to continue preparing the next buffer in the meantime?

This will need some re-architecting of pgloader, but it seems worth it (I'm not entirely sold on the two thread-pools idea, though, and this last continue-reading-while-copying idea still has to be examined).

Some of the work that needs to be done is by now quite clear to me, but part of it still needs its share of design time. As usual though, the real hard part is knowing exactly what we want to get done, and we're showing good progress here :)

Greg's behavior:
  max_threads = N
  max_parallel_sections = 1
  section_threads = -1
  split_file_reading = True

Simon's behaviour:
  max_threads = N
  max_parallel_sections = 1   # I don't think Simon wants parallel sections
  section_threads = -1
  split_file_reading = False

Comments?
--
dim