Junio C Hamano <gitster@xxxxxxxxx> wrote: > I ran strace and found that fast-import retains three windows to > the same data that was opened while the pack was still being > built (i.e. the filename is still tmp_pack_XXXXXX) when it dies: > > {next = 0x6f6d20, base = 0x6f77a0 "PACK", offset = 0, len = 907, > last_used = 9, inuse_cnt = 0} > {next = 0x728630, base = 0x6f7160 "PACK", offset = 0, len = 500, > last_used = 5, inuse_cnt = 0} > {next = 0x0, base = 0x6f6d50 "PACK", offset = 0, len = 261, > last_used = 1, inuse_cnt = 0} Ouch. Hmm, well, maybe it shouldn't keep three windows open to the same part of the file, but with different lengths. :) But that's another issue. > The following patch seems to fix the issue for me, but this is > primarily meant for discussion, I agree the patch should fix this particular issue and the performance difference it will cause is minor enough to not be worth discussing further. But I have one minor comment (please see it below). > as I do not quite understand why > the same issue does not manifest itself when NO_MMAP is not > used. When NO_MMAP is off (we are actually using the OS's true mmap) the data is updated into the window when the file is written, as the window is backed by the OS's filesystem cache. If the data access is outside of the window (past the offset of the longest window, here 907) we would just open a new window to the relevant region of the file. But when it is inside the window, the window being backed by the filesystem cache saved us. In the case of NO_MMAP this doesn't work, so we get a failure later during object access. In particular if you look at gfi_unpack_entry() (which is where we cause windows to be opened on the file being created) we tell sha1_file.c it has 20 bytes available in the window *beyond* the actual end of the file. This is due to an assumption within the windowing code of sha1_file.c that we always have the packfile footer at the end of the file, so all windowing code assumes it has at least 20 bytes from the start of a window that it can access without needing to perform additional bounds checking. So what's happening here is we are adding objects into the file, some of which may have their data appearing in the trailing 20 bytes of some prior window. If that happens the cached window looks like it can be used to access the data, but it really cannot be in the NO_MMAP case as those trailing 20 bytes of the window are all \0's. The astute reader may wonder how gfi_unpack_entry() works when NO_MMAP is being used. It doesn't for that last 20 bytes of any window. Which has me thinking that Junio's patch alone isn't enough to make fast-import work correctly under NO_MMAP. > diff --git a/fast-import.c b/fast-import.c > index 3609c24..bd0ddb1 100644 > --- a/fast-import.c > +++ b/fast-import.c > @@ -926,7 +926,13 @@ static void end_packfile(void) > new_p = add_packed_git(idx_name, strlen(idx_name), 1); > if (!new_p) > die("core git rejected index %s", idx_name); > - new_p->windows = old_p->windows; > + while (old_p->windows) { > + struct pack_window *w = old_p->windows; > + munmap(w->base, w->len); > + old_p->windows = w->next; > + free(w); > + } > + new_p->windows = NULL; This assignment of NULL should not be necessary. add_packed_git() already does this for us on new_p so we do not need to repeat it here in fast-import. > all_packs[pack_id] = new_p; > install_packed_git(new_p); > -- Shawn. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html