On Thu, 13 Nov 2008, James Pickens wrote:
>
> I wonder if there are other completely different parts of git that could
> benefit from multi threading when the work tree is on nfs?

I'm sure there are. That said, threading things is usually really quite
painful. The only reason this preloading was easy to do was that we really
had all the data structures laid out beautifully for this, and I had spent
a lot of effort earlier on a whole series of "avoid duplicate lstat()"
changes, which gave us that whole ce_uptodate() thing, and all normal cases
already taking advantage of it, and the "uptodate" bit being percolated
along all the paths.

If it hadn't been for that, it would have been much nastier to do. As it
was, there was literally just a simple little extra phase to fill in, in
parallel, all the data structures that we already had set up.

> I'm thinking specifically of 'git checkout', since while testing this
> patch I happened to do a 'git pull' that resulted in several thousand
> new files being created, and the "Checking out files" part took
> *forever* to run.

Now, the good news is that the actual work-tree part of checking things out
is probably pretty amenable to the same kind of parallelization, for
largely the same reasons: the whole checking out thing is already done in
multiple phases with all error handling done before-hand. So we will have
built up all our data structures earlier, and set the CE_UPDATE bit, and
then there's just a final "push it all out" phase.

So CE_UPTODATE and CE_UPDATE are really very similar in that sense - except
at opposite ends of the pipeline. The CE_UPTODATE bit marks a name entry as
matching the filesystem data (and allows all later phases to avoid doing
the expensive lstat()s), while the CE_UPDATE (and CE_REMOVE) bits allow us
to do all our complex work in-memory without committing it to disk, and
then we push it out in one go.

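
To make the shape of that final "push it out" phase concrete, here is a
minimal sketch in C with pthreads. It is not git's code: `struct entry`,
`ENTRY_UPDATE`, `MAX_THREADS`, `push_out_one()` and `push_out_all()` are
hypothetical names invented for illustration, and `push_out_one()` is only
a stand-in for the real per-file work. The assumption it illustrates is
that once every decision has been recorded in memory as a flag, the
remaining loop handles each entry independently and can be cut into slices.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical flag, loosely modelled on the CE_UPDATE idea (not git's). */
#define ENTRY_UPDATE 0x1u
#define MAX_THREADS 8

struct entry {
	const char *name;
	unsigned int flags;
	int written;		/* set to 1 by the push-out phase */
};

/* One contiguous range of the entry array, handled by one thread. */
struct slice {
	struct entry *entries;
	size_t begin, end;
};

/* Stand-in for the expensive per-file work; it only records the result. */
static void push_out_one(struct entry *e)
{
	e->written = 1;
	e->flags &= ~ENTRY_UPDATE;
}

/* Thread body: process only the flagged entries in our own slice. */
static void *push_out_slice(void *arg)
{
	struct slice *s = arg;

	for (size_t i = s->begin; i < s->end; i++)
		if (s->entries[i].flags & ENTRY_UPDATE)
			push_out_one(&s->entries[i]);
	return NULL;
}

/*
 * Split the array into up to nr_threads slices and process them in
 * parallel. Returns how many entries have their "written" mark set.
 */
static size_t push_out_all(struct entry *entries, size_t n, int nr_threads)
{
	pthread_t threads[MAX_THREADS];
	struct slice slices[MAX_THREADS];
	int started[MAX_THREADS];
	size_t chunk, written = 0;

	assert(entries != NULL || n == 0);
	if (nr_threads < 1)
		nr_threads = 1;
	if (nr_threads > MAX_THREADS)
		nr_threads = MAX_THREADS;
	chunk = (n + nr_threads - 1) / nr_threads;

	for (int t = 0; t < nr_threads; t++) {
		size_t begin = (size_t)t * chunk;

		if (begin > n)
			begin = n;
		slices[t].entries = entries;
		slices[t].begin = begin;
		slices[t].end = (n - begin < chunk) ? n : begin + chunk;
		started[t] = !pthread_create(&threads[t], NULL,
					     push_out_slice, &slices[t]);
		if (!started[t])
			push_out_slice(&slices[t]); /* run it ourselves */
	}
	for (int t = 0; t < nr_threads; t++)
		if (started[t])
			pthread_join(threads[t], NULL);

	for (size_t i = 0; i < n; i++)
		written += entries[i].written;
	return written;
}
```

A caller would fill an array of `struct entry`, set `ENTRY_UPDATE` on the
ones that need writing, and call something like
`push_out_all(entries, n, 4)`. The slices never overlap, so the threads
need no locking. A real checkout would also have to cope with things this
sketch ignores, such as directory creation and per-file write failures.
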
So if you want to multi-thread checkout, you literally need to just thread
the last for-loop in unpack-trees.c:check_updates() (the CE_UPDATE loop
that does "checkout_entry()" over the whole index).

> And FWIW, I timed 50 iterations of 'git diff', and the average runtime
> dropped from 11.7s to 2.8s after this patch. A nice improvement.

Very impressive. That said, I suspect you get a "superlinear" improvement
because once it gets faster, the kernel cache also works better, since you
can do more loops without having the NFS attributes time out.

Whether that kind of effect happens much in actual practice is debatable,
although it's quite possible that it will work the same way in some
scripting scenarios.

		Linus