Re: [PATCH v2 0/3] grep multithreading and scaling

Jeff King <peff@xxxxxxxx> · Mon, 5 Dec 2011 19:40:12 -0500

On Mon, Dec 05, 2011 at 10:38:16AM +0100, Thomas Rast wrote:

> I just found out that on Linux, there's mincore() that can tell us
> (racily, but who cares) whether a given file mapping is in memory.  If
> you would like to try it, see the source at the end, but I'm getting
> things such as

Neat, I didn't know about mincore.

> So that looks fairly promising, and the order would then be:
> 
> - if stat-clean, and we have mincore(), and it tells us we can do it
>   cheaply: grab file from tree
> 
> - if it's a loose object: decompress it
> 
> - if stat-clean: grab file from tree
> 
> - access packs as usual

I don't think your third one makes sense. If we get that far, the working
tree file is stat-clean but (according to mincore) not in cache, so either:

  1. the pack file is in cache, and it's way faster than faulting in the
     working tree file from disk

  2. the pack file is not in cache, and it's a toss-up whether it is
     faster to fault in the smaller compressed pack-file version and
     uncompress it, or to fault in the larger on-disk version. The
     exact result will depend on the ratio of CPU to disk speed, the
     quality of your filesystem, and the size and contents of your file.

     And possibly on the exact delta chains you have. Though this
     optimization only happens when the file is in the index, which
     usually means it's recent, which means it will tend to be at the
     head of the delta chain.

So it probably just makes sense to grab the working tree file only if
mincore() tells us we have all (or most) of it, and otherwise go to the
packfile.
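
To make the "all (or most) of it" check concrete, here is a rough,
Linux-only sketch. It is not git code: resident_fraction(),
choose_source(), the blob_source enum and the threshold parameter are all
made-up names for illustration. It maps the file, asks mincore() which
pages are resident, and prefers the working tree only when the file is
stat-clean and mostly in cache. The answer is racy, as noted above, and
newer kernels may restrict what mincore() reports for files the caller
cannot write, so treat the result as a hint at best.

```c
#define _DEFAULT_SOURCE		/* expose mincore() in glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: fraction of the file's pages that are resident
 * in memory, in [0.0, 1.0], or -1.0 on error. An empty file counts as
 * fully resident, since there is nothing to fault in.
 */
static double resident_fraction(const char *path)
{
	struct stat st;
	double result = -1.0;
	size_t len = 0, npages, i, resident = 0;
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *vec = NULL;
	void *map = MAP_FAILED;
	int fd = open(path, O_RDONLY);

	if (fd < 0 || page_size <= 0 || fstat(fd, &st) < 0)
		goto out;
	if (st.st_size == 0) {
		result = 1.0;
		goto out;
	}
	len = (size_t)st.st_size;
	npages = (len + (size_t)page_size - 1) / (size_t)page_size;

	/* Mapping the file does not by itself fault any pages in. */
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	vec = malloc(npages);
	if (map == MAP_FAILED || !vec)
		goto out;
	if (mincore(map, len, vec) < 0)
		goto out;

	/* The low bit of each vec entry is set if that page is resident. */
	for (i = 0; i < npages; i++)
		resident += vec[i] & 1;
	result = (double)resident / (double)npages;
out:
	free(vec);
	if (map != MAP_FAILED)
		munmap(map, len);
	if (fd >= 0)
		close(fd);
	return result;
}

enum blob_source { FROM_WORKTREE, FROM_OBJECT_DB };

/*
 * Hypothetical policy: read the working tree file only if it is
 * stat-clean and at least `threshold` of it is already in memory;
 * otherwise (including on any error) go to loose objects / packs.
 */
static enum blob_source choose_source(const char *path, int stat_clean,
				      double threshold)
{
	if (stat_clean && resident_fraction(path) >= threshold)
		return FROM_WORKTREE;
	return FROM_OBJECT_DB;
}
```

A caller would use something like choose_source(path, is_clean, 0.9); the
0.9 is an arbitrary guess at "most of it", not a measured value.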

> Ok, I see, I missed that part.  Perhaps the heuristic should then be
> "if the regex boils down to memmem, disable threading", but let's see
> what loose object decompression in parallel can give us.

Yeah. I'd really rather have parallel object decompression than some
complex Linux-only mincore optimization. That optimization _could_ yield
extra savings on top of proper threading, but if the blob retrieval is
threaded, I think I'll care less about how much CPU time it takes.

-Peff