Re: [PATCH] csum-file: flush less often

Jeff King <peff@xxxxxxxx> · Thu, 25 Mar 2021 23:16:53 -0400

On Thu, Mar 25, 2021 at 11:52:29AM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
> 
> >> So, I'm of two minds here:
> >>
> >>  1. This is embarrassing. I wasted everyone's time for nothing. I can retract
> >>     this patch.
> >>
> >>  2. This is embarrassing. I overstated the problem here. But we might be able
> >>     to eke out a tiny performance boost here.
> >>
> >> I'm open to either. I think we should default to dropping this patch unless
> >> someone thinks the rewrite above is a better organization of the logic. (I
> >> can then send a v2 including that version and an updated commit message.)
> >
> > 3. The current code around "if (nr == sizeof(f->buffer))" might be a
> >    bit too clever for readers who try to understand what is going
> >    on, and the whole "while" loop may deserve a comment based on
> >    what you wrote before your replacement implementation.

Yes, my first thought on reading Stolee's post-image was: wait, how do
we know when data needs to be flushed from the buffer? But that is not
new in his patch. It is confusing before and after. :)

> Having said all that, comparing the original and the version updated
> with your "flush less often" patch, I find the latter quite a bit easier
> to read, so as long as the update does not give us 1% slowdown, it
> may be worth adopting for the readability improvement alone.
> 
> Of course, if we were to go that route, the sales pitch in the log
> message needs to be updated.

Yeah, I am OK with either version, as long as it is justified correctly
in the commit message. IMHO the big difference is that the original is
using local data/offset variables in order to provide a layer of
indirection when we get to the hash+flush code. And Stolee's patch is
calling the same code in the two places instead.
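
To make that difference concrete, here is a toy sketch of the two loop
shapes. It is not the csum-file.c code: struct toy_file,
toy_hash_flush(), and both toy_write_*() functions are made-up names,
the buffer is tiny on purpose, and the checksum is a trivial stand-in
for the real hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUFSZ 16 /* tiny on purpose, so small inputs cross block boundaries */

/* Hypothetical stand-in for a checksumming buffered file. */
struct toy_file {
	unsigned char buffer[TOY_BUFSZ];
	size_t offset;  /* bytes pending in buffer */
	uint32_t sum;   /* toy checksum standing in for the real hash */
	size_t flushed; /* bytes handed to the hash+flush step so far */
};

/* The shared "hash+flush" step. */
static void toy_hash_flush(struct toy_file *f, const void *data, size_t n)
{
	const unsigned char *p = data;
	for (size_t i = 0; i < n; i++)
		f->sum = f->sum * 31 + p[i];
	f->flushed += n;
}

/* Shape A: choose a source pointer, then reach a single hash+flush site. */
static void toy_write_indirect(struct toy_file *f, const void *buf, size_t count)
{
	while (count) {
		size_t offset = f->offset;
		size_t left = sizeof(f->buffer) - offset;
		size_t nr = count > left ? left : count;
		const void *data;

		assert(offset < sizeof(f->buffer)); /* never left full between rounds */
		if (nr == sizeof(f->buffer)) {
			/* only possible when the buffer is empty: skip the copy */
			data = buf;
		} else {
			memcpy(f->buffer + offset, buf, nr);
			data = f->buffer;
		}
		offset += nr;
		if (offset == sizeof(f->buffer)) {
			toy_hash_flush(f, data, offset);
			offset = 0;
		}
		f->offset = offset;
		buf = (const unsigned char *)buf + nr;
		count -= nr;
	}
}

/* Shape B: call the hash+flush step directly from each branch. */
static void toy_write_direct(struct toy_file *f, const void *buf, size_t count)
{
	while (count) {
		size_t left = sizeof(f->buffer) - f->offset;
		size_t nr = count > left ? left : count;

		assert(f->offset < sizeof(f->buffer));
		if (nr == sizeof(f->buffer)) {
			/* full block straight from the caller, no copy */
			toy_hash_flush(f, buf, nr);
		} else {
			memcpy(f->buffer + f->offset, buf, nr);
			f->offset += nr;
			if (f->offset == sizeof(f->buffer)) {
				toy_hash_flush(f, f->buffer, f->offset);
				f->offset = 0;
			}
		}
		buf = (const unsigned char *)buf + nr;
		count -= nr;
	}
}
```

Both shapes should hash the same bytes in the same order. The only
question is whether the reader (and the compiler) follows the "data"
pointer to one call site or sees two explicit ones.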

It's quite possible that gives the compiler slightly more opportunity to
micro-optimize. That doesn't matter if you are feeding big blocks, but
it may if you are feeding 4 bytes at a time, as in the midx code. In
that case, though, it is entirely possible that having the caller
allocate a single array, fill it, and then feed it to hashwrite() in
one call would be faster still, if a little more cumbersome.
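
As a rough illustration of that caller-side alternative, here is a toy
sketch. Nothing in it is the midx or csum-file code: toy_write() is a
made-up stand-in for a buffered checksumming writer, and
toy_put_be32(), write_each(), and write_batched() are hypothetical
helpers. It only shows that the two call patterns hash the same bytes
while making very different numbers of calls; it says nothing about
actual timings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a checksumming buffered file. */
struct toy_file {
	unsigned char buffer[8192];
	size_t offset;       /* bytes pending in buffer */
	uint32_t sum;        /* toy checksum standing in for the real hash */
	unsigned long calls; /* how many times toy_write() was entered */
};

static void toy_consume(struct toy_file *f, const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		f->sum = f->sum * 31 + p[i];
}

/* Copy into the buffer and hash each time the buffer fills up. */
static void toy_write(struct toy_file *f, const void *buf, size_t count)
{
	const unsigned char *p = buf;

	f->calls++;
	while (count) {
		size_t left = sizeof(f->buffer) - f->offset;
		size_t nr = count > left ? left : count;

		assert(f->offset < sizeof(f->buffer));
		memcpy(f->buffer + f->offset, p, nr);
		f->offset += nr;
		if (f->offset == sizeof(f->buffer)) {
			toy_consume(f, f->buffer, f->offset);
			f->offset = 0;
		}
		p += nr;
		count -= nr;
	}
}

/* Hash whatever is still pending in the buffer. */
static void toy_finish(struct toy_file *f)
{
	toy_consume(f, f->buffer, f->offset);
	f->offset = 0;
}

/* Store v in network (big-endian) byte order. */
static void toy_put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Pattern 1: one 4-byte write per value. */
static void write_each(struct toy_file *f, const uint32_t *vals, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned char be[4];
		toy_put_be32(be, vals[i]);
		toy_write(f, be, sizeof(be));
	}
}

/* Pattern 2: pack everything into one array, then write it once. */
static int write_batched(struct toy_file *f, const uint32_t *vals, size_t n)
{
	unsigned char *packed = malloc(n ? n * 4 : 1);

	if (!packed)
		return -1;
	for (size_t i = 0; i < n; i++)
		toy_put_be32(packed + 4 * i, vals[i]);
	toy_write(f, packed, n * 4);
	free(packed);
	return 0;
}
```

The cumbersome part is visible in write_batched(): the caller now owns
an allocation and an error path that the per-value loop never needed.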

-Peff