Re: [PATCH v2] csum-file: make hashwrite() more readable

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 26 Mar 2021 14:38:03 -0700

"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

>  		if (nr == sizeof(f->buffer)) {
> -			/* process full buffer directly without copy */
> -			data = buf;
> +			/*
> +			 * Flush a full batch worth of data directly
> +			 * from the input, skipping the memcpy() to
> +			 * the hashfile's buffer. In this block,
> +			 * f->offset is necessarily zero.
> +			 */

What confused me a bit was that exercising the "bypass memcpy and
take a full bufferful from the incoming data directly" optimization
has two preconditions.  The incoming data must be large enough, and
the buffer must not hold anything that needs to be emitted before
the incoming data.  The cleverness of the original code was that
both are checked by this single "nr == sizeof(f->buffer)" condition.
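
To spell out why the single comparison is enough, here is a minimal
standalone sketch, not the code from the patch.  It assumes nr is
computed as the smaller of count and the room left in the buffer,
which the quoted hunk does not show.  BUF_SIZE, chunk_size() and
takes_direct_path() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SIZE 8	/* hypothetical; stands in for sizeof(f->buffer) */

/*
 * How many bytes one loop iteration would take: the smaller of the
 * bytes still incoming and the room left in the buffer.
 * The caller must pass offset <= BUF_SIZE.
 */
static size_t chunk_size(size_t offset, size_t count)
{
	size_t left = BUF_SIZE - offset;

	return count > left ? left : count;
}

/* Nonzero when the "skip the memcpy()" branch would be taken. */
static int takes_direct_path(size_t offset, size_t count)
{
	size_t nr = chunk_size(offset, count);

	if (nr == BUF_SIZE) {
		/* nr <= left == BUF_SIZE - offset, so offset is zero */
		assert(offset == 0);
		/* nr <= count, so a full bufferful is incoming */
		assert(count >= BUF_SIZE);
		return 1;
	}
	return 0;
}
```

With an empty buffer and at least BUF_SIZE bytes incoming, the direct
path is taken.  As soon as offset is nonzero, left is smaller than
BUF_SIZE, so nr can never reach BUF_SIZE no matter how large count is.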

So I do appreciate this extra comment, and I think future readers of
the code will, too.

> +			the_hash_algo->update_fn(&f->ctx, buf, nr);
> +			flush(f, buf, nr);
>  		} else {
> -			memcpy(f->buffer + offset, buf, nr);
> -			data = f->buffer;
> +			/*
> +			 * Copy to the hashfile's buffer, flushing only
> +			 * if it became full.
> +			 */
> +			memcpy(f->buffer + f->offset, buf, nr);
> +			f->offset += nr;
> +			left -= nr;
> +			if (!left)
> +				hashflush(f);
>  		}
>  
>  		count -= nr;
> -		offset += nr;
>  		buf = (char *) buf + nr;
> -		left -= nr;
> -		if (!left) {
> -			the_hash_algo->update_fn(&f->ctx, data, offset);
> -			flush(f, data, offset);
> -			offset = 0;
> -		}
> -		f->offset = offset;
>  	}
>  }
>  
>
> base-commit: 142430338477d9d1bb25be66267225fb58498d92