Re: [PATCH v3 7/8] packed-backend: check whether the "packed-refs" is sorted

shejialuo <shejialuo@xxxxxxxxx> · Wed, 12 Feb 2025 18:56:49 +0800

On Wed, Feb 12, 2025 at 10:56:56AM +0100, Patrick Steinhardt wrote:
> >  static int packed_fsck_ref_content(struct fsck_options *o,
> >  				   struct ref_store *ref_store,
> >  				   const char *start, const char *eof)
> >  {
> >  	struct strbuf packed_entry = STRBUF_INIT;
> > +	struct fsck_packed_ref_entry **entries;
> >  	struct strbuf refname = STRBUF_INIT;
> >  	unsigned long line_number = 1;
> > +	unsigned int sorted = 0;
> > +	size_t entry_alloc = 20;
> > +	size_t entry_nr = 0;
> >  	const char *eol;
> >  	int ret = 0;
> >  
> >  	strbuf_addf(&packed_entry, "packed-refs line %lu", line_number);
> >  	ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
> >  	if (*start == '#') {
> > -		ret |= packed_fsck_ref_header(o, start, eol);
> > +		ret |= packed_fsck_ref_header(o, start, eol, &sorted);
> >  
> >  		start = eol + 1;
> >  		line_number++;
> >  	}
> >  
> > +	ALLOC_ARRAY(entries, entry_alloc);
> >  	while (start < eof) {
> > +		struct fsck_packed_ref_entry *entry
> > +			= create_fsck_packed_ref_entry(line_number, start);
> 
> Instead of slurping in all entries and allocating them in an array, can
> we instead remember the last one and just compare that the last record
> is smaller than the current record?
> 

Sorry, I missed this comment. You are right that comparing each record
with the previous one is the most efficient way to check whether
"packed-refs" is sorted. We could do that comparison while checking each
ref entry, but I would rather not, because I don't want to mix the two
checks. In other words, I want the following two steps to stay
independent:

1. Check ref entry consistency (oid, refname, format, ...).
2. Check whether "packed-refs" is sorted.

But I do agree with your concern. The reason I record the entries is
that we have already parsed the file once, so I wanted to avoid parsing
it again. Instead, I record just the information needed for the sort
check. This does add memory overhead, since it grows with the number of
entries.

So we have two choices:

1. Keep the design unchanged (space overhead).
2. Parse the file again (time overhead). Then only two allocations are
   needed at any time (the previous and the current entry).
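
To make choice 2 more concrete, here is a rough, standalone sketch of
what a separate second pass could look like. It is not the code from
this series: it uses plain libc instead of strbuf, and the names
"refname_span", "refname_span_cmp" and "first_unsorted_line" are made up
for illustration. It also assumes that refnames are compared byte by
byte and that duplicates do not count as unsorted. In this form the
previous and current refname are just spans into the buffer, so nothing
is heap-allocated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a refname as a span inside the file buffer. */
struct refname_span {
	const char *buf;
	size_t len;
};

/* Byte-wise comparison; a proper prefix sorts before the longer name. */
static int refname_span_cmp(const struct refname_span *a,
			    const struct refname_span *b)
{
	size_t min = a->len < b->len ? a->len : b->len;
	int cmp = min ? memcmp(a->buf, b->buf, min) : 0;

	if (cmp)
		return cmp;
	if (a->len == b->len)
		return 0;
	return a->len < b->len ? -1 : 1;
}

/*
 * Second pass over the content in [start, eof). Returns 0 when the
 * refnames are in non-decreasing order, otherwise the 1-based line
 * number of the first entry that sorts before its predecessor.
 */
static unsigned long first_unsorted_line(const char *start, const char *eof)
{
	struct refname_span prev = { NULL, 0 };
	unsigned long line_number = 0;

	while (start < eof) {
		const char *eol = memchr(start, '\n', eof - start);
		const char *sp;

		if (!eol)
			eol = eof;
		line_number++;

		/* Skip the header ('#') and peeled ('^') lines. */
		if (*start != '#' && *start != '^' &&
		    (sp = memchr(start, ' ', eol - start))) {
			struct refname_span cur = {
				sp + 1, (size_t)(eol - sp - 1)
			};

			if (prev.buf && refname_span_cmp(&prev, &cur) > 0)
				return line_number;
			prev = cur;
		}

		start = eol < eof ? eol + 1 : eof;
	}

	return 0;
}
```

With something like this, the entry consistency check and the sort check
stay independent: the second function never looks at the oid and only
runs when the header says the file is sorted.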