Re: [PATCH v5 02/17] pack-mtimes: support reading .mtimes files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Taylor Blau wrote:
> On Tue, May 24, 2022 at 12:32:01PM -0700, Jonathan Nieder wrote:

>> Makes sense.  Does this intend to produce any functional change?  I'm
>> guessing not (and the lack of tests agrees), but the commit message
>> doesn't say so.
[...]
> This does not produce a functional change, no. This commit in isolation
> adds a bunch of dead code that will be used (and tested) in the
> following patches.
[...]
>> What does mtimes_map contain?  A comment would help.
>
> It contains a pointer at the beginning of the mmapped region of the
> .mtimes file, similar to revindex_map above it.

To be clear, in cases like this by "comment" I mean "in-code comment".
I.e., my interest is not that _I_ find out the answer but that the
code becomes more maintainable via the answer becoming easier to find.

[...]
>> This seems simple enough that it's not obvious we need more code
>> sharing.  Do you agree?  If so, I'd suggest just removing the
>> NEEDSWORK comment.
>
> Yeah, it is conceptually simple, though it feels like the sort of thing
> that could benefit from not having to be written once for each
> extension (hence the comment).

The reason I asked is that the NEEDSWORK here actually got in the way
of comprehension for me --- it made me wonder "is there some
complexity here I'm missing?"

That's why I'd suggest one of
- removing the NEEDSWORK comment
- going ahead and implementing the code sharing you mean, or
- fleshing out the NEEDSWORK comment so the reader can wonder less

>>> +
>>> +#define MTIMES_HEADER_SIZE (12)
>>> +#define MTIMES_MIN_SIZE (MTIMES_HEADER_SIZE + (2 * the_hash_algo->rawsz))
>>
>> Hm, the all-caps name makes this feel like a compile-time constant but
>> it contains a reference to the_hash_algo.  Could it be an inline
>> function instead?
>
> Yes, it could be an inline function, but I don't think there is
> necessarily anything wrong with it being a #define'd macro. There are
> some other examples, e.g., RIDX_MIN_SIZE, MIDX_MIN_SIZE,
> GRAPH_DATA_WIDTH, and PACK_SIZE_THRESHOLD (to name a few) which also use
> the_hash_algo on the right-hand side of a `#define`.

Those are due to an incomplete migration from use of the true constant
GIT_SHA1_RAWSZ to use of the dynamic value the_hash_algo->rawsz, no?
In other words, "other examples do it wrong" doesn't feel like a great
justification for making it worse in new code.

[...]
>>> +static int load_pack_mtimes_file(char *mtimes_file,
>>> +				 uint32_t num_objects,
>>> +				 const uint32_t **data_p, size_t *len_p)
>>
>> What does this function do?  A comment would help.
>
> I know that I'm biased as the author of this code, but I think the
> signature is clear here. At least, I'm not sure what information a
> comment would add that the function name and its arguments don't already
> convey.

Ah, thanks for this point of clarification.  What isn't clear from the
signature is
- when should I call this function?
- what does its return value represent?
- how does it handle errors?

I agree that the parameters are self-explanatory.

>>> +cleanup:
>>> +	if (ret) {
>>> +		if (data)
>>> +			munmap(data, mtimes_size);
>>> +	} else {
>>> +		*len_p = mtimes_size;
>>> +		*data_p = (const uint32_t *)data;
>>
>> Do we know that 'data' is uint32_t aligned?  Casting earlier in the
>> function could make that more obvious.
>
> `data` is definitely uint32_t aligned, but this is a tradeoff, since if
> we wrote:
>
>     uint32_t *data = xmmap(...);
>
> then I think we would have to change the case where ret is non-zero to be:
>
>     if (data)
>         munmap((void*)data, ...);
>
> and likewise, data_p is const.

Doing it that way sounds great to me.  That way, the type contains the
information we need up-front and the safety of the cast is obvious in
the place where the cast is needed.

(Although my understanding is also that in C it's fine to pass a
uint32_t* to a function expecting a void*, so the second cast would
also not be needed.)

[...]
>>> +int load_pack_mtimes(struct packed_git *p)
>>
>> This could use a doc comment in the header file.  For example, what
>> requirements do we have on what the caller passes as 'p'?
>>
>> [...]
>>> +uint32_t nth_packed_mtime(struct packed_git *p, uint32_t pos)
>>
>> Likewise.
>
> Sure. I wonder when we should do that, though. I'm not trying to be
> impatient to get this merged, but iterating on the documentation feels
> like it could be done on top without having to re-send the substantive
> parts of this series over and over.

In terms of re-sending patches, sending a "fixup!" patch with the
minor changes you want to make doesn't seem too problematic to me.  In
general a major benefit of code review is getting others' eyes on new
code from the standpoint of readability and maintainability; including
comments like this up front doesn't seem like a huge amount to ask
(versus getting those comments to be perfect, which would be
unreasonable to expect since it's not hard to update them over time).

> Thanks,
> Taylor

Thanks for looking it through.

Sincerely,
Jonathan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux