Hi, Taylor Blau wrote: > On Tue, May 24, 2022 at 12:32:01PM -0700, Jonathan Nieder wrote: >> Makes sense. Does this intend to produce any functional change? I'm >> guessing not (and the lack of tests agrees), but the commit message >> doesn't say so. [...] > This does not produce a functional change, no. This commit in isolation > adds a bunch of dead code that will be used (and tested) in the > following patches. [...] >> What does mtimes_map contain? A comment would help. > > It contains a pointer at the beginning of the mmapped region of the > .mtimes file, similar to revindex_map above it. To be clear, in cases like this by "comment" I mean "in-code comment". I.e., my interest is not that _I_ find out the answer but that the code becomes more maintainable via the answer becoming easier to find. [...] >> This seems simple enough that it's not obvious we need more code >> sharing. Do you agree? If so, I'd suggest just removing the >> NEEDSWORK comment. > > Yeah, it is conceptually simple, though it feels like the sort of thing > that could benefit from not having to be written once for each > extension (hence the comment). The reason I asked is that the NEEDSWORK here actually got in the way of comprehension for me --- it made me wonder "is there some complexity here I'm missing?" That's why I'd suggest one of - removing the NEEDSWORK comment - going ahead and implementing the code sharing you mean, or - fleshing out the NEEDSWORK comment so the reader can wonder less >>> + >>> +#define MTIMES_HEADER_SIZE (12) >>> +#define MTIMES_MIN_SIZE (MTIMES_HEADER_SIZE + (2 * the_hash_algo->rawsz)) >> >> Hm, the all-caps name makes this feel like a compile-time constant but >> it contains a reference to the_hash_algo. Could it be an inline >> function instead? > > Yes, it could be an inline function, but I don't think there is > necessarily anything wrong with it being a #define'd macro. There are > some other examples, e.g., RIDX_MIN_SIZE, MIDX_MIN_SIZE, > GRAPH_DATA_WIDTH, and PACK_SIZE_THRESHOLD (to name a few) which also use > the_hash_algo on the right-hand side of a `#define`. Those are due to an incomplete migration from use of the true constant GIT_SHA1_RAWSZ to use of the dynamic value the_hash_algo->rawsz, no? In other words, "other examples do it wrong" doesn't feel like a great justification for making it worse in new code. [...] >>> +static int load_pack_mtimes_file(char *mtimes_file, >>> + uint32_t num_objects, >>> + const uint32_t **data_p, size_t *len_p) >> >> What does this function do? A comment would help. > > I know that I'm biased as the author of this code, but I think the > signature is clear here. At least, I'm not sure what information a > comment would add that the function name and its arguments don't already > convey. Ah, thanks for this point of clarification. What isn't clear from the signature is - when should I call this function? - what does its return value represent? - how does it handle errors? I agree that the parameters are self-explanatory. >>> +cleanup: >>> + if (ret) { >>> + if (data) >>> + munmap(data, mtimes_size); >>> + } else { >>> + *len_p = mtimes_size; >>> + *data_p = (const uint32_t *)data; >> >> Do we know that 'data' is uint32_t aligned? Casting earlier in the >> function could make that more obvious. > > `data` is definitely uint32_t aligned, but this is a tradeoff, since if > we wrote: > > uint32_t *data = xmmap(...); > > then I think we would have to change the case where ret is non-zero to be: > > if (data) > munmap((void*)data, ...); > > and likewise, data_p is const. Doing it that way sounds great to me. That way, the type contains the information we need up-front and the safety of the cast is obvious in the place where the cast is needed. (Although my understanding is also that in C it's fine to pass a uint32_t* to a function expecting a void*, so the second cast would also not be needed.) [...] >>> +int load_pack_mtimes(struct packed_git *p) >> >> This could use a doc comment in the header file. For example, what >> requirements do we have on what the caller passes as 'p'? >> >> [...] >>> +uint32_t nth_packed_mtime(struct packed_git *p, uint32_t pos) >> >> Likewise. > > Sure. I wonder when we should do that, though. I'm not trying to be > impatient to get this merged, but iterating on the documentation feels > like it could be done on top without having to re-send the substantive > parts of this series over and over. In terms of re-sending patches, sending a "fixup!" patch with the minor changes you want to make doesn't seem too problematic to me. In general a major benefit of code review is getting others' eyes on new code from the standpoint of readability and maintainability; including comments like this up front doesn't seem like a huge amount to ask (versus getting those comments to be perfect, which would be unreasonable to expect since it's not hard to update them over time). > Thanks, > Taylor Thanks for looking it through. Sincerely, Jonathan