On Wed, May 25, 2022 at 12:48:54AM -0700, Jonathan Nieder wrote: > >> What does mtimes_map contain? A comment would help. > > > > It contains a pointer at the beginning of the mmapped region of the > > .mtimes file, similar to revindex_map above it. > > To be clear, in cases like this by "comment" I mean "in-code comment". > I.e., my interest is not that _I_ find out the answer but that the > code becomes more maintainable via the answer becoming easier to find. OK. I'll add a comment in the fixup! patch which I'm about to send. > [...] > >> This seems simple enough that it's not obvious we need more code > >> sharing. Do you agree? If so, I'd suggest just removing the > >> NEEDSWORK comment. > > > > Yeah, it is conceptually simple, though it feels like the sort of thing > > that could benefit from not having to be written once for each > > extension (hence the comment). > > The reason I asked is that the NEEDSWORK here actually got in the way > of comprehension for me --- it made me wonder "is there some > complexity here I'm missing?" > > That's why I'd suggest one of > - removing the NEEDSWORK comment > - going ahead and implementing the code sharing you mean, or > - fleshing out the NEEDSWORK comment so the reader can wonder less I am a little sad to remove it, since I thought it was useful as-is. But I can just as easily remember to come back to this myself in the future, so if it is distracting to you in the meantime, then I don't mind holding onto it in my own head. > >>> + > >>> +#define MTIMES_HEADER_SIZE (12) > >>> +#define MTIMES_MIN_SIZE (MTIMES_HEADER_SIZE + (2 * the_hash_algo->rawsz)) > >> > >> Hm, the all-caps name makes this feel like a compile-time constant but > >> it contains a reference to the_hash_algo. Could it be an inline > >> function instead? > > > > Yes, it could be an inline function, but I don't think there is > > necessarily anything wrong with it being a #define'd macro. There are > > some other examples, e.g., RIDX_MIN_SIZE, MIDX_MIN_SIZE, > > GRAPH_DATA_WIDTH, and PACK_SIZE_THRESHOLD (to name a few) which also use > > the_hash_algo on the right-hand side of a `#define`. > > Those are due to an incomplete migration from use of the true constant > GIT_SHA1_RAWSZ to use of the dynamic value the_hash_algo->rawsz, no? > In other words, "other examples do it wrong" doesn't feel like a great > justification for making it worse in new code. Fair point. I can imagine reasons for the existing pattern, but updating it to handle the variable rawsz is easy to do (and it probably should have been that way since the beginning). > [...] > >>> +static int load_pack_mtimes_file(char *mtimes_file, > >>> + uint32_t num_objects, > >>> + const uint32_t **data_p, size_t *len_p) > >> > >> What does this function do? A comment would help. > > > > I know that I'm biased as the author of this code, but I think the > > signature is clear here. At least, I'm not sure what information a > > comment would add that the function name and its arguments don't already > > convey. > > Ah, thanks for this point of clarification. What isn't clear from the > signature is > - when should I call this function? > - what does its return value represent? > - how does it handle errors? > > I agree that the parameters are self-explanatory. I'm hesitant to over-document a static function with a single caller, but when looking at this, I think there is an opportunity to document _its_ caller (`load_pack_mtimes()`) which isn't static, but was also missing documentation. > >>> +cleanup: > >>> + if (ret) { > >>> + if (data) > >>> + munmap(data, mtimes_size); > >>> + } else { > >>> + *len_p = mtimes_size; > >>> + *data_p = (const uint32_t *)data; > >> > >> Do we know that 'data' is uint32_t aligned? Casting earlier in the > >> function could make that more obvious. > > > > `data` is definitely uint32_t aligned, but this is a tradeoff, since if > > we wrote: > > > > uint32_t *data = xmmap(...); > > > > then I think we would have to change the case where ret is non-zero to be: > > > > if (data) > > munmap((void*)data, ...); > > > > and likewise, data_p is const. > > Doing it that way sounds great to me. That way, the type contains the > information we need up-front and the safety of the cast is obvious in > the place where the cast is needed. > > (Although my understanding is also that in C it's fine to pass a > uint32_t* to a function expecting a void*, so the second cast would > also not be needed.) > > [...] Done, thanks for the suggestion. Thanks, Taylor