On 01/22/2018 08:31 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
>
>> `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
>> (see the earlier code path in `load_contents()`). So either that code
>> path *also* has to get the `xmalloc()` treatment, or my third patch is
>> still necessary. (My second patch wouldn't be necessary because the
>> ENOENT case makes `load_contents()` return 0, triggering the early exit
>> from `create_snapshot()`.)
>>
>> I don't have a strong preference either way.
>
> Which would be a two-liner, like the attached, which does not look
> too bad by itself.
>
> The direction, if we take this approach, means that we are declaring
> that .buf being NULL is an invalid state for a snapshot to be in,
> instead of saying "an empty snapshot looks exactly like one that was
> freshly initialized", which seems to be the intention of the original
> design.
>
> After Kim's fix and with 3/3 in your follow-up series, various
> helpers are still unsafe against .buf being NULL, like
> sort_snapshot(), verify_buffer_safe(), clear_snapshot_buffer() (only
> when mmapped bit is set), find_reference_location().
>
> packed_ref_iterator_begin() checks if snapshot->buf is NULL and
> returns early. At the first glance, this appears a useful short cut
> to optimize the empty case away, but the check also is acting as a
> guard to prevent a snapshot with NULL .buf from being fed to an
> unsafe find_reference_location(). An implicit guard like this feels
> a bit more brittle than my liking. If we ensure .buf is never NULL,
> that check can become a pure short-cut optimization and stop being
> a correctness thing.
>
> So...
>
>
>  refs/packed-backend.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index b6e2bc3c1d..1eeb5c7f80 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -473,12 +473,11 @@ static int load_contents(struct snapshot *snapshot)
>  	if (fd < 0) {
>  		if (errno == ENOENT) {
>  			/*
> -			 * This is OK; it just means that no
> -			 * "packed-refs" file has been written yet,
> -			 * which is equivalent to it being empty,
> -			 * which is its state when initialized with
> -			 * zeros.
> +			 * Treat missing "packed-refs" as equivalent to
> +			 * it being empty.
>  			 */
> +			snapshot->eof = snapshot->buf = xmalloc(0);
> +			snapshot->mmapped = 0;
>  			return 0;
>  		} else {
>  			die_errno("couldn't read %s", snapshot->refs->path);
>

That would work, though if you go this way, please also change the
docstring for `snapshot::buf`, which still says that `buf` and `eof` can
be `NULL`.

The other alternative, making `snapshot` safe for NULLs, becomes easier
if `snapshot` stores a pointer to the start of the reference section of
the `packed-refs` contents (i.e., after the header line), rather than
repeatedly computing that address from `snapshot->buf +
snapshot->header_len`. With this change, code that is technically
undefined when the fields are NULL can more easily be replaced with code
that is safe for NULL. For example,

    pos = snapshot->buf + snapshot->header_len

becomes

    pos = snapshot->start

, and

    len = snapshot->eof - pos;
    if (!len)
            [...]

becomes

    if (pos == snapshot->eof)
            [...]
    len = snapshot->eof - pos;

. In this way, most of the special-casing for NULL goes away (and some
code becomes simpler, as well).

In a moment I'll send a patch series illustrating this approach. I think
patches 01, 02, and 04 are improvements regardless of whether we decide
to make NULL safe.
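
To make the `start`/`eof` idea concrete, here is a minimal standalone
sketch. It is not the code in refs/packed-backend.c: `struct
snapshot_sketch`, `sketch_init()`, and `sketch_refs_len()` are
hypothetical names invented for illustration, and only the fields
discussed above are modeled. The point it tries to show is that
comparing the two pointers before subtracting them keeps an all-NULL
(never loaded) snapshot away from any pointer arithmetic.

```c
#include <stddef.h>

/*
 * Hypothetical, trimmed-down stand-in for the real `struct snapshot`;
 * the field names follow the discussion above, everything else is
 * an assumption made for this sketch.
 */
struct snapshot_sketch {
	char *buf;         /* start of the file contents, or NULL */
	const char *start; /* first byte after the header line, or NULL */
	const char *eof;   /* one past the last byte, or NULL */
};

/*
 * Point the sketch at an in-memory buffer of `len` bytes whose header
 * occupies the first `header_len` bytes. A NULL `buf` leaves every
 * pointer NULL, so no arithmetic is ever done on a NULL pointer.
 */
static void sketch_init(struct snapshot_sketch *s, char *buf,
			size_t len, size_t header_len)
{
	s->buf = buf;
	s->start = buf ? buf + header_len : NULL;
	s->eof = buf ? buf + len : NULL;
}

/*
 * Size of the reference section in bytes. The equality test comes
 * first, so the subtraction is skipped when both pointers are NULL
 * (or when the section is simply empty).
 */
static size_t sketch_refs_len(const struct snapshot_sketch *s)
{
	const char *pos = s->start;

	if (pos == s->eof)
		return 0;
	return (size_t)(s->eof - pos);
}
```

With a zero-initialized `struct snapshot_sketch`, `sketch_refs_len()`
returns 0 via the early exit; with a loaded buffer it returns the number
of bytes after the header.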

The change to using `read()` rather than `mmap()` for small
`packed-refs` feels like it should be an improvement, but it occurred to
me that the performance numbers quoted in ea68b0ce9f8 (hash-object:
don't use mmap() for small files, 2010-02-21) are not directly
applicable to the `packed-refs` file. As far as I understand, the file
mmapped in `index_fd()` is always read in full, whereas the main point
of mmapping the packed-refs file is to avoid having to read the whole
file at all in some situations. That being said, a 32 KiB file would
only be 8 pages (assuming a page size of 4 KiB), and by the time you've
read the header and binary-searched to find the desired record, you've
probably paged in most of the file anyway. Reading the whole file at
once, in order, is almost certainly cheaper.

Michael