On Thu, Jan 16, 2025 at 02:57:40PM +0100, Patrick Steinhardt wrote: > On Sun, Jan 05, 2025 at 09:49:51PM +0800, shejialuo wrote: > > We have already implemented the header consistency check for the raw > > "packed-refs" file. Before we implement the consistency check for each > > ref entry, let's analysis [1] which reports that "git fsck" cannot > > detect some binary zeros. > > > > "packed-backend.c::next_record" will use "check_refname_format" to check > > the consistency of the refname. If it is not OK, the program will die. > > So, we already have the code path and we must miss out something. > > > > We use the following code to get the refname: > > > > strbuf_add(&iter->refname_buf, p, eol - p); > > iter->base.refname = iter->refname_buf.buf > > > > In the above code, `p` is the start pointer of the refname and `eol` is > > the next newline pointer. We calculate the length of the refname by > > subtracting the two pointers. Then we add the memory range between `p` > > and `eol` to get the refname. > > > > However, if there are some NULL binaries in the memory range between `p` > > You probably mean NUL characters, not NULL binaries? > Yes, I will improve this in the next version. > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > > index 3b11abe5f8..f6142a4402 100644 > > --- a/refs/packed-backend.c > > +++ b/refs/packed-backend.c > > @@ -493,6 +493,23 @@ static void verify_buffer_safe(struct snapshot *snapshot) > > last_line, eof - last_line); > > } > > > > +/* > > + * When parsing the "packed-refs" file, we will parse it line by line. > > + * Because we know the start pointer of the refname and the next > > + * newline pointer, we could calculate the length of the refname by > > + * subtracting the two pointers. However, there is a corner case where > > + * the refname contains corrupted embedded NULL binaries. And > > + * `check_refname_format()` will not catch this when the truncated > > + * refname is still a valid refname. To prevent this, we need to check > > + * whether the refname contains the NULL binaries. > > + */ > > +static int refname_contains_null(struct strbuf refname) > > +{ > > + if (refname.len != strlen(refname.buf)) > > + return 1; > > + return 0; > > +} > > + > > #define SMALL_FILE_SIZE (32*1024) > > > > /* > > @@ -894,6 +911,9 @@ static int next_record(struct packed_ref_iterator *iter) > > strbuf_add(&iter->refname_buf, p, eol - p); > > iter->base.refname = iter->refname_buf.buf; > > > > + if (refname_contains_null(iter->refname_buf)) > > We can replace this with `memchr(iter->refname_buf.buf, '\0', > iter->refname_buf.len)`, which should be more efficient than using > strlen(3p). Thanks for the suggestion. Will improve this in the next version. > > > + die("packed refname contains embedded NULL: %s", iter->base.refname); > > + > > I was a bit surprised to find that we modify the way that we read refs > from the packed-refs file instead of adapting the fsck code. But I think > this check is sensible. Actually, I am also surprised here. And this thing is extremely interesting. When I implement all the fsck code, I find I still cannot detect the error reported in [1] which is the motivation why we want to add checks for ref explicitly. And I dive into the code to fix this problem. The reason why I put here is that we are going to implement the checks like what "next_record" does. [1] https://lore.kernel.org/git/6cfee0e4-3285-4f18-91ff-d097da9de737@xxxxxxx/ Thanks, Jialuo