On Mon, Feb 03, 2025 at 09:40:22AM +0100, Patrick Steinhardt wrote: > On Thu, Jan 30, 2025 at 12:07:46PM +0800, shejialuo wrote: > > We have already implemented the header consistency check for the raw > > "packed-refs" file. Before we implement the consistency check for each > > ref entry, let's analysis [1] which reports that "git fsck" cannot > > detect some NUL characters. > > This paragraph doesn't quite parse. I think it can simply be left out, > as the remainder of the commit message already explains in more than > enough detail what you're doing. > Let me improve this in the next version. > > "packed-backend.c::next_record" will use "check_refname_format" to check > > the consistency of the refname. If it is not OK, the program will die. > > So, we already have the code path and we must miss out something. > > > > We use the following code to get the refname: > > > > strbuf_add(&iter->refname_buf, p, eol - p); > > iter->base.refname = iter->refname_buf.buf > > > > In the above code, `p` is the start pointer of the refname and `eol` is > > the next newline pointer. We calculate the length of the refname by > > subtracting the two pointers. Then we add the memory range between `p` > > and `eol` to get the refname. > > > > However, if there are some NUL characters in the memory range between `p` > > and `eol`, we will see the refname as a valid ref name as long as the > > memory range between `p` and first occurred NUL character is valid. > > > > In order to catch above corruption, create a new function > > "refname_contains_nul" by searching the first NUL character. If it is > > not at the end of the string, there must be some NUL characters in the > > refname. > > > > Use this function in "next_record" function to die the program if > > "refname_contains_nul" returns true. > > Yeah, makes sense to me. NUL bytes are invalid, and nothing good can > come out of it. > > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > > index 883189f3a1..870c8e7aaa 100644 > > --- a/refs/packed-backend.c > > +++ b/refs/packed-backend.c > > @@ -494,6 +494,22 @@ static void verify_buffer_safe(struct snapshot *snapshot) > > last_line, eof - last_line); > > } > > > > +/* > > + * When parsing the "packed-refs" file, we will parse it line by line. > > + * Because we know the start pointer of the refname and the next > > + * newline pointer, we could calculate the length of the refname by > > + * subtracting the two pointers. However, there is a corner case where > > + * the refname contains corrupted embedded NUL characters. And > > + * `check_refname_format()` will not catch this when the truncated > > + * refname is still a valid refname. To prevent this, we need to check > > + * whether the refname contains the NUL characters. > > + */ > > +static int refname_contains_nul(struct strbuf *refname) > > +{ > > + const char *pos = memchr(refname->buf, '\0', refname->len + 1); > > + return pos < refname->buf + refname->len; > > +} > > This can be simplified to: > > return !!memchr(refname->buf, '\0', refname->len); > This is very nice. > Ideally, we'd be amending `check_refname_format()` to do the checking > for us. But we can't without a wider refactoring because that function > gets a C string, and C strings are naturally terminadet by NUL > characters. > Yes, we cannot. Actually, this is a corner case. NUL character is so special. > I think that adding a new function for this is a bit over the top > though, as the check is unlikely to be useful in a lot of places and the > logic is rather trivial. So I'd just inline the check into > `next_record()`. > The reason why I extract this logic into a separate function is that we will reuse this logic for later packed backend consistency checking. We nearly use the same way to parse the raw "packed-ref" files. So, I don't want to repeat here. I will improve the commit message to add the motivation why we need to use a function instead of using it in the inline way. Thanks, Jialuo > Patrick