On Thu, Jan 30, 2025 at 12:07:46PM +0800, shejialuo wrote: > We have already implemented the header consistency check for the raw > "packed-refs" file. Before we implement the consistency check for each > ref entry, let's analysis [1] which reports that "git fsck" cannot > detect some NUL characters. This paragraph doesn't quite parse. I think it can simply be left out, as the remainder of the commit message already explains in more than enough detail what you're doing. > "packed-backend.c::next_record" will use "check_refname_format" to check > the consistency of the refname. If it is not OK, the program will die. > So, we already have the code path and we must miss out something. > > We use the following code to get the refname: > > strbuf_add(&iter->refname_buf, p, eol - p); > iter->base.refname = iter->refname_buf.buf > > In the above code, `p` is the start pointer of the refname and `eol` is > the next newline pointer. We calculate the length of the refname by > subtracting the two pointers. Then we add the memory range between `p` > and `eol` to get the refname. > > However, if there are some NUL characters in the memory range between `p` > and `eol`, we will see the refname as a valid ref name as long as the > memory range between `p` and first occurred NUL character is valid. > > In order to catch above corruption, create a new function > "refname_contains_nul" by searching the first NUL character. If it is > not at the end of the string, there must be some NUL characters in the > refname. > > Use this function in "next_record" function to die the program if > "refname_contains_nul" returns true. Yeah, makes sense to me. NUL bytes are invalid, and nothing good can come out of it. > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index 883189f3a1..870c8e7aaa 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -494,6 +494,22 @@ static void verify_buffer_safe(struct snapshot *snapshot) > last_line, eof - last_line); > } > > +/* > + * When parsing the "packed-refs" file, we will parse it line by line. > + * Because we know the start pointer of the refname and the next > + * newline pointer, we could calculate the length of the refname by > + * subtracting the two pointers. However, there is a corner case where > + * the refname contains corrupted embedded NUL characters. And > + * `check_refname_format()` will not catch this when the truncated > + * refname is still a valid refname. To prevent this, we need to check > + * whether the refname contains the NUL characters. > + */ > +static int refname_contains_nul(struct strbuf *refname) > +{ > + const char *pos = memchr(refname->buf, '\0', refname->len + 1); > + return pos < refname->buf + refname->len; > +} This can be simplified to: return !!memchr(refname->buf, '\0', refname->len); Ideally, we'd be amending `check_refname_format()` to do the checking for us. But we can't without a wider refactoring because that function gets a C string, and C strings are naturally terminadet by NUL characters. I think that adding a new function for this is a bit over the top though, as the check is unlikely to be useful in a lot of places and the logic is rather trivial. So I'd just inline the check into `next_record()`. Patrick