On Thu, May 30, 2024 at 08:27:53PM +0800, shejialuo wrote: > The existing git-fsck does not fully check bad format ref name such as > standalone '@' or name ending with ".lock". And also `git-fsck` does not > fully check ref content, the following contents will not be reported by > the original git-fsck command. > > 1. "c71dba5654404b9b0d378a6cbff1b743b9d31a34 garbage": with trailing > characters. > 2. "c71dba5654404b9b0d378a6cbff1b743b9d31a34 ": with trailing empty > characters. > 3. "C71DBA5654404b9b0d378a6cbff1b743b9d31A34": with uppercase hex. > 4. "z71dba5654404b9b0d378a6cbff1b743b9d31a34": with not hex character. > > Provide functionality to support such consistency checks for branch and > tag refs and add a new unit test to verify this functionality. I would recommend to only introduce one check per commit. This commit introduces multiple different checks to the files backend, which makes it harder to review. [snip] > +static int files_fsck_refs_content(const char *refs_check_dir, > + struct dir_iterator *iter) > +{ > + struct strbuf ref_content = STRBUF_INIT; > + > + if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) { > + error(_("%s/%s: unable to read the ref"), refs_check_dir, iter->basename); > + goto clean; > + } > + > + /* > + * Case 1: check if the ref content length is valid and the last > + * character is a newline. > + */ > + if (ref_content.len != the_hash_algo->hexsz + 1 || > + ref_content.buf[ref_content.len - 1] != '\n') { > + goto failure; > + } > + > + /* > + * Case 2: the content should be range of [0-9a-f]. > + */ > + for (size_t i = 0; i < the_hash_algo->hexsz; i++) { > + if (!isdigit(ref_content.buf[i]) && > + (ref_content.buf[i] < 'a' || ref_content.buf[i] > 'f')) { > + goto failure; > + } > + } > + > + strbuf_release(&ref_content); > + return 0; > + > +failure: > + error(_("%s/%s: invalid ref content"), refs_check_dir, iter->basename); > + > +clean: > + strbuf_release(&ref_content); > + return -1; Can we reuse `parse_loose_ref_contents()` to get rid of much of the duplication here? That function would need to learn to indicate to the caller that there is trailing data, but other than that it's already as strict as we want it to be. Other than that, Junio has already added quite a bunch of comments, so I'll refrain from repeating them here. Thanks! Patrick
Attachment:
signature.asc
Description: PGP signature