On Thu, Aug 15, 2024 at 09:37:44PM +0800, shejialuo wrote: > On Thu, Aug 15, 2024 at 03:19:50AM -0700, karthik nayak wrote: > > shejialuo <shejialuo@xxxxxxxxx> writes: > > > > > Hi All: > > > > > > We have already set up the infrastructure of the ref consistency. > > > However, we have only add ref name check when establishing the > > > infrastructure in below: > > > > > > https://lore.kernel.org/git/ZrSqMmD-quQ18a9F@ArchLinux.localdomain/ > > > > > > Actually, we already have a patch here which has already implemented the > > > ref content consistency check. But during the review process, we have > > > encountered some problems. The intention of this RFC is to make sure > > > what content we should check and also to what extend. > > > > > > I conclude the following info: > > > > > > 1. For the regular ref which has a trailing garbage, we should warn the > > > user. This is the most simplest situation, we could reply on > > > "parse_loose_ref_content" to do this. > > > 2. For the symref, we could also rely on "parse_loose_ref_content" to > > > get the "pointee", and check the location of the "pointee", check the > > > name of the "pointee" and the file type of the "pointee". > > > 3. FOr the symbolic ref, we could follow the idea of 2. > > > > > > > Just to understand clearly, when you're talking about 'symbolic ref' you > > are referring to symbolic links? > > > > I am sorry about this. It's symbolic links here. Wait, is it really symbolic link? I don't think so, you actually were talking about symbolic refs correctly. The fact that symbolic refs have been implemented as a symbolic link in the past (and still can be used for that purpose) is rather an implementation detail. But the overall context, and what we actually want to check on disk, is a symbolic ref in its modern incarnation. And checking the format of both normal and symbolic refs does make sense in my opinion. > > > But Patrick gives a question here: > > > > > >> In case the ref ends with a newline, should we check that the next > > >> character is `\0`? Otherwise, it may contain multiple lines, which is > > >> not allowed for a normal ref. > > >> > > >> Also, shouldn't the ref always end with a newline? > > > > > > For symref, I guess we have no spec here. From my experiments, a symref > > > could have a newline or no newline, even multiple newlines. And also > > > symref could have multiple spaces. But the following is a bad symref > > > > > > ref: refs/heads/main garbage > > > > > > I think we should fully discuss what we should check here. Thus I will > > > implement the code. > > > > > > > Agreed, in refs/files-backend.c:create_symref_lock, we write symrefs as > > "ref: %s\n" so it makes sense to validate that there is nothing extra. > > Yes, we should do this. I will implement the code and the send the > patches to the mailing list. Agreed. We have to exclude pseudorefs (FETCH_HEAD, MERGE_HEAD, see gitglossary(7)) from these checks, as those _are_ allowed to contain extra data. But no other reference should carry more data than that. Namely, a regular ref should always be "hex * hash_len + \n", while a symbolic ref should always be "ref: $valid_refname\n". A ref that does not conform to this is not a properly formatted reference and thus worth being warned about. Patrick