Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

>  - explicit support for "missing objects". We don't do it right now, but
>    we could add it. It was discussed for things like limited history etc
>    (the "shallow clone" kind of thing, before people actually added
>    shallow clones), and it would support the notion of "we export all our
>    history, but for internal reasons we cannot make certain objects
>    available" kinds of workflows.
> ...
> But at least in theory, it wouldn't be impossible to extend on the
> ".git/grafts" kind of setup to say "this object has been consciously
> deleted", and that could in some circumstances be a better model. The
> biggest headache there would be the need to extend the native git protocol
> with a way to add such objects.

While I agree in principle with the argument that there is no
taking back what's already published, I've heard people wanting
to just stop distributing further, without worrying about copies
already out there.  'missing objects' support would help us in
such a situation.

Supporting 'missing objects' in general would be painful when
they contain pointers to other objects (i.e. tags, commits, and
trees).

Thinking aloud...

 * missing blob: we can have 'stub blob' objects.  Probably the
   object header for such an object would look like:

	stub <length> NUL
	-----------------
	object <object name of the real blob object>
	type blob

   Hashing a 'stub' object (along with its header as usual, in
   write_sha1_file_prepare()) would instead just report the
   object name recorded there.

   When packing (this applies both to local repacking and
   push/fetch object transfer to other repositories), the stub
   object is included.  The delta algorithm would probably not
   delta other objects against it.

 * missing commit and tag: 'stub object' needs to be extended to
   include these object types, and we would also need 'stub
   commit' and 'stub tag' objects that copy the structural
   fields from the corresponding true object.
   So a stub commit would probably look like:

	stub <length> NUL
	-----------------
	object <object name of the real commit object>
	type commit
	tree <object name of the tree contained in the real commit object>
	parent <object name of the first parent in the real commit object>
	parent <object name of the second parent in the real commit object>

 * missing tree: this would only be useful to conceal pathnames
   recorded in the real tree object.  I am not sure if that is
   needed.

 * fsck and verify-pack need to be taught about 'stub' objects,
   so that they know that their filenames (or the data pointed
   at by pack .idx) do not match the result of hashing them.

If we were to do this, I suspect we can probably do nothing but
'missing blob' first to cover a lot of ground, but we would
eventually need 'missing commit' to replace real commit objects
that have sensitive information in their log messages.

As Nico pointed out, this has serious security implications.  We
would need a separate list of objects that are Ok to be stubbed
out, probably with an explanation of why they are stubbed out,
and fsck should compare the stub objects found in the repository
against that list.
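
To make the hashing and fsck rules above a bit more concrete, here
is a rough Python sketch.  It is not git code and nothing like it
exists in git today.  The function names (encode_object, make_stub,
object_name, check_stubs) are made up for illustration, and the
"list of objects that are Ok to be stubbed out" is assumed to be a
plain mapping from object name to the stated reason.

```python
import hashlib


def encode_object(obj_type, body):
    """Return '<type> <length> NUL' followed by the body."""
    return f"{obj_type} {len(body)}".encode() + b"\0" + body


def make_stub(real_name, real_type, extra_fields=()):
    """Build a hypothetical stub standing in for the object real_name.

    extra_fields carries structural lines such as ("tree", ...) or
    ("parent", ...) copied from the real commit or tag.
    """
    lines = [f"object {real_name}", f"type {real_type}"]
    lines += [f"{key} {value}" for key, value in extra_fields]
    return encode_object("stub", ("\n".join(lines) + "\n").encode())


def object_name(raw):
    """Name an object; a stub reports the name recorded in it."""
    header, _, body = raw.partition(b"\0")
    if header.split(b" ", 1)[0] == b"stub":
        for line in body.decode().splitlines():
            key, _, value = line.partition(" ")
            if key == "object":
                return value
        raise ValueError("stub object has no 'object' line")
    # Ordinary object: SHA-1 over the header and the body.
    return hashlib.sha1(raw).hexdigest()


def check_stubs(objects, approved):
    """fsck-like pass over a {stored name: raw object} mapping.

    Reports objects whose name does not match, and stubs that are
    not on the approved list (assumed: {object name: reason}).
    """
    problems = []
    for stored_name, raw in sorted(objects.items()):
        if object_name(raw) != stored_name:
            problems.append((stored_name, "name mismatch"))
        elif raw.startswith(b"stub ") and stored_name not in approved:
            problems.append((stored_name, "stub not on approved list"))
    return problems
```

With this, object_name(make_stub(name, "blob")) gives back name, so
the stub can sit where the real blob used to be, and check_stubs()
complains only about stubs nobody has signed off on.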