Jim Meyering <jim@xxxxxxxxxxxx> writes: >>> Junio C Hamano <gitster@xxxxxxxxx> wrote: >>> ... >> Independent of who creates a "invalid repository", our tools >> should be prepared to be fed bad data and not get upset. >> Somebody in our code read a non-commit (or it did not read >> anything) and told the object layer it is a commit. That >> codepath needs to be tightened up, no? > > Yes, of course. > Unfortunately, I won't have time to investigate for a while. Although I haven't seen the problematic repository, I think what is going on is that the repository has objects with pointers to other objects labelled with wrong types. For example, a tag can tag any type of object. $ git cat-file tag v1.5.4-rc1 object 74f6b03c5c8fceef416de9f9a18e5d64712b6d96 type commit tag v1.5.4-rc1 tagger Junio C Hamano <gitster@xxxxxxxxx> 1198114975 -0800 GIT 1.5.4-rc1 ... When parsing such a tag object, it instantiates an in-core tag object that has a pointer to an in-core commit object, because the tag says that it points at commit 74f6b0, without reading the tagged object itself. This is to allow the user of that in-core tag object to ask "what do you point at" and get an in-core object that represents the pointee. If (and only if) the user choose to follow that pointer and look at the contents of the pointee, it can then call parse_object() on that in-core object. In other words, the object layer is designed to work lazily. So if you earlier parsed such a tag but if the object 74f6b0 were an object of different type, we would hit the codepath you identified that leads to a NULL dereference. And "lying caller" cannot really do much better (except perhaps actually validating the presense and the type of the object, which would incur additional overhead). The situation is the same if: - a tree object records an entry with: - mode bits 040000 pointing at an object that is not a tree; or - mode bits 100(644|755) pointing at an object that is not a blob; or - mode bits 160000 pointing at an object that is not a commit; or - a commit object has a "parent" line that is not a commit; or - a commit object has a "tree" line that is not a tree. Also we cannot just say "then we would take safety over overhead" and make the lying callers validate the pointee; subproject commits pointed by gitlinks (i.e. entry with 160000 mode bits in a tree object) are not even required to be available. Your fix is all we can sensibly do. I however think you would need similar fix to the same function for other object types, as they dereference a potentially NULL pointer the same way. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html