Am 22.01.23 um 08:48 schrieb Jeff King: > On Sat, Jan 21, 2023 at 10:36:08AM +0100, René Scharfe wrote: > >> Am 19.01.23 um 02:39 schrieb Jeff King: >>> >>> Though I do find the use of strlen() in decode_tree_entry() >>> a little suspicious (and that would be true of the current code, as >>> well, since it powers hash-object's existing parsing check). >> >> strlen() won't overrun the buffer because the first check in >> decode_tree_entry() makes sure there is a NUL in the buffer ahead. >> If get_mode() crosses it then we exit early. > > Yeah, that was what I found after digging deeper (see my patch 7). > >> Storing the result in an unsigned int can overflow on platforms where >> size_t is bigger. That would result in pathlen values being too short >> for really long paths, but no out-of-bounds access. They are then >> stored as signed int in struct name_entry and used as such in many >> places -- that seems like a bad idea, but I didn't actually check them >> thoroughly. > > Yeah, I agree that the use of a signed int there looks questionable. I > do think it's orthogonal to my series here, as that tree-decoding is > used by the existing hash-object checks. Sure. > But it probably bears further examination, especially because we use it > for the fsck checks on incoming objects via receive-pack, etc, which are > meant to be the first line of defense for hosters who might receive > malicious garbage from users. > > We probably ought to reject trees with enormous names via fsck anyway. I > actually have a patch to do that, but of course it depends on > decode_tree_entry() to get the length, so there's a bit of > chicken-and-egg. Solvable by limiting the search for the end of the string in decode_tree_entry() by using strnlen(3) or memchr(3) instead of strlen(3). You just need to define some (configurable?) limit. > We probably also should outright reject gigantic trees, > which closes out a whole class of integer truncation problems. I know > GitHub has rejected trees over 100MB for years for this reason. Makes sense. >> get_mode() can overflow "mode" if there are too many octal digits. Do >> we need to accept more than two handfuls in the first place? I'll send >> a patch for at least rejecting overflow. > > Seems reasonable. I doubt there's an interesting attack here, just > because the mode isn't used to index anything. If you feed a garbage > mode that happens to overflow to something useful, you could just as > easily have sent the useful mode in the first place. > >> Hmm, what would be the performance impact of trees with mode fields >> zero-padded to silly lengths? > > Certainly it would waste some time parsing the tree, but you could do > that already with a long pathname. Or just having a lot of paths in a > tree. Or a lot of trees. That's a cup half full/empty thing, perhaps. What's the harm in leading zeros? vs. Why allow leading zeros? René