On Mon, Apr 11, 2022 at 11:41 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote: > > Also, if the st_dev and st_rdev values are 32-bit, we don't have to use > old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW > on filesystems that are on NVMe because NVMe uses the major number 259. The problem with this part of the patch is that this: > @@ -353,7 +352,7 @@ static int cp_new_stat(struct kstat *sta > #endif > > INIT_STRUCT_STAT_PADDING(tmp); > - tmp.st_dev = encode_dev(stat->dev); > + tmp.st_dev = new_encode_dev(stat->dev); completely changes the format of that st_dev field. For completely insane historical reasons, we have had the rule that - 32-bit architectures encode the device into a 16 bit value - 64-bit architectures encode the device number into a 32 bit value and that has been true *despite* the fact that the actual "st_dev" field has been 32-bit and 64-bit respectively since 2003! And it doesn't help that to confuse things even more, the _naming_ of those "encode_dev" functions is "old and new", so that logically you'd think that "cp_new_stat()" would use "new_encode_dev()". Nope. So on 32-bit architectures, cp_new_stat() uses "old_encode_dev()", which historically put the minor number in bits 0..7, and the major number in bits 8..15. End result: on a 32-bit system (or in the compat syscall mode), changing to new_encode_dev() would confuse anybody (like just "ls -l /dev") that uses that old stat call and tries to print out major/minor numbers. Now,. the good news is that (a) nobody should use that old stat call, since the new world order is called "stat64" and has been for a loooong time - also since at least 2003) (b) we could just hide the bits in upper bits instead. So what I suggest we do is to make old_encode_dev() put the minor bits in bits 0..7 _and_ 16..23, and the major bits in 8..15 _and_ 24..32. And then the -EOVERFLOW should be something like unsigned int st_dev = encode_dev(stat->dev); tmp.st_dev = st_dev; if (st_dev != tmp.st_dev) return -EOVERFLOW; for the lcase that tmp.st_dev is actually 16-bit (ie the compat case for some architecture where the padding wasn't there?) NOTE: That will still screw up 'ls -l' output, but only for the devices that previously would have returned -EOVERFLOW. And it will make anybopdy who does that "stat1->st_dev == stat2->st_dev && ino == ino2" thing for testing "same inode" work just fine. Linus