On Mon, 11 Apr 2022, Linus Torvalds wrote: > On Mon, Apr 11, 2022 at 4:43 AM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote: > > > > If you run a program compiled with OpenWatcom for Linux on a filesystem on > > NVMe, all "stat" syscalls fail with -EOVERFLOW. The reason is that the > > NVMe driver allocates a device with the major number 259 and it doesn't > > pass the "old_valid_dev" test. > > OpenWatcom? Really? Yes. I use OpenWatcom to verify that my programs are clean ANSI C without any gccisms. Other than that, it is not much useful - it has it's own libc, it's own module format, and programs compiled with OpenWatcom cannot be linked with existing *.a or *.so libraries. > > This patch removes the tests - it's better to wrap around than to return > > an error. (note that cp_old_stat also doesn't report an error and wraps > > the number around) > > Hmm. We've used majors over 256 for a long time, but some of them are > admittedly very rare (SCSI OSD?) > > Unfortunate. And in this case 259 aliases to 3, which is the old > HD/IDE0 major number. That's not great - there would be other numbers > that didn't have that problem (ie 4-6 are all currently only character > device majors, I think). Should we perhaps hash the number, take 16 bits of the hash and hope than the collision won't happen? > Anyway, I think that check is just bogus. The cp_new_stat() thing uses > 'struct stat' and it has > > unsigned long st_dev; /* Device. */ > unsigned long st_rdev; /* Device number, if device. */ > > so there's no reason to limit things to the old 8-bit behavior. > > Yes, it does that > > #define valid_dev(x) choose_32_64(old_valid_dev(x),true) > #define encode_dev(x) choose_32_64(old_encode_dev,new_encode_dev)(x) > > static __always_inline u16 old_encode_dev(dev_t dev) > { > return (MAJOR(dev) << 8) | MINOR(dev); > } > > which currently drops bits, but we should just *fix* that. We can put > the high bits in the upper bits, not limit it to 16 bits when we have > more space than that. Yes - we can return values larger than 16-bit here. But there's a risk that the userspace code will extract the values using macros like this and lose the upper bits: #define major(device) ((int)(((device) >> 8) & 0xFF)) #define minor(device) ((int)((device) & 0xff)) > Even the *really* old 'struct old_stat' doesn't really have a 16-bit > st_dev/rdev. > > Linus For me, the failure happens in cp_compat_stat (I have a 64-bit kernel). In struct compat_stat in arch/x86/include/asm/compat.h, st_dev and st_rdev are compat_dev_t which is 16-bit. But they are followed by 16-bit paddings, so they could be extended. If you have a native 32-bit kernel, it uses 'struct stat' defined at the beginning of arch/x86/include/uapi/asm/stat.h that has 32-bit st_dev and st_rdev. If you use a 64-bit kernel with 32-bit compat, it uses 'struct compat_stat' defined in arch/x86/include/asm/compat.h that has 16-bit st_dev and st_rdev. That's an inconsistency that should be resolved. What did glibc do? Did it use 16-bit dev_t with following padding or 32-bit dev_t? (the current glibc just uses stat64 and 64-bit dev_t always) Mikulas