On Fri, Jul 9, 2021 at 4:49 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> Arnd Bergmann <arnd@xxxxxxxx> writes:
> > On Fri, Jul 9, 2021 at 11:24 AM Huacai Chen <chenhuacai@xxxxxxxxx> wrote:
> >> On Thu, Jul 8, 2021 at 9:30 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > Most such system calls currently go through set_user_sigmask or
> > set_compat_user_sigmask, which only differ on big-endian.
> > I would actually like to see these merged together and have a single
> > helper checking for in_compat_syscall() to decide whether to do
> > the word-swap for 32-bit big-endian tasks or not, but that's a separate
> > discussion (and I suspect that Eric won't like that version, based on
> > other discussions we've had).
>
> Reading through get_compat_sigset is the best argument I have ever seen
> for getting rid of big endian architectures. My gut reaction is we
> should just sweep all of the big endian craziness into a corner and let
> it disappear as the big endian architectures are retired.

A nice thought, but not going to happen any time soon as long as IBM
makes money from s390.

> Perhaps we generalize the non-compat version of the system calls and
> only have a compat version of the system call for the big endian
> architectures.
>
> I really hope loongarch and any new architectures added to the tree all
> are little endian.

It is. Most of the architectures merged over the last few years only
support little-endian kernels, even those that can theoretically do
both in hardware (c-sky, riscv). arm64 and arc support big-endian in
theory, but this is rarely used. Most of the server and workstation
class hardware from the last century is big-endian though. OpenRISC
was the last architecture we support that is big-endian only, but it
was designed 21 years ago now.

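To make the word-swap mentioned above concrete, here is a rough
userspace model of the problem, not the kernel code itself. A compat
mask is an array of 32-bit words and the native mask of a 64-bit
kernel is an array of 64-bit words. On little-endian the two layouts
are byte-for-byte identical, but on big-endian a plain byte copy would
put the first 32-bit word into the upper half of each 64-bit word, so
the halves have to be combined explicitly. The function name and
signature below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, not kernel code: build the 64-bit
 * words of a native signal mask from the 32-bit words of a compat
 * mask. compat[0] covers signals 1..32, compat[1] signals 33..64,
 * and so on, regardless of CPU byte order.
 */
static void compat_words_to_native(uint64_t *native, const uint32_t *compat,
				   size_t native_words)
{
	assert(native != NULL && compat != NULL);

	for (size_t i = 0; i < native_words; i++) {
		uint64_t lo = compat[2 * i];	 /* lower-numbered signals */
		uint64_t hi = compat[2 * i + 1]; /* higher-numbered signals */

		/* Explicit shift instead of memcpy: endian-independent. */
		native[i] = lo | (hi << 32);
	}
}
```

Because the combination is done with shifts rather than a byte copy,
the same code gives the right answer on either byte order. A
little-endian build could replace it with a plain copy.
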
> > What I think you need for loongarch though is to change
> > set_user_sigmask(), get_compat_sigset() and similar functions to
> > behave differently depending on the user space execution context,
> > converting the 64-bit masks for loongarch/x86/arm64 tasks into
> > 128-bit in-kernel masks, while copying the 128-bit mips masks
> > as-is. This also requires changing the sigset_t and _NSIG
> > definitions so you get a 64-bit mask in user space, but a 128-bit
> > mask in kernel space.
> >
> > There are multiple ways of achieving this, either by generalizing
> > the common code, or by providing an architecture specific
> > implementation to replace it for loongarch only. I think you need to
> > try out which of those is the most maintainable.
>
> I believe all of the modern versions of the system calls that
> take a sigset_t in the kernel also take a sigsetsize. So the most
> straightforward thing to do is to carefully define what happens
> to sigsets that are too big or too small when set.
>
> Something like defining that if a sigset is larger than the kernel's
> sigset size all of the additional bits must be zero, and if the sigset
> is smaller than the kernel's sigset size all of the missing bits
> will be set to zero in the kernel's sigset_t. There may be cases
> I am missing but for SIG_SETMASK, SIG_BLOCK, and SIG_UNBLOCK those
> look like the correct definitions.

Right, that would work as well. It is a change in behavior though,
since currently kernels just reject any non-default sigsetsize, and
there is a chance that this causes problems when some project relies
on being able to pass an arbitrary sigsetsize value, and then someone
tries running this on an older kernel.

> Another option would be to simply have whatever translates the system
> calls in userspace to perform the work of verifying the extra bits in
> the bitmap are unused before calling system calls that take a sigset_t
> and just ignoring the extra bits.

This is why I asked about how the current loongarch code does it.
qemu must already be doing something like this to run mips code on
non-mips architectures or vice-versa. FEX is another project doing
this, but they also want to add support for foreign syscalls (x86 on
arm64 mainly).

Based on previous discussions, I can see us coming up with a more
general way to handle ioctl() commands using some lookup table to
decide which commands need what kind of translation (32-bit compat,
foreign architecture, usercopy, ...). This needs some more planning
and discussion, but if we end up doing it, it would be plausible to
have a more general way for dealing with more than two ABIs in a
syscall.

        Arnd
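
As an illustration of the sigsetsize rule discussed above (extra bits
of an oversized mask must be zero, missing bits of an undersized mask
are treated as zero), here is a minimal userspace sketch. It is only a
model of the proposed semantics, not existing kernel behavior.
KERNEL_SIGSET_BYTES and import_sigset() are hypothetical names, the
128-bit size is assumed from the discussion, and masks are treated as
plain byte arrays in little-endian bit order, which sidesteps the
big-endian word-swap problem.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define KERNEL_SIGSET_BYTES 16	/* assumed 128-bit in-kernel mask */

/*
 * Hypothetical helper: import a user mask of `usize` bytes into a
 * fixed-size kernel mask. Returns 0 on success, or -EINVAL if the
 * user mask sets any bit beyond what the kernel mask can hold.
 */
static int import_sigset(unsigned char kset[KERNEL_SIGSET_BYTES],
			 const unsigned char *uset, size_t usize)
{
	size_t ncopy = usize < KERNEL_SIGSET_BYTES ? usize
						   : KERNEL_SIGSET_BYTES;

	assert(kset != NULL && uset != NULL);

	/* Larger than the kernel's set: every extra byte must be zero. */
	for (size_t i = KERNEL_SIGSET_BYTES; i < usize; i++)
		if (uset[i] != 0)
			return -EINVAL;

	/* Smaller than the kernel's set: missing bits become zero. */
	memset(kset, 0, KERNEL_SIGSET_BYTES);
	memcpy(kset, uset, ncopy);
	return 0;
}
```

With this rule, a 64-bit mask from a loongarch/x86/arm64 style task
widens into the 128-bit mask with the upper half cleared, a 128-bit
mips mask is copied as-is, and anything larger is accepted only when
the bits the kernel cannot represent are all clear.
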