On Mon, Nov 13, 2017 at 3:51 AM, Vincent Chen <deanbo422@xxxxxxxxx> wrote: >>>On Wed, Nov 8, 2017 at 6:55 AM, Greentime Hu <green.hu@xxxxxxxxx> wrote: >>> From: Greentime Hu <greentime@xxxxxxxxxxxxx> > >>> +#define __ARCH_WANT_SYS_CLONE >> >>This seems ok, though it would be nice to have the reverse logic and have architectures opt-out of the generic version when they need to provide their own, rather than having most architectures set it. >> > > Thanks > I will provide nds32 SYSCALL_DEFINE_5(clone) in the next version patch. That's not what I meant, my suggestion is to create a new patch to remove the __ARCH_WANT_SYS_CLONE symbol from all architectures, and instead use something like __ARCH_HAVE_SYS_CLONE on architectures that do *not* want the generic implementation. nds32 clearly wants the generic implementation here, it just shouldn't have to select that symbol to get it. >>> +/* >>> + * Special system call wrappers >>> + * >>> + * $r0 = syscall number >>> + * $r8 = syscall table >>> + */ >>> + .type sys_syscall, #function >>> +ENTRY(sys_syscall) >>> + addi $p1, $r0, #-__NR_syscalls >>> + bgtz $p1, 3f >>> + move $p1, $r0 >>> + move $r0, $r1 >>> + move $r1, $r2 >>> + move $r2, $r3 >>> + move $r3, $r4 >>> + move $r4, $r5 >>> +! add for syscall 6 args >>> + lwi $r5, [$sp + #SP_OFFSET ] >>> + lwi $r5, [$r5] >>> +! ~add for syscall 6 args >>> + >>> + lw $p1, [tbl+$p1<<2] >>> .+ jr $p1 >>> +3: b sys_ni_syscall >>> +ENDPROC(sys_syscall) >> >>Can you explain what this is used for? >> > > This is used to handle syscall(int number, ....). > Unlike other architectures, the system number shall be determined in > compile time when issuing system call in nds32. > Therefore, we only can parse the content of syscall(int number, ....) > and distribute it to destination handler in kernel space > (Other architecture can handle it in user space by glibc's syscall wrapper) Hmm, I think other architectures that run into this problem use self-modifying code for syscall(), but that is also ugly as it requires having a page that is both writable and executable. I think your approach can be tricky for things like seccomp(). It's possible that you get all of it right, but if you can come up with a different solution, that might be better. How much would it cost to simply always pass the syscall number in a register, and not use the immediate argument at all? >>> --- /dev/null >>> +++ b/arch/nds32/kernel/sys_nds32.c >>> + >>> +long sys_mmap2(unsigned long addr, unsigned long len, >>> + unsigned long prot, unsigned long flags, >>> + unsigned long fd, unsigned long pgoff) { >>> + if (pgoff & (~PAGE_MASK >> 12)) >>> + return -EINVAL; >>> + >>> + return sys_mmap_pgoff(addr, len, prot, flags, fd, >>> + pgoff >> (PAGE_SHIFT - 12)); } >>> + >>> +asmlinkage long sys_fadvise64_64_wrapper(int fd, int advice, loff_t offset, >>> + loff_t len) { >>> + return sys_fadvise64_64(fd, offset, len, advice); } >> >>You should always use SYSCALL_DEFINE*() macros to define entry points for your own syscalls in C code for consistency. I also wonder if we should just move those two into common code, a lot of architectures need the first one in particular. >> > > The sys_fadvise64_64_wrapper is used to reorder the input parameter. > > In order to solve register alignment problem, we adjust the input > parameter order of fadvise64_64 while issuing this syscall. > Therefore, we need this wrapper to reorder the input parameter to fit > sys_fadvise64_64's API in kernel. I understand what it's used for, it's just that I would recommend writing it differently, as SYSCALL_DEFINE4(fadvise64_64_wrapper, int, fd, int, advice, loff_t, offset, loff_t, len) { return sys_fadvise64_64(fd, offset, len, advice); } Arnd