On Wed, Aug 10, 2016 at 1:06 PM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote: > ----- On Aug 10, 2016, at 3:16 PM, Andy Lutomirski luto@xxxxxxxxxxxxxx wrote: > >> On Wed, Aug 10, 2016 at 12:04 PM, Mathieu Desnoyers >> <mathieu.desnoyers@xxxxxxxxxxxx> wrote: >>> ----- On Aug 10, 2016, at 4:10 AM, Andy Lutomirski luto@xxxxxxxxxxxxxx wrote: >>> >>>> On Tue, Aug 9, 2016 at 1:06 PM, Mathieu Desnoyers >>>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote: >>> >>> <snip> >>> >>>>> Actually, we want copy_from_user() there. This executes upon >>>>> resume to user-space, so we can take a page fault is needed, so >>>>> no "inatomic" needed. I therefore suggest: >>>> >>>> Running the code below via exit_to_usermode_loop... >>>> >>>>> >>>>> static bool rseq_get_rseq_cs(struct task_struct *t, >>>>> void __user **start_ip, >>>>> void __user **post_commit_ip, >>>>> void __user **abort_ip) >>>>> { >>>>> unsigned long ptr; >>>>> struct rseq_cs __user *urseq_cs; >>>>> struct rseq_cs rseq_cs; >>>>> >>>>> if (__get_user(ptr, &t->rseq->rseq_cs)) >>>>> return false; >>>>> if (!ptr) >>>>> return true; >>>>> #ifdef CONFIG_COMPAT >>>>> if (in_compat_syscall()) { >>>>> urseq_cs = compat_ptr((compat_uptr_t)ptr); >>>>> if (copy_from_user(&rseq_cs, urseq_cs, sizeof(*rseq_cs))) >>>>> return false; >>>>> *start_ip = compat_ptr((compat_uptr_t)rseq_cs.start_ip); >>>>> *post_commit_ip = compat_ptr((compat_uptr_t)rseq_cs.post_commit_ip); >>>>> *abort_ip = compat_ptr((compat_uptr_t)rseq_cs.abort_ip); >>>>> return true; >>>>> } >>>>> #endif >>>> >>>> ...means that in_compat_syscall() is nonsense. (It *works* there, but >>>> I can't imagine that it does anything that is actually sensible for >>>> this use.) >>> >>> Agreed that we are not per-se in a system call here. It works for >>> in_ia32_syscall(), but it may not work for in_x32_syscall(). >>> >>> Then should we test for this ? >>> >>> if (!is_64bit_mm(current->mm)) >>> >>> This is currently x86-specific. Is this how we are expected to test >>> the user-space pointer size in the current mm in arch-agnostic code ? >>> If so, we should implement is_64bit_mm() on all other architectures. >> >> There is no universal concept of the user-space pointer size on x86 >> because x86 code can change it via long jumps. >> >> What are you actually trying to do? I would guess that >> user_64bit_mode(regs) is the right thing here, because the rseq data >> structure is describing the currently executing code. > > Yes, that's correct, we care about the pointer size of currently executing > code. On x86 user_64bit_mode(regs) would appear to be the right thing to do. > >> >>> >>>> >>>> Can't you just define the ABI so that no compat junk is needed? >>>> (Also, CRIU will thank you for doing that.) >>> >>> We are dealing with user-space pointers here, so AFAIU we need to >>> be aware of their size, which involves compat code. Am I missing >>> something ? >> >> u64 is a perfectly valid, if odd, userspace pointer on all >> architecures that I know of, and it's certainly a valid userspace >> pointer on x86 32-bit userspace (the high bits will just all be zero). >> Can you just use u64? > > My concern is about a 32-bit user-space putting garbage rather than zeroes > (on purpose) to fool the kernel on those upper 32 bits. Doing > > compat_ptr((compat_uptr_t)rseq_cs.start_ip) > > effectively ends up clearing the upper 32 bits. > > But since we only use those pointer values for comparisons, perhaps we > just don't care if a 32-bit userspace app try to shoot itself in > the foot by passing garbage upper 32 bits ? > How is garbage in the high bits any different than garbage in any other bits in there? > >> If this would be a performance problem on ARM, then maybe that's a >> reason to use compat helpers. > > We already use 64-bit values for the pointers, even on 32-bit. Normally > userspace just puts zeroes in the top bits. It's mostly a question of > clearing the top 32 bits or not when loading them in the kernel. If we > don't need to, then I can remove the compat code entirely, and we don't > care about user_64bit_mode() anymore, as you initially recommended. > Does it make sense ? Yes, I think so. I'd suggest just honoring all the bits. > >> >>> >>>> >>>> >>>>>>> +SYSCALL_DEFINE2(rseq, struct rseq __user *, rseq, int, flags) >>>>>>> +{ >>>>>>> + if (unlikely(flags)) >>>>>>> + return -EINVAL; >>>>>> >>>>>> (add whitespace) >>>>> >>>>> fixed. >>>>> >>>>>> >>>>>>> + if (!rseq) { >>>>>>> + if (!current->rseq) >>>>>>> + return -ENOENT; >>>>>>> + return 0; >>>>>>> + } >>>> >>>> This looks entirely wrong. Setting rseq to NULL fails if it's already >>>> NULL but silently does nothing if rseq is already set? Surely it >>>> should always succeed and it should actually do something if rseq is >>>> set. >>> >>> From the proposed rseq(2) manpage: >>> >>> "A NULL rseq value can be used to check whether rseq is registered >>> for the current thread." >>> >>> The implementation does just that: it returns -1, errno=ENOENT if no >>> rseq is currently registered, or 0 if rseq is currently registered. >> >> I think that's problematic. Why can't you unregister an existing >> rseq? If you can't, how is a thread supposed to clean up after >> itself? >> > > Unregistering an existing thread rseq would require that we keep reference > counting, in case multiple libs and/or the app are using rseq. I am > trying to keep things as simple as needed. > > If I understand your concern, the problematic scenario would be at > thread exit (this is my current approximate understanding of glibc > handling of library TLS variable reclaim at thread exit): > > thread exits in userspace: > - glibc frees its rseq TLS memory area (in case the TLS is in a library), > - thread preempted before really exiting, > - kernel reads/writes to freed TLS memory. > - corruption may occur (e.g. memory re-allocated by another thread already) > > Am I getting it right ? Yes. -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html