On Wed, Jul 31, 2019 at 11:09 PM <hpa@xxxxxxxxx> wrote: > > On July 31, 2019 10:34:26 PM PDT, Andy Lutomirski <luto@xxxxxxxxxx> wrote: > >On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov <dima@xxxxxxxxxx> wrote: > >> > >> As it has been discussed on timens RFC, adding a new conditional > >branch > >> `if (inside_time_ns)` on VDSO for all processes is undesirable. > >> It will add a penalty for everybody as branch predictor may > >mispredict > >> the jump. Also there are instruction cache lines wasted on cmp/jmp. > > > > > >> > >> +#ifdef CONFIG_TIME_NS > >> +int vdso_join_timens(struct task_struct *task) > >> +{ > >> + struct mm_struct *mm = task->mm; > >> + struct vm_area_struct *vma; > >> + > >> + if (down_write_killable(&mm->mmap_sem)) > >> + return -EINTR; > >> + > >> + for (vma = mm->mmap; vma; vma = vma->vm_next) { > >> + unsigned long size = vma->vm_end - vma->vm_start; > >> + > >> + if (vma_is_special_mapping(vma, &vvar_mapping) || > >> + vma_is_special_mapping(vma, &vdso_mapping)) > >> + zap_page_range(vma, vma->vm_start, size); > >> + } > > > >This is, unfortunately, fundamentally buggy. If any thread is in the > >vDSO or has the vDSO on the stack (due to a signal, for example), this > >will crash it. I can think of three solutions: > > > >1. Say that you can't setns() if you have other mms and ignore the > >signal issue. Anything with green threads will disapprove. It's also > >rather gross. > > > >2. Make it so that you can flip the static branch safely. As in my > >other email, you'll need to deal with CoW somehow, > > > >3. Make it so that you can't change timens, or at least that you can't > >turn timens on or off, without execve() or fork(). > > > >BTW, that static branch probably needs to be aligned to a cache line > >or something similar to avoid all the nastiness with trying to poke > >text that might be concurrently executing. This will be a mess. > > Since we are talking about different physical addresses I believe we should be okay as long as they don't cross page boundaries, and even if they do it can be managed with proper page invalidation sequencing – it's not like the problems of having to deal with XMC on live pages like in the kernel. > > Still, you really need each instruction sequence to be present, with the only difference being specific patch sites. > > Any fundamental reason this can't be strictly data driven? Seems odd to me if it couldn't, but I might be missing something obvious. I think it can be. There are at least two places where vDSO slow paths could hook without affecting fast paths: vclock_mode and the low bit of the sequence number.