On 09/11/2014 04:28 PM, Thomas Gleixner wrote: > On Thu, 11 Sep 2014, Qiaowei Ren wrote: >> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl() >> commands. These commands can be used to register and unregister MPX >> related resource on the x86 platform. > > I cant see anything which is registered/unregistered. This registers the location of the bounds directory with the kernel. >From the app's perspective, it says "I'm using MPX, and here is where I put the root data structure". Without this, the kernel would have to do an (expensive) xsave operation every time it wanted to see if MPX was in use. This also makes the user/kernel interaction more explicit. We would be in a world of hurt if userspace was allowed to move the bounds directory around. With this interface, it's a bit more obvious that userspace can't just move it around willy-nilly. >> The base of the bounds directory is set into mm_struct during >> PR_MPX_REGISTER command execution. This member can be used to >> check whether one application is mpx enabled. > > This changelog is completely useless. Yeah, it's pretty bare-bones. Let me know if the explanation above makes sense, and we'll get it updated. >> +/* >> + * This should only be called when cpuid has been checked >> + * and we are sure that MPX is available. > > Groan. Why can't you put that cpuid check into that function right > away instead of adding a worthless comment? Sounds reasonable to me. We should just move the cpuid check in to task_get_bounds_dir(). >> + */ >> +static __user void *task_get_bounds_dir(struct task_struct *tsk) >> +{ >> + struct xsave_struct *xsave_buf; >> + >> + fpu_xsave(&tsk->thread.fpu); >> + xsave_buf = &(tsk->thread.fpu.state->xsave); >> + if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG)) >> + return NULL; > > Now this might be understandable with a proper comment. Right now it's > a magic check for something uncomprehensible. It's a bit ugly to access, but it seems pretty blatantly obvious that this is a check for "Is the enable flag in a hardware register set?" Yes, the registers have names only a mother could love. But that is what they're really called. I guess we could add some comments about why we need to do the xsave. >> +int mpx_register(struct task_struct *tsk) >> +{ >> + struct mm_struct *mm = tsk->mm; >> + >> + if (!cpu_has_mpx) >> + return -EINVAL; >> + >> + /* >> + * runtime in the userspace will be responsible for allocation of >> + * the bounds directory. Then, it will save the base of the bounds >> + * directory into XSAVE/XRSTOR Save Area and enable MPX through >> + * XRSTOR instruction. >> + * >> + * fpu_xsave() is expected to be very expensive. In order to do >> + * performance optimization, here we get the base of the bounds >> + * directory and then save it into mm_struct to be used in future. >> + */ > > Ah. Now we get some information what this might do. But that does not > make any sense at all. > > So all it does is: > > tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK; > > or: > > tsk->mm.bd_addr = NULL; > > So we use that information to check, whether we need to tear down a > VM_MPX flagged region with mpx_unmap(), right? Well, we use it to figure out whether we _potentially_ need to tear down an VM_MPX-flagged area. There's no guarantee that there will be one. >> + /* >> + * Check whether this vma comes from MPX-enabled application. >> + * If so, release this vma related bound tables. >> + */ >> + if (mm->bd_addr && !(vma->vm_flags & VM_MPX)) >> + mpx_unmap(mm, start, end); > > You really must be kidding. The application maps that table and never > calls that prctl so do_unmap() will happily ignore it? Yes. The only other way the kernel can possibly know that it needs to go tearing things down is with a potentially frequent and expensive xsave. Either we change mmap to say "this mmap() is for a bounds directory", or we have some other interface that says "the mmap() for the bounds directory is at $foo". We could also record the bounds directory the first time that we catch userspace using it. I'd rather have an explicit interface than an implicit one like that, though I don't feel that strongly about it. > The design to support this feature makes no sense at all to me. We > have a special mmap interface, some magic kernel side mapping > functionality and then on top of it a prctl telling the kernel to > ignore/respect it. That's a good point. We don't seem to have anything in the allocate_bt() side of things to tell the kernel to refuse to create things if the prctl() hasn't been called. That needs to get added. > All I have seen so far is the hint to read some intel feature > documentation, but no coherent explanation how this patch set makes > use of that very feature. The last patch in the series does not count > as coherent explanation. It merily documents parts of the > implementation details which are required to make use of it but > completely lacks of a coherent description how all of this is supposed > to work. It sounds like we need to take the patch00 plus the documentation patch and try to lay things out more clearly. > Despite the fact that this is V8, I can't suppress the feeling that > this is just cobbled together to make it work somehow and we'll deal > with the fallout later. It's v8, but it's been very lightly reviewed. I do appreciate the review at this point, though. > I wouldn't be surprised if some of the fallout > is going to be security related. I have a pretty good idea how to > exploit it even without understanding the non-malicious intent of the > whole thing. If you don't want to share them in public, I'm happy to take this off-list, but please do share. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>