Hello Hugh, On Thu, Sep 22, 2016 at 03:36:36AM -0700, Hugh Dickins wrote: > I suppose: this one seems overblown to me, and risks more change > (as the CONFIG_DEBUG_VM_RB=y crashes showed). When DEBUG_VM_RB=n there was no bug that I know of. So I don't think the fact there was a false positive in the validation code that didn't immediately cope with the new changes, should be a major concern. > But I've come back to it several times, not found any incorrectness, > and was just about ready to Ack it (once the VM_RB fix is folded in, > though I've not studied that yet): when I noticed that what I'd liked > least about this one, looks unnecessary too - see below. The reason the VM_RB=y incremental fix to the validation code is not a few liner, is to micro-optimize it. I call directly __vma_unlink_common to be sure the additional parameter is eliminated at build time if CONFIG_DEBUG_VM_RB=n, and it never risks to go through the stack in production. > At the bottom I've appended my corrected version of Andrea's > earlier patches for comparison: maybe better for stable? I think it's perfectly suitable for -stable, if there is urgency to merge it in -stable. OTOH with regard to urgency, this isn't exploitable and the bug was there for 10+ years? > > +static inline void __vma_unlink_prev(struct mm_struct *mm, > > + struct vm_area_struct *vma, > > + struct vm_area_struct *prev) > > +{ > > + __vma_unlink_common(mm, vma, prev, true); > > +} > > + > > +static inline void __vma_unlink(struct mm_struct *mm, > > + struct vm_area_struct *vma) > > +{ > > + __vma_unlink_common(mm, vma, NULL, false); > > +} > > + > > Umm, how many functions do we need to unlink a vma? > Perhaps I'm missing some essential, but what's wrong with a single > __vma_unlink(mm, vma)? (Could omit mm, but probably better with it.) Of course that would work, I did that initially. I only had __vma_unlink and I just removed the "prev" parameter from it. > The existing __vma_unlink(mm, vma, prev) dates, of course, from > long before Linus added vma->vm_prev in 2.6.36. It doesn't really > need its prev arg nowadays, and I wonder if that misled you into > all this prev and has_prev stuff? After removing "prev" from __vma_unlink I reintroduced __vma_unlink_prev as a microoptimization for remove_next = 1/2 cases. In those two cases we have already "prev" and it's guaranteed not null. So by keeping the _common version __always_inline the parameters of the _common disappears in the assembly and in turn the __vma_unlink_prev is a bit faster. Perhaps it's not worth to do these kind of microoptimizations? The only reason I reintroduced a version of __vma_unlink_prev that gets prev not NULL as parameter was explicitly to microoptimize with __always_inline. > (Yes, of course it needs to handle the NULL vma->vm_prev mm->mmap > case, but that doesn't need these three functions.) > > But I see this area gets touched again in yesterday's 3/4 to fix > the VM_RB issue. I haven't tried applying that patch on top to > see what the result looks like, but I hope simpler than this. Right, to handle the case of DEBUG_VM_RB=y I need to pass a different "ignore" parameter in remove_next == 3, so it's even more worth to microoptimize now that I'm forced to have a different kind of call anyway, and I can't just call __vma_unlink(next). Once the two patches are folded, __vma_unlink is renamed to __vma_unlink_prev that is a more accurate name anyway I think, given the parameters and that assumption it does on prev being not NULL. > > + if (remove_next != 3) > > + __vma_unlink_prev(mm, next, vma); > > + else > > + /* vma is not before next if they've been swapped */ > > + __vma_unlink(mm, next); > > And if the VM_RB issue doesn't complicate it, this would just amount to > __vma_unlink(mm, next); > without any remove_next 3 variation. Yes, and VM_RB complicates it. > > + if (remove_next != 3) { > > if (vma == orig_vma), and you won't need the remove_next 3 state at all. I think that would be less readable. I don't want to risk to mistake case 1/2/3. I could use an enum and REMOVE_NEXT, REMOVE_NEXT_NEXT, REMOVE_PREV, or I could use -1 instead of 3 to show it's removing prev if you wish, but I would prefer not to use vma == orig_vma to detect remove_next != 3. It can't improve performance either. orig_vma is purely for trans_huge split when the vma->vm_start/end (and next->vm_start if adjust_next) boundary changes. The only point of orig_vma is to replace this statement: "remove_next != 3 : vma : next". I wouldn't mix up the detection of case 1/2/3 with that micro-optimization. > Here's my fixup of Andrea's earlier version, not swapping vma and next > as the above does, but applying properties of next to vma as before. > Maybe this version should go in first, so that it's available as an > easier and safer candidate for stable backports: whatever akpm prefers. I think this is a more conservative and in turn safer approach for urgent -stable or for urgent backports. Performance-wise I doubt any difference is measurable. For the longer term upstream, I think removing the oddness factor from case 8 for good is better, as we may get bitten by it again and it's also quite counter intuitive for the callers of vma_merge to receive a vma that isn't already fully in sync with all the parameters passed to vma_merge. And having to overwrite the "different" bits by hand. I feel the oddness in case 8 should be dropped for good and it's not much more complicated to do so (especially if we ignore the __vma_unlink details which are a fully self contained problem and they cannot add up to the complexity of vma_merge/vma_adjust). I successfully tested your fix with the testcase that exercises the race and reviewed your fix and it's certainly correct too to solve the race against rmap_walks that access vma_page_prot/vm_flags. The fix in -mm however is solving the race condition for all fields, if any rmap_walk accessed more than those two fields, and without having to copy them off. Tested-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> Reviewed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> Overall I'm fine either ways, but I had to elaborate my preference :). Thanks, Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>