On Fri, 7 Feb 2025 17:24:42 +0000 Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:

> According to the syzbot report referenced here, it is possible to encounter
> a race between mprotect() writing to the vma->vm_flags field and migration
> checking whether the VMA is locked.
>
> There is no real problem with timing here per se, only that torn
> reads/writes may occur. Therefore, as a proximate fix, ensure both
> operations READ_ONCE() and WRITE_ONCE() to avoid this.
>
> This race is possible due to the ability to look up VMAs via the rmap,
> which migration does in this case, which takes no mmap or VMA lock and
> therefore does not preclude an operation to modify a VMA.
>
> When the final update of VMA flags is performed by mprotect, this will
> cause the rmap lock to be taken while the VMA is inserted on split/merge.
>
> However the means by which we perform splits/merges in the kernel is that
> we perform the split/merge operation on the VMA, acquiring/releasing locks
> as needed, and only then, after having done so, modifying fields.
>
> We should carefully examine and determine whether we can combine the two
> operations so as to avoid such races, and whether it might be possible to
> otherwise annotate these rmap field accesses.

Thanks.

If some poor person reads this code and wonders "why is it using
READ_ONCE", what's our answer? I guess it's "poke around with git-blame".

And I guess we can live with that - it doesn't seem practical to paste
changelog text into every READ_ONCE() site. Probably most people won't
bother and READ_ONCEs of ->vm_flags will get pasted into other places
where unneeded.

I do wonder if we can do better.
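
For illustration only, here is a minimal userspace sketch of one way the
rationale could live in a single place instead of at every call site: a
pair of commented accessors wrapping the marked accesses. Nothing here is
taken from the patch. The struct, the accessor names and the flag value
are hypothetical, and the READ_ONCE()/WRITE_ONCE() macros are simplified
stand-ins for the kernel's (they rely on the GCC/Clang __typeof__
extension).

```c
#include <assert.h>

/*
 * Simplified userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * The volatile access asks the compiler to emit one access, rather than
 * splitting, repeating or fusing it.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, cut-down stand-in for struct vm_area_struct. */
struct vma_sketch {
	unsigned long vm_flags;
};

/* Illustrative bit only; an assumption, not taken from the patch. */
#define VM_LOCKED_SKETCH 0x2000UL

/*
 * Hypothetical accessor. vm_flags can be written by mprotect() while an
 * rmap walker (e.g. migration) reads it holding neither the mmap lock nor
 * the VMA lock. The marked access only guards against torn or repeated
 * loads; it does not order the read against the writer.
 */
static inline unsigned long vma_flags_read_racy(const struct vma_sketch *vma)
{
	return READ_ONCE(vma->vm_flags);
}

/* Hypothetical counterpart for the writer side; see the comment above. */
static inline void vma_flags_write_racy(struct vma_sketch *vma,
					unsigned long flags)
{
	WRITE_ONCE(vma->vm_flags, flags);
}

/* Reader-side check, as an rmap walker might perform it. */
static inline int vma_sketch_is_locked(const struct vma_sketch *vma)
{
	return (vma_flags_read_racy(vma) & VM_LOCKED_SKETCH) != 0;
}

/* Single-threaded sanity check of the accessors; returns 1 on success. */
static inline int vma_sketch_selftest(void)
{
	struct vma_sketch vma = { .vm_flags = 0 };

	assert(!vma_sketch_is_locked(&vma));
	vma_flags_write_racy(&vma, VM_LOCKED_SKETCH);
	assert(vma_sketch_is_locked(&vma));
	return 1;
}
```

With something along these lines, a reader who wonders why the access is
marked lands on one comment rather than on git-blame, and a deliberately
awkward name such as the "_racy" suffix might discourage pasting the
marked access into paths that already hold the proper locks.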