CCing Avi,

On Mon, Jun 21, 2010 at 04:09:39PM +0200, Andrea Arcangeli wrote:
> On Mon, Jun 21, 2010 at 04:22:38PM +0300, Gleb Natapov wrote:
> > On Mon, Jun 21, 2010 at 03:16:08PM +0200, Andrea Arcangeli wrote:
> > > > KOSAKI Motohiro get_user_pages vs COW problem
> > >
> > > Just a side note, not sure exactly what is meant to be discussed about
> > > this bug, considering the fact this is still unsolved isn't technical
> > > problem as there were plenty of fixes available, and the one that seem
> > > to had better chance to get included was the worst one in my view, as
> > > it tried to fix it in a couple of gup caller (but failed, also because
> > > finding all put_page pin release is kind of a pain as they're spread
> > > all over the place and not identified as gup_put_page, and in addition
> > > to the instability and lack of completeness of the fix, it was also
> > > the most inefficient as it added unnecessary and coarse locking) plus
> > > all gup callers are affected, not just a few. I normally call it gup
> > > vs fork race. Luckily not all threaded apps uses O_DIRECT and fork and
> > > pretend to do the direct-io in different sub-page chunks of the same
> > > page from different threads (KVM would probably be affected if it
> > > didn't use MADV_DONTFORK on the O_DIRECT memory, as it might run fork
> > > to execute some network script when adding an hotplug pci net device
> > > for example). But surely we can discuss the fix we prefer for this
> > > bug, or at least we can agree it needs fixing.
> > >
> > KVM is actually affected by the bug. The fix was posted today:
> > http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg36759.html
>
> Interesting... so this is the page returned by gup that doesn't match
> anymore the page after an user write into qemu context after
> fork. Clearly any of the fixes proposed would have prevented this bug
> in the first place as they would assign a copy to the child, so yes
> it's likely this same bug. It's quite sad to have this workload that
> is superfluous if gup would behave as supposed by the caller. Also I'd
> prefer if you would use MADV_DONTFORK for the fix, as that will at
> least optimize fork and it would still be ok to keep even after we fix
> the VM while this workaround of using tmpfs should be backed out.

Avi did the fix. We discussed using MADV_DONTFORK for that, but calling
madvise() from the kernel was deemed to be messy.

--
			Gleb.
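
For reference, a minimal userspace sketch of the MADV_DONTFORK approach mentioned above, assuming a private anonymous mapping used as an I/O buffer. The helper name alloc_dontfork_buffer() and its out-parameter are hypothetical and only illustrate the idea. This is not the KVM fix from the linked thread, which was done on the kernel side.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map a private anonymous
 * buffer of at least `len` bytes and mark it MADV_DONTFORK, so that a
 * later fork() does not map the range into the child at all. With no
 * mapping in the child, fork() has nothing to write-protect for COW in
 * this range, and pages pinned via get_user_pages() stay attached to
 * the parent.
 *
 * Returns the buffer and stores the page-rounded size in *out_size,
 * or returns NULL on failure.
 */
void *alloc_dontfork_buffer(size_t len, size_t *out_size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0 || out_size == NULL)
        return NULL;

    /* Round the request up to a whole number of pages. */
    size_t psz = (size_t)page;
    size_t size = (len + psz - 1) / psz * psz;

    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Exclude the range from any child created by fork(). */
    if (madvise(buf, size, MADV_DONTFORK) != 0) {
        munmap(buf, size);
        return NULL;
    }

    *out_size = size;
    return buf;
}
```

The caller releases the buffer with munmap(buf, size). Note that a child touching this range after fork() will fault, so this only suits memory the child never needs, such as the fork-and-exec of a helper script case described above.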