On Fri, Nov 19, 2010 at 2:54 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 18 Nov 2010 23:23:16 -0800
> Michel Lespinasse <walken@xxxxxxxxxx> wrote:
>
>> On Thu, Nov 18, 2010 at 09:41:22AM -0800, Hugh Dickins wrote:
>> > On Thu, 18 Nov 2010, Christoph Hellwig wrote:
>> > > I think it would help if we could drink a bit of the test driven design
>> > > coolaid here. Michel, can you write some testcases where pages on a
>> > > shared mapping are mlocked, then dirtied and then munlocked, and then
>> > > written out using msync/fsync. Anything that fails this test on
>> > > btrfs/ext4/gfs/xfs/etc obviously doesn't work.
>> > Whilst it's hard to argue against a request for testing, Dave's worries
>> > just sprang from a misunderstanding of all the talk about "avoiding
>> > ->page_mkwrite". There's nothing strange or risky about Michel's patch,
>> > it does not avoid ->page_mkwrite when there is a write: it just stops
>> > pretending that there was a write when locking down the shared area.
>>
>> So, I decided to test this using memtoy.
>
> Wait. You *tested* the kernel?
>
> I dunno, kids these days...

Not guilty - I mean, Christoph made me do it!

> Dirtying all that memory at mlock() time is pretty obnoxious.
>
> I'm inclined to agree that your patch implements the desirable
> behaviour: don't dirty the page, don't do block allocation. Take a
> fault at first-dirtying and do it then. This does degrade mlock a bit:
> the user will find that the first touch of an mlocked page can cause
> synchronous physical I/O, which isn't mlocky behaviour *at all*. But
> we have to be able to do this anyway - whenever the kupdate function
> writes back the dirty pages it has to mark them read-only again so the
> kernel knows when they get redirtied.

Glad to see that we seem to be coming to an agreement here.

> So all that leaves me thinking that we merge your patches as-is.
> Then work out why users can fairly trivially use mlock to hang the kernel on
> ext2 and ext3 (and others?)

I would say the hang is not even mlock related - you see it without mlock
too. All you need to do is mmap a large file with holes and write-fault pages
until you run out of disk space. At that point, additional write faults wait
for a writeback that can never complete. The sysadmin can, however, kill -9
such processes and/or free some space.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
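
For reference, the sequence Christoph asked for (mlock a shared mapping,
dirty it, munlock, then msync) could be sketched roughly as below. This is
only a minimal sketch, not the memtoy run mentioned above:
check_mlock_dirty_msync() and its arguments are made-up names, the 0xA5 fill
pattern is arbitrary, and reading back with pread() only checks the
page-cache view of the file, so a real test would presumably also drop caches
or remount before verifying.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from memtoy or the patch series): mlock a shared
 * file mapping, dirty it, munlock it, msync it, then read the file back.
 * Returns 0 if the written pattern is visible through the fd, -1 otherwise.
 */
int check_mlock_dirty_msync(const char *path, size_t npages)
{
    unsigned char *map;
    unsigned char byte;
    size_t len, off;
    long page;
    int fd, rc = -1;

    assert(path != NULL && npages > 0);

    page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    len = npages * (size_t)page;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* ftruncate leaves the file sparse, so every page starts as a hole. */
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    if (mlock(map, len) != 0)          /* can fail under a low RLIMIT_MEMLOCK */
        goto out_unmap;

    memset(map, 0xA5, len);            /* the first real write to each page */

    if (munlock(map, len) != 0)
        goto out_unmap;
    if (msync(map, len, MS_SYNC) != 0) /* synchronous writeback request */
        goto out_unmap;

    /* Check one byte per page through the fd rather than the mapping. */
    rc = 0;
    for (off = 0; off < len; off += (size_t)page) {
        if (pread(fd, &byte, 1, (off_t)off) != 1 || byte != 0xA5) {
            rc = -1;
            break;
        }
    }

out_unmap:
    munmap(map, len);
out_close:
    close(fd);
    unlink(path);
    return rc;
}
```

Calling it with a path on each filesystem under test (btrfs, ext4, xfs, ...)
and a small page count keeps it within the default locked-memory limit; a
nonzero return means one of the steps failed or the data did not show up.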