Hi,

I forgot to add Kirill to CC, since this modifies the fault path he changed
recently. I don't want to resend the whole series just because of this, so
I'm at least pinging him this way...

Honza

On Tue 01-11-16 23:36:06, Jan Kara wrote:
> Hello,
>
> this is the fourth revision of my patches to clear dirty bits from radix tree
> of DAX inodes when caches for corresponding pfns have been flushed. This patch
> set is significantly larger than the previous version because I'm changing how
> ->fault, ->page_mkwrite, and ->pfn_mkwrite handlers may choose to handle the
> fault so that we don't have to leak details about DAX locking into the generic
> code. In principle, these patches enable handlers to easily update PTEs and do
> other work necessary to finish the fault without duplicating the functionality
> present in the generic code. I'd really like feedback from mm folks whether
> such changes to fault handling code are fine or what they'd do differently.
>
> The patches are based on 4.9-rc1 + Ross' DAX PMD page fault series [1] + ext4
> conversion of DAX IO patch to the iomap infrastructure [2]. For testing,
> I've pushed out a tree including all these patches and further DAX fixes
> to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git dax
>
> The patches pass testing with xfstests on ext4 and xfs on my end. I'd be
> grateful for review so that we can push these patches for the next merge
> window.
>
> [1] http://www.spinics.net/lists/linux-mm/msg115247.html
> [2] Posted an hour ago - look for "ext4: Convert ext4 DAX IO to iomap framework"
>
> Changes since v3:
> * rebased on top of 4.9-rc1 + DAX PMD fault series + ext4 iomap conversion
> * reordered some of the patches
> * killed ->virtual_address field in vm_fault structure as requested by
>   Christoph
>
> Changes since v2:
> * rebased on top of 4.8-rc8 - this involved dealing with new fault_env
>   structure
> * changed calling convention for fault helpers
>
> Changes since v1:
> * make sure all PTE updates happen under radix tree entry lock to protect
>   against races between faults & write-protecting code
> * remove information about DAX locking from mm/memory.c
> * smaller updates based on Ross' feedback
>
> ----
> Background information regarding the motivation:
>
> Currently we never clear dirty bits in the radix tree of a DAX inode. Thus
> fsync(2) flushes all the dirty pfns again and again. These patches implement
> clearing of the dirty tag in the radix tree so that we issue a flush only when
> needed.
>
> The difficulty with clearing the dirty tag is that we have to protect against
> a concurrent page fault setting the dirty tag and writing new data into the
> page. So we need a lock serializing page faults against clearing of the dirty
> tag and write-protecting of PTEs (so that we get another page fault when the
> pfn is written to again and we have to set the dirty tag again).
>
> The effect of the patch set is easily visible:
>
> Writing 1 GB of data via mmap, then fsync twice.
>
> Before this patch set both fsyncs take ~205 ms on my test machine. After the
> patch set the first fsync takes ~283 ms (the additional cost of walking PTEs,
> clearing dirty bits etc. is very noticeable) and the second fsync takes below
> 1 us.
>
> As a bonus, these patches make filesystem freezing for DAX filesystems
> reliable because mappings are now properly write-protected while freezing the
> fs.
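
For anyone who wants to repeat the "write via mmap, then fsync twice"
measurement quoted above, here is a minimal userspace sketch. It is not the
program used to produce the numbers in the cover letter; the function name
time_mmap_fsync and its parameters are made up for illustration, and it only
assumes a Unix-like system where Python's mmap module supports MAP_SHARED.

```python
import mmap
import os
import time


def time_mmap_fsync(path, size, rounds=2):
    """Dirty `size` bytes of `path` through a shared mapping, then time fsync().

    Hypothetical helper for illustration only. `size` must be positive.
    Returns a list with one elapsed time in seconds per fsync() call.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Make the file large enough to back the whole mapping.
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            page = mmap.PAGESIZE
            # Write every page through the mapping so each one takes a
            # write fault and ends up dirty.
            for off in range(0, size, page):
                n = min(page, size - off)
                m[off:off + n] = b"\xab" * n

            timings = []
            for _ in range(rounds):
                start = time.perf_counter()
                os.fsync(fd)
                timings.append(time.perf_counter() - start)
        return timings
    finally:
        os.close(fd)
```

To approximate the test above, call it with size=1 << 30 on a file that lives
on a DAX-mounted ext4 or xfs filesystem (the mount point is up to you) and
compare the first and second timings. On a non-DAX filesystem the numbers
measure ordinary page cache writeback instead, so they are not comparable.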
> Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR