Hi folks, As discussed in the recent thread about problems with DAX locking: http://www.gossamer-threads.com/lists/linux/kernel/2264090?do=post_view_threaded I said that I'd post the patch set that fixed the problems for XFS as soon as I had something sane and workable. That's what this series is. To start with, it passes xfstests "auto" group with only the only failures being expected failures or failures due to unexpected allocation patterns or trying to use unsupported block sizes. That makes it better than any previous version of the XFS/DAX code. The patchset starts by reverting the two patches that were introduced in 4.3-rc1 to try to fix the fault vs fault and fault vs truncate races that caused deadlocks. This fixes the hangs in generic/075 that these patches introduced. Patch 3 enables XFS to handle the behaviour of DAX and DIO when asking to allocate the block at (2^63 - 1FSB), where the offset + count s technically illegal (larger than sb->s_maxbytes) and overflows a s64 variable. This is currently hidden by the fact that all DAX and DIO allocation is currently unwritten, but patch 5 exposes it for DAX. Patch 4 introduces the ability for XFS to allocate physically zeroed data blocks. This is done for each physical extent that is allocated, deep inside the allocator itself and guaranteed to be atomic with the allocation transaction and hence has no crash+recovery exposure issues. This is necessary because the BMAPI layer merges allocated extents in the BMBT before it returns the mapped extent back to the high level get_blocks() code. Hence the high level code can have a single extent presented that is made of merged new and existing extents, and so zeroing can't be done at this layer. The advantage of driving the zeroing deep into the allocator is the functionality is now available to all XFS code. Hence we can allocate pre-zeroed blocks on any type of storage, and we can utilise storage-based hardware acceleration (e.g. discard to zero, WRITE_SAME, etc) to do the zeroing. From this POV, DAX is just another hardware accelerated physical zeroing mechanism for XFS. :) [ This is an example of the mantra I repeat a lot: solve the problem properly the first time and it will make everything simpler! Sure, it took me three attempts to work out how to solve it in a sane manner, but that's pretty much par for the course with anything non-trivial. ] Patch 5 makes __xfs_get_blocks() aware that it is being called from the DAX fault path and makes sure it returns zeroed blocks rather than unwritten extents via XFS_BMAPI_ZERO. It also now sets XFS_BMAPI_CONVERT, which tells it to convert unwritten extents to written, zeroed blocks. This is the major change of behaviour. Patch 6 removes the IO completion callbacks from the XFS DAX code as they are not longer necessary after patch 5. Patch 7 adds pfn_mkwrite support to XFS. This is needed to fix generic/080, which detects a failure to update the inode timestamp on a pfn fault. It also adds the same locking as the XFS implementation of ->fault and ->page_mkwrite and hence provide correct serialisation against truncate, hole punching, etc that doesn't currently exist. The next steps that are needed are to do the same "block zeroing during allocation" to ext4, and then the block zeroing and complete_unwritten callbacks can be removed from the DAX API and code. I've had a breif look at the ext4 code - the block zeroing should be able to be done by overloading the existing zeroout code that ext4 has in the unwritten extent allocation code. I'd much prefer that an ext4 expert does this work, and then we can clean up the DAX code... Cheers, Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html