On Wed, Apr 09, 2014 at 11:43:31PM +0200, Jan Kara wrote: > So there are three places that can fail after we allocate the block: > 1) We race with truncate reducing i_size > 2) dax_get_pfn() fails > 3) vm_insert_mixed() fails > > I would guess that 2) can fail only if the HW has problems and leaking > block in that case could be acceptable (please correct me if I'm wrong). > 3) shouldn't fail because of ENOMEM because fault has already allocated all > the page tables and EBUSY should be handled as well. So the only failure we > have to care about is 1). And we could move ->get_block() call under > i_mmap_mutex after the i_size check. Lock ordering should be fine because > i_mmap_mutex ranks above page lock under which we do block mapping in > standard ->page_mkwrite callbacks. The only (big) drawback is that > i_mmap_mutex will now be held for much longer time and thus the contention > would be much higher. But hopefully once we resolve our problems with > mmap_sem and introduce mapping range lock we could scale reasonably. Lockdep barfs on holding i_mmap_mutex while calling ext4's ->get_block. Path 1: ext4_fallocate -> ext4_punch_hole -> ext4_inode_attach_jinode() -> ... -> lock_map_acquire(&handle->h_lockdep_map); truncate_pagecache_range() -> unmap_mapping_range() -> mutex_lock(&mapping->i_mmap_mutex); Path 2: do_dax_fault() -> mutex_lock(&mapping->i_mmap_mutex); ext4_get_block() -> ... -> lock_map_acquire(&handle->h_lockdep_map); So that idea doesn't work. We can't exclude truncates by incrementing i_dio_count, because we can't take i_mutex in the fault path. I'm stumped. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html