page_mkwrite is called with neither the page lock nor the ptl held. This means a page can be concurrently truncated or invalidated out from underneath it. page_mkwrite implementations are supposed to prevent truncate races themselves; however, previously the only thing they could do if they hit one was to raise SIGBUS. SIGBUS is wrong when the page has been invalidated or truncated within i_size (e.g. a hole was punched). Implementations may also have to perform memory allocations in this path, where, again, SIGBUS would be wrong (an allocation failure should be reported as VM_FAULT_OOM).

The previous patch made it possible to properly specify errors. Convert the generic buffer.c code and btrfs to return sane error values. If the page has been removed from the pagecache, VM_FAULT_NOPAGE causes the fault handler to exit without doing anything, and the fault will be retried properly.

All filesystems should be fixed, but this is just a demonstration/RFC. The other filesystems are slightly less trivial :) If anybody cares to fix their filesystem and send me a patch, that would be nice (but it is not required, because patch 1/2 is backward compatible).
---
 fs/btrfs/inode.c |   11 +++++++----
 fs/buffer.c      |   12 ++++++++----
 2 files changed, 15 insertions(+), 8 deletions(-)

Index: linux-2.6/fs/btrfs/inode.c
===================================================================
--- linux-2.6.orig/fs/btrfs/inode.c
+++ linux-2.6/fs/btrfs/inode.c
@@ -4307,10 +4307,15 @@ int btrfs_page_mkwrite(struct vm_area_st
 	u64 page_end;
 
 	ret = btrfs_check_data_free_space(root, inode, PAGE_CACHE_SIZE);
-	if (ret)
+	if (ret) {
+		if (ret == -ENOMEM)
+			ret = VM_FAULT_OOM;
+		else /* -ENOSPC, -EIO, etc */
+			ret = VM_FAULT_SIGBUS;
 		goto out;
+	}
 
-	ret = -EINVAL;
+	ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
 again:
 	lock_page(page);
 	size = i_size_read(inode);
@@ -4363,8 +4368,6 @@ again:
 out_unlock:
 	unlock_page(page);
 out:
-	if (ret)
-		ret = VM_FAULT_SIGBUS;
 	return ret;
 }
 
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -2473,7 +2473,7 @@ block_page_mkwrite(struct vm_area_struct
 	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 	unsigned long end;
 	loff_t size;
-	int ret = -EINVAL;
+	int ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
 
 	lock_page(page);
 	size = i_size_read(inode);
@@ -2493,10 +2493,14 @@ block_page_mkwrite(struct vm_area_struct
 	if (!ret)
 		ret = block_commit_write(page, 0, end);
 
-out_unlock:
-	if (ret)
-		ret = VM_FAULT_SIGBUS;
+	if (unlikely(ret)) {
+		if (ret == -ENOMEM)
+			ret = VM_FAULT_OOM;
+		else /* -ENOSPC, -EIO, etc */
+			ret = VM_FAULT_SIGBUS;
+	}
 
+out_unlock:
 	unlock_page(page);
 	return ret;
 }