The patch titled
     fix-pagecache-write-deadlocks
has been removed from the -mm tree.  Its filename is
     fix-pagecache-write-deadlocks.patch

This patch was dropped because it is obsolete

------------------------------------------------------
Subject: fix-pagecache-write-deadlocks
From: Andrew Morton <akpm@xxxxxxxx>

This is half-written and won't work.

The idea is to modify the core write() code so that it won't take a pagefault
while holding a lock on the pagecache page.

- Instead of copy_from_user(), use inc_preempt_count() and
  copy_from_user_inatomic().

- If the copy_from_user_inatomic() hits a pagefault, it'll return a short
  copy.

  - So zero out the remainder of the pagecache page (the uncopied bit).

    - but only if the page is not uptodate.

  - commit_write()

  - unlock_page()

  - adjust various pointers and counters

  - go back and try to fault the page in again, redo the lock_page,
    prepare_write, copy_from_user_inatomic(), etc.

  - After a certain number of retries, someone is being silly: give up.

Now, the design objective here isn't just to fix the deadlock.  It's to be
able to copy multiple iovec segments into the pagecache page within a single
prepare_write/commit_write pair.  But to do that, we'll need to prefault them.

That could get complex.  Walk across the segments, touching each user page
until we reach the point where we see that this iovec segment doesn't fall
into the target page.

Alternatively, only prefault the *present* iovec segment.  The code as
designed will handle pagefaults against the user's pages quite happily.  But
is it efficient?  Needs thought.

(I think we will end up with quite a bit of dead code as a result of this
exercise - some of the fancy user-copying inlines.  Needs checking when the
dust has settled).

Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 mm/filemap.c |    7 ++--
 mm/filemap.h |   69 ++++++++++++++++++++++++++++++++++---------------
 2 files changed, 53 insertions(+), 23 deletions(-)

diff -puN mm/filemap.c~fix-pagecache-write-deadlocks mm/filemap.c
--- a/mm/filemap.c~fix-pagecache-write-deadlocks
+++ a/mm/filemap.c
@@ -2133,11 +2133,12 @@ generic_file_buffered_write(struct kiocb
 			break;
 		}
 		if (likely(nr_segs == 1))
-			copied = filemap_copy_from_user(page, offset,
+			copied = filemap_copy_from_user_atomic(page, offset,
 							buf, bytes);
 		else
-			copied = filemap_copy_from_user_iovec(page, offset,
-						cur_iov, iov_offset, bytes);
+			copied = filemap_copy_from_user_iovec_atomic(page,
+						offset, cur_iov, iov_offset,
+						bytes);
 		flush_dcache_page(page);
 		status = a_ops->commit_write(file, page, offset, offset+bytes);
 		if (status == AOP_TRUNCATED_PAGE) {
diff -puN mm/filemap.h~fix-pagecache-write-deadlocks mm/filemap.h
--- a/mm/filemap.h~fix-pagecache-write-deadlocks
+++ a/mm/filemap.h
@@ -22,19 +22,19 @@ __filemap_copy_from_user_iovec_inatomic(
 
 /*
  * Copy as much as we can into the page and return the number of bytes which
- * were sucessfully copied.  If a fault is encountered then clear the page
- * out to (offset+bytes) and return the number of bytes which were copied.
+ * were sucessfully copied.  If a fault is encountered then return the number of
+ * bytes which were copied.
  *
- * NOTE: For this to work reliably we really want copy_from_user_inatomic_nocache
- * to *NOT* zero any tail of the buffer that it failed to copy.  If it does,
- * and if the following non-atomic copy succeeds, then there is a small window
- * where the target page contains neither the data before the write, nor the
- * data after the write (it contains zero).  A read at this time will see
- * data that is inconsistent with any ordering of the read and the write.
- * (This has been detected in practice).
+ * NOTE: For this to work reliably we really want
+ * copy_from_user_inatomic_nocache to *NOT* zero any tail of the buffer that it
+ * failed to copy.  If it does, and if the following non-atomic copy succeeds,
+ * then there is a small window where the target page contains neither the data
+ * before the write, nor the data after the write (it contains zero).  A read at
+ * this time will see data that is inconsistent with any ordering of the read
+ * and the write.  (This has been detected in practice).
  */
 static inline size_t
-filemap_copy_from_user(struct page *page, unsigned long offset,
+filemap_copy_from_user_atomic(struct page *page, unsigned long offset,
 			const char __user *buf, unsigned bytes)
 {
 	char *kaddr;
@@ -53,14 +53,28 @@ filemap_copy_from_user(struct page *page
 	return bytes - left;
 }
 
+static inline size_t
+filemap_copy_from_user_nonatomic(struct page *page, unsigned long offset,
+			const char __user *buf, unsigned bytes)
+{
+	int left;
+	char *kaddr;
+
+	kaddr = kmap(page);
+	left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
+	kunmap(page);
+	return bytes - left;
+}
+
 /*
- * This has the same sideeffects and return value as filemap_copy_from_user().
+ * This has the same sideeffects and return value as
+ * filemap_copy_from_user_atomic().
  * The difference is that on a fault we need to memset the remainder of the
  * page (out to offset+bytes), to emulate filemap_copy_from_user()'s
  * single-segment behaviour.
  */
 static inline size_t
-filemap_copy_from_user_iovec(struct page *page, unsigned long offset,
+filemap_copy_from_user_iovec_atomic(struct page *page, unsigned long offset,
 			const struct iovec *iov, size_t base, size_t bytes)
 {
 	char *kaddr;
@@ -70,14 +84,29 @@ filemap_copy_from_user_iovec(struct page
 	copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
 							 base, bytes);
 	kunmap_atomic(kaddr, KM_USER0);
-	if (copied != bytes) {
-		kaddr = kmap(page);
-		copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
-								 base, bytes);
-		if (bytes - copied)
-			memset(kaddr + offset + copied, 0, bytes - copied);
-		kunmap(page);
-	}
+	return copied;
+}
+
+/*
+ * This has the same sideeffects and return value as
+ * filemap_copy_from_user_nonatomic().
+ * The difference is that on a fault we need to memset the remainder of the
+ * page (out to offset+bytes), to emulate filemap_copy_from_user_nonatomic()'s
+ * single-segment behaviour.
+ */
+static inline size_t
+filemap_copy_from_user_iovec_nonatomic(struct page *page, unsigned long offset,
+			const struct iovec *iov, size_t base, size_t bytes)
+{
+	char *kaddr;
+	size_t copied;
+
+	kaddr = kmap(page);
+	copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
+							 base, bytes);
+	if (bytes - copied)
+		memset(kaddr + offset + copied, 0, bytes - copied);
+	kunmap(page);
 	return copied;
 }
 
_

Patches currently in -mm which might be from akpm@xxxxxxxx are

revert-pci-quirk-for-ibm-dock-ii-cardbus-controllers.patch
ioc4-fixes.patch
ioc4-kconfig-fix.patch
remove_mapping-fix.patch
proc_numbuf-is-wrong.patch
carta_random32-fix-linkage.patch
rename-net_random-to-random32-fixes.patch
rename-net_random-to-random32-fix-2.patch
vmalloc-dont-pass-__gfp_zero-to-slab.patch
fix-build-breakage-with-config_ppc32-fix.patch
revert-generic_file_buffered_write-handle-zero-length-iovec-segments.patch
revert-generic_file_buffered_write-deadlock-on-vectored-write.patch
generic_file_buffered_write-cleanup.patch
mm-fix-pagecache-write-deadlocks.patch
fix-pagecache-write-deadlocks.patch
git-acpi.patch
i386-acpi-build-fix.patch
acpi-asus-s3-resume-fix.patch
sony_apci-resume.patch
git-dvb-build-fix.patch
git-infiniband.patch
git-input.patch
git-input-fixup.patch
git-libata-all.patch
mtd-maps-support-for-bios-flash-chips-on-intel-esb2-southbridge.patch
git-netdev-all.patch
libphy-dont-do-that.patch
drivers-net-ns83820c-add-paramter-to-disable-auto.patch
git-pcmcia-fixup.patch
git-serial-fixup.patch
git-scsi-target-fixup.patch
git-scsi-target-vs-git-block.patch
fix-gregkh-usb-usbatm-fix-tiny-race.patch
xpad-dance-pad-support.patch
git-watchdog.patch
x86_64-dump_trace-atomicity-fix.patch
spinlock-debug-all-cpu-backtrace.patch
xfs-rename-uio_read.patch
touchkit-ps-2-touchscreen-driver.patch
get-rid-of-zone_table.patch
new-scheme-to-preempt-swap-token-tidy.patch
radix-tree-rcu-lockless-readside.patch
acx1xx-wireless-driver.patch
swsusp-add-resume_offset-command-line-parameter-rev-2.patch
deprecate-smbfs-in-favour-of-cifs.patch
edac-new-opteron-athlon64-memory-controller-driver.patch
add-address_space_operationsbatch_write.patch
kbuild-dont-put-temp-files-in-the-source-tree.patch
lockdep-annotate-nfs-nfsd-in-kernel-sockets-tidy.patch
bug-test-1.patch
log2-implement-a-general-integer-log2-facility-in-the-kernel-fix.patch
fs-cache-provide-a-filesystem-specific-syncable-page-bit-ext4.patch
fs-cache-make-kafs-use-fs-cache-fix.patch
fs-cache-make-kafs-use-fs-cache-vs-streamline-generic_file_-interfaces-and-filemap.patch
nfs-use-local-caching-12-fix.patch
fs-cache-cachefiles-a-cache-that-backs-onto-a-mounted-filesystem-log2-fix.patch
swap_prefetch-vs-zoned-counters.patch
readahead-sysctl-parameters.patch
make-copy_from_user_inatomic-not-zero-the-tail-on-i386-vs-reiser4.patch
make-kmem_cache_destroy-return-void-reiser4.patch
reiser4-hardirq-include-fix.patch
reiser4-run-truncate_inode_pages-in-reiser4_delete_inode.patch
reiser4-get_sb_dev-fix.patch
reiser4-vs-zoned-allocator.patch
reiser4-rename-generic_sounding_globalspatch-fix.patch
hpt3xx-rework-rate-filtering-tidy.patch
gtod-persistent-clock-support-i386.patch
hrtimers-state-tracking.patch
clockevents-drivers-for-i386.patch
gtod-mark-tsc-unusable-for-highres-timers.patch
round_jiffies-infrastructure-fix.patch
kevent-core-files-fix.patch
kevent-core-files-s390-hack.patch
kevent-socket-notifications-fix-2.patch
kevent-socket-notifications-fix-4.patch
kevent-timer-notifications-fix.patch
nr_blockdev_pages-in_interrupt-warning.patch
device-suspend-debug.patch
mutex-subsystem-synchro-test-module-fix.patch
slab-leaks3-default-y.patch
x86-kmap_atomic-debugging.patch
restore-rogue-readahead-printk.patch
put_bh-debug.patch
acpi_format_exception-debug.patch
warn-if-setting-non-uptodate-page-dirty.patch
jmicron-warning-fix.patch
squash-ipc-warnings.patch
squash-udf-warnings.patch
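
The retry scheme listed in the changelog (atomic copy, short copy on fault,
zero the uncopied tail if the page is not uptodate, drop the lock, fault in,
retry, give up eventually) can be sketched in user space roughly as below.
This is only an illustration and not code from the patch: struct user_buf,
copy_inatomic(), fault_in(), write_page(), DEMO_PAGE_SIZE and MAX_RETRIES are
all hypothetical names, the locking and prepare_write/commit_write steps appear
only as comments, and the retry limit and the -1 return are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16	/* tiny "page", for illustration only */
#define MAX_RETRIES    4	/* assumed limit; the changelog names no number */

/* Hypothetical model of a user buffer whose memory may not be resident. */
struct user_buf {
	const char *data;
	size_t valid;		/* bytes that can ever be faulted in */
	size_t resident;	/* bytes [0, resident) copy without faulting */
	size_t chunk;		/* bytes one fault_in() call can bring in */
};

/*
 * Stand-in for copy_from_user_inatomic(): never "sleeps", stops at the first
 * non-resident byte and returns the number of bytes NOT copied.
 */
static size_t copy_inatomic(char *dst, const struct user_buf *src,
			    size_t off, size_t n)
{
	size_t ok = off < src->resident ? src->resident - off : 0;

	if (ok > n)
		ok = n;
	memcpy(dst, src->data + off, ok);
	return n - ok;
}

/* Stand-in for prefaulting the user pages; done with no page lock held. */
static void fault_in(struct user_buf *src)
{
	src->resident += src->chunk;
	if (src->resident > src->valid)
		src->resident = src->valid;
}

/* Returns the bytes copied into the page, or -1 after too many retries. */
static long write_page(char *page, bool uptodate, size_t offset,
		       struct user_buf *src, size_t bytes)
{
	size_t done = 0;
	int retries = 0;

	assert(offset + bytes <= DEMO_PAGE_SIZE);
	while (done < bytes) {
		size_t want = bytes - done;
		size_t left, copied;

		fault_in(src);
		/* lock_page(); prepare_write(); would go here */
		left = copy_inatomic(page + offset + done, src, done, want);
		copied = want - left;
		if (left && !uptodate)	/* zero the uncopied bit */
			memset(page + offset + done + copied, 0, left);
		/* commit_write(); unlock_page(); would go here */
		done += copied;		/* adjust pointers and counters */
		if (left && ++retries > MAX_RETRIES)
			return -1;	/* someone is being silly: give up */
	}
	return (long)done;
}
```

With a buffer that faults in five bytes at a time, write_page() copies twelve
bytes in three passes; with a buffer that can never be faulted in past its
third byte, it gives up and leaves the uncopied part of the page zeroed.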