The patch titled

     try_to_free_buffers(): don't clear pte dirty bits

has been added to the -mm tree.  Its filename is

     try_to_free_buffers-dont-clear-pte-dirty-bits.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: try_to_free_buffers(): don't clear pte dirty bits
From: Andrew Morton <akpm@xxxxxxxx>

try_to_free_buffers() clears the page's dirty state if it successfully
removed the page's buffers.

Background for this:

- a process does a one-byte write to a file on a 64k pagesize, 4k blocksize
  ext3 filesystem.  The page is now PageDirty, !PageUptodate and has one
  dirty buffer and 15 not-uptodate buffers.

- kjournald writes the dirty buffer.  The page is now PageDirty,
  !PageUptodate and has a mix of clean and not-uptodate buffers.

- try_to_free_buffers() removes the page's buffers.  It MUST now clear
  PageDirty.  If we were to leave the page dirty then we'd have a dirty,
  not-uptodate page with no buffer_heads.

  We're screwed: we cannot write the page because we don't know which
  sections of it contain garbage.  We cannot read the page because we don't
  know which sections of it contain modified data.  We cannot free the page
  because it is dirty.

Peter's "mm: tracking shared dirty pages"
(d08b3851da41d0ee60851f2c75b118e1f7a5fc89) modified clear_page_dirty() so
that it also clears the dirty flags of the ptes that map the page, arranging
for a subsequent userspace modification of the page to cause a fault.

That change to clear_page_dirty() was correct for the case where it is
called on the writeback path.  Here, we effectively do:

	ClearPageDirty()
	pte_mkclean()
	submit-the-writeout

If the page is dirtied via write() or via its ptes after the ClearPageDirty()
or the pte_mkclean(), then it is redirtied while writeout is in flight and
will need writing again; no probs.
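The three-step 64k-page scenario above can be checked with a small userspace model.  This is only a sketch under stated assumptions: struct sim_page, sim_write_byte(), sim_journal_writeout() and sim_free_buffers() are hypothetical names, not kernel functions, and the model tracks nothing but PageDirty, PageUptodate and per-buffer uptodate/dirty state (no locking, journalling or real I/O).  As in the kernel, dropping buffers is refused while any buffer is still dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model (not kernel code) of one 64k page on a
 * filesystem with 4k blocks, so the page carries 16 buffers. */
#define SIM_PAGE_SIZE  65536u
#define SIM_BLOCK_SIZE 4096u
#define SIM_NR_BUFS    (SIM_PAGE_SIZE / SIM_BLOCK_SIZE)

struct sim_buffer {
	bool uptodate;	/* block contents are valid in memory */
	bool dirty;	/* block holds data not yet on disk */
};

struct sim_page {
	bool dirty;		/* models PageDirty */
	bool uptodate;		/* models PageUptodate: set only if all blocks are */
	bool has_buffers;	/* buffer_heads still attached */
	struct sim_buffer bufs[SIM_NR_BUFS];
};

static void sim_page_init(struct sim_page *p)
{
	/* Clean, not uptodate, with 16 not-uptodate buffers attached. */
	*p = (struct sim_page){ .has_buffers = true };
}

static bool sim_all_bufs_uptodate(const struct sim_page *p)
{
	for (size_t i = 0; i < SIM_NR_BUFS; i++)
		if (!p->bufs[i].uptodate)
			return false;
	return true;
}

/* Step 1: a one-byte write.  Only the block containing `offset` becomes
 * uptodate and dirty; the other 15 blocks stay not uptodate. */
static void sim_write_byte(struct sim_page *p, size_t offset)
{
	assert(p->has_buffers && offset < SIM_PAGE_SIZE);
	struct sim_buffer *b = &p->bufs[offset / SIM_BLOCK_SIZE];
	b->uptodate = true;
	b->dirty = true;
	p->dirty = true;
	p->uptodate = sim_all_bufs_uptodate(p);
}

/* Step 2: kjournald writes the dirty buffers.  They become clean, but
 * nothing touches the page-level dirty flag. */
static void sim_journal_writeout(struct sim_page *p)
{
	for (size_t i = 0; i < SIM_NR_BUFS; i++)
		p->bufs[i].dirty = false;
}

/* Step 3: drop the buffers, refusing if any is still dirty.  clear_dirty
 * selects whether PageDirty is cleared once the buffers are gone. */
static bool sim_free_buffers(struct sim_page *p, bool clear_dirty)
{
	for (size_t i = 0; i < SIM_NR_BUFS; i++)
		if (p->bufs[i].dirty)
			return false;
	p->has_buffers = false;
	if (clear_dirty)
		p->dirty = false;
	return true;
}

/* Dirty, not uptodate and no buffers: cannot write, read or free it. */
static bool sim_page_is_stuck(const struct sim_page *p)
{
	return p->dirty && !p->uptodate && !p->has_buffers;
}
```

Running the three steps with clear_dirty set to false leaves the model in exactly the dirty, not-uptodate, buffer-less state the commit message calls unrecoverable; with it set to true the page is simply clean and not uptodate, and can be reread or freed.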
But that change to clear_page_dirty() was incorrect for the case where it is
called on the try_to_free_buffers() path.  Here, we want to preserve any
pte-dirtiness because we're not going to write the page to backing store.
We need to keep a record of any userspace modification to the page.

One way of addressing this would be to bail out of try_to_free_buffers() if
the page is mapped into pagetables.  However that is racy, because the
pagefault path doesn't lock the page when establishing a pte against it (I
wish it did - it would solve a lot of nasties).

So this patch instead arranges for clear_page_dirty() to not clean the ptes
when it is called on the try_to_free_buffers() path.

clear_page_dirty() has several callers and it's not immediately obvious to
me what the appropriate behaviour is in each case.  Could maintainers please
take a look?

From my quick reading, all callers of try_to_free_buffers() have already
unmapped the page from pagetables, and given that the reported ext3
corruption happens on uniprocessor, non-preempt kernels, I doubt that this
patch will fix things.

But even if it is true that try_to_free_buffers() callers unmap the page
first, this fix is still needed, because a minor fault could reestablish
ptes in the meanwhile.

Note that with this change, we can now restore try_to_free_buffers()'s
->private_lock to cover the test_clear_page_dirty().  If we indeed need to
do that, it'll be in a separate patch.

(Need to think about this some more.  How can a page be pte-dirty, but not
have dirty buffers?  We're supposed to clean the ptes when we write the
page, and we dirty the page and buffers when userspace dirties the pte...)
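The effect of the new must_clean_ptes argument can be illustrated with a simplified userspace model of a page mapped by a single pte.  All sim_* names are hypothetical.  sim_test_clear_page_dirty() mirrors only the accounting branch of the patched test_clear_page_dirty(): it assumes mapping_cap_account_dirty() is true and ignores tree_lock and the radix-tree tag.  sim_unmap() stands in for the unmap paths that move a dirty pte bit onto the page with set_page_dirty(), and sim_user_store() models the write fault that shared dirty page tracking takes on a write-protected pte.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page-cache page mapped by a single pte.
 * None of these names exist in the kernel. */
struct sim_mapped_page {
	bool dirty;	/* models PageDirty */
	bool pte_write;	/* pte is currently writable */
	bool pte_dirty;	/* pte dirty bit, set by a userspace store */
};

static long sim_nr_file_dirty;	/* stands in for the NR_FILE_DIRTY counter */

/* Rough analogue of set_page_dirty(): account clean->dirty transitions. */
static void sim_set_page_dirty(struct sim_mapped_page *p)
{
	if (!p->dirty) {
		p->dirty = true;
		sim_nr_file_dirty++;
	}
}

/* A userspace store through the mapping.  A write-protected pte faults
 * first, and the fault path dirties the page. */
static void sim_user_store(struct sim_mapped_page *p)
{
	if (!p->pte_write) {
		p->pte_write = true;
		sim_set_page_dirty(p);
	}
	p->pte_dirty = true;
}

/* Analogue of page_mkclean(): write-protect and clean the pte. */
static void sim_page_mkclean(struct sim_mapped_page *p)
{
	p->pte_write = false;
	p->pte_dirty = false;
}

/* Simplified model of the patched test_clear_page_dirty(). */
static int sim_test_clear_page_dirty(struct sim_mapped_page *p,
				     int must_clean_ptes)
{
	if (!p->dirty)
		return 0;
	p->dirty = false;
	if (must_clean_ptes)
		sim_page_mkclean(p);
	sim_nr_file_dirty--;
	assert(sim_nr_file_dirty >= 0);	/* accounting must never underflow */
	return 1;
}

/* Tearing down the pte transfers its dirty bit to the page. */
static void sim_unmap(struct sim_mapped_page *p)
{
	if (p->pte_dirty)
		sim_set_page_dirty(p);
	p->pte_write = false;
	p->pte_dirty = false;
}
```

In the model, passing 0 keeps the pte dirty, so a store made before try_to_free_buffers() is still recorded once the pte is torn down.  Passing 1 on that path (the old behaviour) loses the record.  On the writeback path, passing 1 is harmless: the write-protected pte makes the next store fault and redirty the page.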
Cc: Miklos Szeredi <miklos@xxxxxxxxxx>
Cc: <reiserfs-dev@xxxxxxxxxxx>
Cc: Dave Kleikamp <shaggy@xxxxxxxxxxxxxx>
Cc: David Chinner <dgc@xxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Hugh Dickins <hugh@xxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 fs/buffer.c                 |    2 +-
 fs/cifs/file.c              |    2 +-
 fs/fuse/file.c              |    2 +-
 fs/hugetlbfs/inode.c        |    2 +-
 fs/jfs/jfs_metapage.c       |    2 +-
 fs/reiserfs/stree.c         |    2 +-
 fs/xfs/linux-2.6/xfs_aops.c |    2 +-
 include/linux/page-flags.h  |    6 +++---
 mm/page-writeback.c         |    5 +++--
 mm/truncate.c               |    4 ++--
 10 files changed, 15 insertions(+), 14 deletions(-)

diff -puN fs/buffer.c~try_to_free_buffers-dont-clear-pte-dirty-bits fs/buffer.c
--- a/fs/buffer.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/fs/buffer.c
@@ -2858,7 +2858,7 @@ int try_to_free_buffers(struct page *pag
 		 * the page's buffers clean. We discover that here and clean
 		 * the page also.
 		 */
-		if (test_clear_page_dirty(page))
+		if (test_clear_page_dirty(page, 0))
 			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
 	}
 out:
diff -puN fs/fuse/file.c~try_to_free_buffers-dont-clear-pte-dirty-bits fs/fuse/file.c
--- a/fs/fuse/file.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);
 
 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff -puN fs/hugetlbfs/inode.c~try_to_free_buffers-dont-clear-pte-dirty-bits fs/hugetlbfs/inode.c
--- a/fs/hugetlbfs/inode.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 1);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff -puN fs/jfs/jfs_metapage.c~try_to_free_buffers-dont-clear-pte-dirty-bits fs/jfs/jfs_metapage.c
--- a/fs/jfs/jfs_metapage.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ void release_metapage(struct metapage * 
 
 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count) {
-		clear_page_dirty(page);
+		clear_page_dirty(page, 1);
 		ClearPageUptodate(page);
 	}
 #else
diff -puN fs/reiserfs/stree.c~try_to_free_buffers-dont-clear-pte-dirty-bits fs/reiserfs/stree.c
--- a/fs/reiserfs/stree.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
 			bh = next;
 		} while (bh != head);
 		if (PAGE_SIZE == bh->b_size) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 		}
 	}
 }
diff -puN fs/xfs/linux-2.6/xfs_aops.c~try_to_free_buffers-dont-clear-pte-dirty-bits fs/xfs/linux-2.6/xfs_aops.c
--- a/fs/xfs/linux-2.6/xfs_aops.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@ xfs_start_page_writeback(
 	ASSERT(!PageWriteback(page));
 	set_page_writeback(page);
 	if (clear_dirty)
-		clear_page_dirty(page);
+		clear_page_dirty(page, 1);
 	unlock_page(page);
 	if (!buffers) {
 		end_page_writeback(page);
diff -puN include/linux/page-flags.h~try_to_free_buffers-dont-clear-pte-dirty-bits include/linux/page-flags.h
--- a/include/linux/page-flags.h~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/include/linux/page-flags.h
@@ -253,13 +253,13 @@ static inline void SetPageUptodate(struc
 
 struct page;	/* forward declaration */
 
-int test_clear_page_dirty(struct page *page);
+int test_clear_page_dirty(struct page *page, int must_clean_ptes);
 int test_clear_page_writeback(struct page *page);
 int test_set_page_writeback(struct page *page);
 
-static inline void clear_page_dirty(struct page *page)
+static inline void clear_page_dirty(struct page *page, int must_clean_ptes)
 {
-	test_clear_page_dirty(page);
+	test_clear_page_dirty(page, must_clean_ptes);
 }
 
 static inline void set_page_writeback(struct page *page)
diff -puN mm/page-writeback.c~try_to_free_buffers-dont-clear-pte-dirty-bits mm/page-writeback.c
--- a/mm/page-writeback.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/mm/page-writeback.c
@@ -848,7 +848,7 @@ EXPORT_SYMBOL(set_page_dirty_lock);
  * Clear a page's dirty flag, while caring for dirty memory accounting.
  * Returns true if the page was previously dirty.
  */
-int test_clear_page_dirty(struct page *page)
+int test_clear_page_dirty(struct page *page, int must_clean_ptes)
 {
 	struct address_space *mapping = page_mapping(page);
 	unsigned long flags;
@@ -866,7 +866,8 @@ int test_clear_page_dirty(struct page *p
 		 * page is locked, which pins the address_space
 		 */
 		if (mapping_cap_account_dirty(mapping)) {
-			page_mkclean(page);
+			if (must_clean_ptes)
+				page_mkclean(page);
 			dec_zone_page_state(page, NR_FILE_DIRTY);
 		}
 		return 1;
diff -puN mm/truncate.c~try_to_free_buffers-dont-clear-pte-dirty-bits mm/truncate.c
--- a/mm/truncate.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/mm/truncate.c
@@ -70,7 +70,7 @@ truncate_complete_page(struct address_sp
 	if (PagePrivate(page))
 		do_invalidatepage(page, 0);
 
-	if (test_clear_page_dirty(page))
+	if (test_clear_page_dirty(page, 1))
 		task_io_account_cancelled_write(PAGE_CACHE_SIZE);
 	ClearPageUptodate(page);
 	ClearPageMappedToDisk(page);
@@ -386,7 +386,7 @@ int invalidate_inode_pages2_range(struct
 					PAGE_CACHE_SIZE, 0);
 				}
 			}
-			was_dirty = test_clear_page_dirty(page);
+			was_dirty = test_clear_page_dirty(page, 0);
 			if (!invalidate_complete_page2(mapping, page)) {
 				if (was_dirty)
 					set_page_dirty(page);
diff -puN fs/cifs/file.c~try_to_free_buffers-dont-clear-pte-dirty-bits fs/cifs/file.c
--- a/fs/cifs/file.c~try_to_free_buffers-dont-clear-pte-dirty-bits
+++ a/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);
 
 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 1)) {
 				unlock_page(page);
 				break;
 			}
_

Patches currently in -mm which might be from
akpm@xxxxxxxx are try_to_free_buffers-dont-clear-pte-dirty-bits.patch deadlock-in-mincore-tidy.patch deadlock-in-mincore-speedup.patch rtc-warning-fix.patch fix-vm_events_fold_cpu-build-breakage-fix.patch smc911-workqueue-fixes.patch build-compileh-earlier.patch macintosh-mangle-caps-lock-events-on-adb-keyboards.patch git-acpi.patch git-acpi-cpufreq-fixup.patch acpi-dont-select-pm.patch implementation-of-acpi_video_get_next_level.patch video-sysfs-support-take-2-add-dev-argument-for-backlight_device_register.patch sony_apci-resume.patch sony_apci-resume-fix.patch video-sysfs-support-take-2-add-dev-argument-for-backlight_device_register-sony_acpi-fix.patch git-alsa.patch arm-systemh-build-fix.patch cifs-sprintf-fix.patch git-drm.patch ia64-enable-config_debug_spinlock_sleep.patch git-libata-all.patch git-lxdialog-fixup.patch git-mmc-fixup.patch git-mmc-tifm_sd-warning-fix.patch git-mtd.patch git-ubi.patch ubi-versus-add-include-linux-freezerh-and-move-definitions-from.patch update-smc91x-driver-with-arm-versatile-board-info.patch driver-for-silan-sc92031-netdev-include-fix.patch driver-for-silan-sc92031-netdev-fix-more.patch drivers-net-ns83820c-add-paramter-to-disable-auto.patch net-use-bitrev8.patch net-uninline-skb_put.patch ioat-warning-fix.patch pci-legacy-resource-fix-tidy.patch pci-disable-multithreaded-probing.patch drivers-scsi-mca_53c9xc-save_flags-cli-removal.patch scsi-cover-up-bugs-fix-up-compiler-warnings-in-megaraid-driver-fix.patch git-qla3xxx-fixup.patch funsoft-is-bust-on-sparc.patch nokia-e70-is-an-unusual-device.patch fix-gregkh-usb-usb-ehci-hcd-add-shadow-budget-code.patch git-wireless.patch revert-i386-fix-the-verify_quirk_intel_irqbalance.patch revert-x86_64-mm-add-genapic_force.patch revert-x86_64-mm-fix-the-irqbalance-quirk-for-e7320-e7520-e7525.patch revert-x86_64-mm-copy-user-nocache.patch convert-i386-pda-code-to-use-%fs-fixes.patch add-memcpy_uncached_read-fix.patch add-memcpy_uncached_read-tidy.patch 
touchkit-ps-2-touchscreen-driver.patch virtual-memmap-on-sparsemem-v3-map-and-unmap-fix-2.patch virtual-memmap-on-sparsemem-v3-map-and-unmap-fix-3.patch lumpy-reclaim-v2-page_to_pfn-fix.patch lumpy-reclaim-v2-tidy.patch nfs-fix-nr_file_dirty-underflow-tidy.patch deprecate-smbfs-in-favour-of-cifs.patch drivers-add-lcd-support-3-Kconfig-fix.patch drivers-add-lcd-support-workqueue-fixups.patch ecryptfs-public-key-packet-management-slab-fix.patch add-retain_initrd-boot-option-tweak.patch count_vm_events-warning-fix.patch procfs-fix-race-between-proc_readdir-and-remove_proc_entry-fix.patch schedule_on_each_cpu-use-preempt_disable.patch gtod-persistent-clock-support-i386.patch hrtimers-clean-up-locking.patch hrtimers-add-state-tracking.patch clockevents-i386-drivers.patch workqueue-dont-hold-workqueue_mutex-in-flush_scheduled_work.patch move-page-writeback-acounting-out-of-macros.patch per-backing_dev-dirty-and-writeback-page-accounting.patch ext2-reservations.patch edac-new-opteron-athlon64-memory-controller-driver.patch sched2-sched-domain-sysctl-use-ctl_unnumbered.patch mm-implement-swap-prefetching-use-ctl_unnumbered.patch swap_prefetch-vs-zoned-counters.patch add-include-linux-freezerh-and-move-definitions-from-prefetch.patch readahead-kconfig-options-fix.patch readahead-minmax_ra_pages.patch readahead-sysctl-parameters.patch readahead-sysctl-parameters-use-ctl_unnumbered.patch readahead-context-based-method-locking-fix.patch readahead-context-based-method-locking-fix-2.patch readahead-call-scheme-ifdef-fix.patch readahead-call-scheme-build-fix.patch readahead-nfsd-case-fix.patch make-copy_from_user_inatomic-not-zero-the-tail-on-i386-vs-reiser4.patch resier4-add-include-linux-freezerh-and-move-definitions-from.patch make-kmem_cache_destroy-return-void-reiser4.patch reiser4-hardirq-include-fix.patch reiser4-run-truncate_inode_pages-in-reiser4_delete_inode.patch reiser4-get_sb_dev-fix.patch reiser4-vs-zoned-allocator.patch reiser4-temp-fix.patch 
reiser4-kmem_cache_t-removal.patch hpt3xx-rework-rate-filtering-tidy.patch jmicron-warning-fix.patch statistics-infrastructure-fix-buffer-overflow-in-histogram-with-linear-tidy.patch extend-notifier_call_chain-to-count-nr_calls-made.patch extend-notifier_call_chain-to-count-nr_calls-made-fixes-2.patch define-and-use-new-eventscpu_lock_acquire-and-cpu_lock_release-fix.patch eliminate-lock_cpu_hotplug-in-kernel-schedc-fix.patch slim-main-include-fix.patch nr_blockdev_pages-in_interrupt-warning.patch device-suspend-debug.patch mutex-subsystem-synchro-test-module-fix.patch slab-leaks3-default-y.patch vdso-print-fatal-signals-use-ctl_unnumbered.patch restore-rogue-readahead-printk.patch put_bh-debug.patch e1000-printk-warning-fixes.patch acpi_format_exception-debug.patch add-debugging-aid-for-memory-initialisation-problems-fix.patch kmap_atomic-debugging.patch squash-ipc-warnings.patch squash-udf-warnings.patch