The patch titled

     uninline page_cache_get_speculative()

has been added to the -mm tree.  Its filename is

     mm-speculative-get_page-uninlining.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: uninline page_cache_get_speculative()
From: Andrew Morton <akpm@xxxxxxxx>

Shrinks my SMP kernel by ~900 bytes. In the fastpath.

This is so obvious, I have a feeling I'm missing something.

Cc: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
Cc: Hugh Dickins <hugh@xxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 include/linux/pagemap.h |  102 --------------------------------------
 mm/swap.c               |  101 +++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 101 deletions(-)

diff -puN include/linux/pagemap.h~mm-speculative-get_page-uninlining include/linux/pagemap.h
--- a/include/linux/pagemap.h~mm-speculative-get_page-uninlining
+++ a/include/linux/pagemap.h
@@ -52,107 +52,7 @@ static inline void mapping_set_gfp_mask(
 #define page_cache_get(page) get_page(page)
 #define page_cache_release(page) put_page(page)
 void release_pages(struct page **pages, int nr, int cold);
-
-/*
- * speculatively take a reference to a page.
- * If the page is free (_count == 0), then _count is untouched, and NULL
- * is returned. Otherwise, _count is incremented by 1 and page is returned.
- *
- * This function must be run in the same rcu_read_lock() section as has
- * been used to lookup the page in the pagecache radix-tree: this allows
- * allocators to use a synchronize_rcu() to stabilize _count.
- *
- * Unless an RCU grace period has passed, the count of all pages coming out
- * of the allocator must be considered unstable. page_count may return higher
- * than expected, and put_page must be able to do the right thing when the
- * page has been finished with (because put_page is what is used to drop an
- * invalid speculative reference).
- *
- * After incrementing the refcount, this function spins until PageNoNewRefs
- * is clear, then a read memory barrier is issued.
- *
- * This forms the core of the lockless pagecache locking protocol, where
- * the lookup-side (eg. find_get_page) has the following pattern:
- * 1. find page in radix tree
- * 2. conditionally increment refcount
- * 3. wait for PageNoNewRefs
- * 4. check the page is still in pagecache
- *
- * Remove-side (that cares about _count, eg. reclaim) has the following:
- * A. SetPageNoNewRefs
- * B. check refcount is correct
- * C. remove page
- * D. ClearPageNoNewRefs
- *
- * There are 2 critical interleavings that matter:
- * - 2 runs before B: in this case, B sees elevated refcount and bails out
- * - B runs before 2: in this case, 3 ensures 4 will not run until *after* C
- *   (after D, even). In which case, 4 will notice C and lookup side can retry
- *
- * It is possible that between 1 and 2, the page is removed then the exact same
- * page is inserted into the same position in pagecache. That's OK: the
- * old find_get_page using tree_lock could equally have run before or after
- * the write-side, depending on timing.
- *
- * Pagecache insertion isn't a big problem: either 1 will find the page or
- * it will not. Likewise, the old find_get_page could run either before the
- * insertion or afterwards, depending on timing.
- */
-static inline struct page *page_cache_get_speculative(struct page *page)
-{
-	VM_BUG_ON(in_interrupt());
-
-#ifndef CONFIG_SMP
-	VM_BUG_ON(!in_atomic());
-	/*
-	 * Preempt must be disabled here - we rely on rcu_read_lock doing
-	 * this for us.
-	 *
-	 * Pagecache won't be truncated from interrupt context, so if we have
-	 * found a page in the radix tree here, we have pinned its refcount by
-	 * disabling preempt, and hence no need for the "speculative get" that
-	 * SMP requires.
-	 */
-	VM_BUG_ON(page_count(page) == 0);
-	atomic_inc(&page->_count);
-
-#else
-	if (unlikely(!get_page_unless_zero(page)))
-		return NULL; /* page has been freed */
-
-	/*
-	 * Note that get_page_unless_zero provides a memory barrier.
-	 * This is needed to ensure PageNoNewRefs is evaluated after the
-	 * page refcount has been raised. See below comment.
-	 */
-
-	while (unlikely(PageNoNewRefs(page)))
-		cpu_relax();
-
-	/*
-	 * smp_rmb is to ensure the load of page->flags (for PageNoNewRefs())
-	 * is performed before a future load used to ensure the page is
-	 * the correct on (usually: page->mapping and page->index).
-	 *
-	 * Those places that set PageNoNewRefs have the following pattern:
-	 *	SetPageNoNewRefs(page)
-	 *	wmb();
-	 *	if (page_count(page) == X)
-	 *		remove page from pagecache
-	 *	wmb();
-	 *	ClearPageNoNewRefs(page)
-	 *
-	 * If the load was out of order, page->mapping might be loaded before
-	 * the page is removed from pagecache but PageNoNewRefs evaluated
-	 * after the ClearPageNoNewRefs().
-	 */
-	smp_rmb();
-
-#endif
-	VM_BUG_ON(PageCompound(page) && (struct page *)page_private(page) != page);
-
-	return page;
-}
+struct page *page_cache_get_speculative(struct page *page);
 
 #ifdef CONFIG_NUMA
 extern struct page *page_cache_alloc(struct address_space *x);
diff -puN mm/swap.c~mm-speculative-get_page-uninlining mm/swap.c
--- a/mm/swap.c~mm-speculative-get_page-uninlining
+++ a/mm/swap.c
@@ -74,6 +74,107 @@ void put_page(struct page *page)
 EXPORT_SYMBOL(put_page);
 
 /*
+ * speculatively take a reference to a page.
+ * If the page is free (_count == 0), then _count is untouched, and NULL
+ * is returned. Otherwise, _count is incremented by 1 and page is returned.
+ *
+ * This function must be run in the same rcu_read_lock() section as has
+ * been used to lookup the page in the pagecache radix-tree: this allows
+ * allocators to use a synchronize_rcu() to stabilize _count.
+ *
+ * Unless an RCU grace period has passed, the count of all pages coming out
+ * of the allocator must be considered unstable. page_count may return higher
+ * than expected, and put_page must be able to do the right thing when the
+ * page has been finished with (because put_page is what is used to drop an
+ * invalid speculative reference).
+ *
+ * After incrementing the refcount, this function spins until PageNoNewRefs
+ * is clear, then a read memory barrier is issued.
+ *
+ * This forms the core of the lockless pagecache locking protocol, where
+ * the lookup-side (eg. find_get_page) has the following pattern:
+ * 1. find page in radix tree
+ * 2. conditionally increment refcount
+ * 3. wait for PageNoNewRefs
+ * 4. check the page is still in pagecache
+ *
+ * Remove-side (that cares about _count, eg. reclaim) has the following:
+ * A. SetPageNoNewRefs
+ * B. check refcount is correct
+ * C. remove page
+ * D. ClearPageNoNewRefs
+ *
+ * There are 2 critical interleavings that matter:
+ * - 2 runs before B: in this case, B sees elevated refcount and bails out
+ * - B runs before 2: in this case, 3 ensures 4 will not run until *after* C
+ *   (after D, even). In which case, 4 will notice C and lookup side can retry
+ *
+ * It is possible that between 1 and 2, the page is removed then the exact same
+ * page is inserted into the same position in pagecache. That's OK: the
+ * old find_get_page using tree_lock could equally have run before or after
+ * the write-side, depending on timing.
+ *
+ * Pagecache insertion isn't a big problem: either 1 will find the page or
+ * it will not. Likewise, the old find_get_page could run either before the
+ * insertion or afterwards, depending on timing.
+ */
+struct page *page_cache_get_speculative(struct page *page)
+{
+	VM_BUG_ON(in_interrupt());
+
+#ifndef CONFIG_SMP
+	VM_BUG_ON(!in_atomic());
+	/*
+	 * Preempt must be disabled here - we rely on rcu_read_lock doing
+	 * this for us.
+	 *
+	 * Pagecache won't be truncated from interrupt context, so if we have
+	 * found a page in the radix tree here, we have pinned its refcount by
+	 * disabling preempt, and hence no need for the "speculative get" that
+	 * SMP requires.
+	 */
+	VM_BUG_ON(page_count(page) == 0);
+	atomic_inc(&page->_count);
+
+#else
+	if (unlikely(!get_page_unless_zero(page)))
+		return NULL; /* page has been freed */
+
+	/*
+	 * Note that get_page_unless_zero provides a memory barrier.
+	 * This is needed to ensure PageNoNewRefs is evaluated after the
+	 * page refcount has been raised. See below comment.
+	 */
+
+	while (unlikely(PageNoNewRefs(page)))
+		cpu_relax();
+
+	/*
+	 * smp_rmb is to ensure the load of page->flags (for PageNoNewRefs())
+	 * is performed before a future load used to ensure the page is
+	 * the correct on (usually: page->mapping and page->index).
+	 *
+	 * Those places that set PageNoNewRefs have the following pattern:
+	 *	SetPageNoNewRefs(page)
+	 *	wmb();
+	 *	if (page_count(page) == X)
+	 *		remove page from pagecache
+	 *	wmb();
+	 *	ClearPageNoNewRefs(page)
+	 *
+	 * If the load was out of order, page->mapping might be loaded before
+	 * the page is removed from pagecache but PageNoNewRefs evaluated
+	 * after the ClearPageNoNewRefs().
+	 */
+	smp_rmb();
+
+#endif
+	VM_BUG_ON(PageCompound(page) && (struct page *)page_private(page) != page);
+
+	return page;
+}
+
+/*
  * Writeback is about to end against a page which has been marked for immediate
  * reclaim. If it still appears to be reclaimable, move it to the tail of the
  * inactive list. The page still has PageWriteback set, which will pin it.
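
The lookup-side (steps 2-3) and remove-side (steps A-D) pattern described in
the comment moved by this patch can be modelled in userspace. What follows is
only a sketch under stated assumptions, not the kernel implementation:
struct toy_page and every toy_* function are hypothetical names invented for
illustration, C11 sequentially consistent atomics stand in for the kernel's
atomic ops, wmb() and smp_rmb(), and the busy-wait omits cpu_relax().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page (not the kernel type). */
struct toy_page {
	atomic_int count;		/* models page->_count */
	atomic_bool no_new_refs;	/* models the PageNoNewRefs flag */
	atomic_bool in_cache;		/* models "page is still in pagecache" */
};

/* Models get_page_unless_zero(): take a reference unless count is 0. */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Lookup side, steps 2 and 3. The caller still has to perform step 4. */
struct toy_page *toy_get_speculative(struct toy_page *p)
{
	if (!toy_get_unless_zero(p))
		return NULL;		/* page has been freed */

	while (atomic_load(&p->no_new_refs))
		;			/* the kernel code calls cpu_relax() */

	return p;
}

/* Remove side, steps A to D. 'expected' is the refcount the remover wants. */
bool toy_remove(struct toy_page *p, int expected)
{
	bool removed = false;

	atomic_store(&p->no_new_refs, true);		/* A */
	if (atomic_load(&p->count) == expected) {	/* B */
		atomic_store(&p->in_cache, false);	/* C */
		removed = true;
	}
	atomic_store(&p->no_new_refs, false);		/* D */
	return removed;
}

/* Single-threaded walk through the "2 runs before B" interleaving. */
int toy_demo(void)
{
	struct toy_page pg;

	atomic_init(&pg.count, 1);	/* one reference held by the cache */
	atomic_init(&pg.no_new_refs, false);
	atomic_init(&pg.in_cache, true);

	/* Step 2 happens first, so B sees an elevated count and bails out. */
	assert(toy_get_speculative(&pg) == &pg);
	assert(!toy_remove(&pg, 1));

	/* Once the speculative reference is dropped, removal succeeds. */
	atomic_fetch_sub(&pg.count, 1);
	assert(toy_remove(&pg, 1));

	/* The cache drops its own reference; a later lookup gets NULL. */
	atomic_fetch_sub(&pg.count, 1);
	assert(toy_get_speculative(&pg) == NULL);
	return 0;
}
```

toy_demo() exercises only the first interleaving deterministically. The
"B runs before 2" case needs two threads to observe, and there the lookup
side relies on its step-4 check (in_cache in this model) to notice the
removal and retry.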