This is a note to let you know that I've just added the patch titled [PATCH v2 for-4.9 34/40] slub: move synchronize_sched out of slab_mutex on shrink to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: slub-move-synchronize_sched-out-of-slab_mutex-on-shrink.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From foo@baz Mon Mar 20 11:41:01 CET 2017 From: alexander.levin@xxxxxxxxxxx Date: Fri, 17 Mar 2017 00:48:31 +0000 Subject: [PATCH v2 for-4.9 34/40] slub: move synchronize_sched out of slab_mutex on shrink To: "gregkh@xxxxxxxxxxxxxxxxxxx" <gregkh@xxxxxxxxxxxxxxxxxxx> Cc: "stable@xxxxxxxxxxxxxxx" <stable@xxxxxxxxxxxxxxx> Message-ID: <20170317004812.26960-34-alexander.levin@xxxxxxxxxxx> From: Vladimir Davydov <vdavydov.dev@xxxxxxxxx> [ Upstream commit 89e364db71fb5e7fc8d93228152abfa67daf35fa ] synchronize_sched() is a heavy operation and calling it per each cache owned by a memory cgroup being destroyed may take quite some time. What is worse, it's currently called under the slab_mutex, stalling all works doing cache creation/destruction. Actually, there isn't much point in calling synchronize_sched() for each cache - it's enough to call it just once - after setting cpu_partial for all caches and before shrinking them. This way, we can also move it out of the slab_mutex, which we have to hold for iterating over the slab cache list. Link: https://bugzilla.kernel.org/show_bug.cgi?id=172991 Link: http://lkml.kernel.org/r/0a10d71ecae3db00fb4421bcd3f82bcc911f4be4.1475329751.git.vdavydov.dev@xxxxxxxxx Signed-off-by: Vladimir Davydov <vdavydov.dev@xxxxxxxxx> Reported-by: Doug Smythies <dsmythies@xxxxxxxxx> Acked-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> Cc: Christoph Lameter <cl@xxxxxxxxx> Cc: David Rientjes <rientjes@xxxxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxxxx> Cc: Pekka Enberg <penberg@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Sasha Levin <alexander.levin@xxxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- mm/slab.c | 4 ++-- mm/slab.h | 2 +- mm/slab_common.c | 27 +++++++++++++++++++++++++-- mm/slob.c | 2 +- mm/slub.c | 19 ++----------------- 5 files changed, 31 insertions(+), 23 deletions(-) --- a/mm/slab.c +++ b/mm/slab.c @@ -2332,7 +2332,7 @@ out: return nr_freed; } -int __kmem_cache_shrink(struct kmem_cache *cachep, bool deactivate) +int __kmem_cache_shrink(struct kmem_cache *cachep) { int ret = 0; int node; @@ -2352,7 +2352,7 @@ int __kmem_cache_shrink(struct kmem_cach int __kmem_cache_shutdown(struct kmem_cache *cachep) { - return __kmem_cache_shrink(cachep, false); + return __kmem_cache_shrink(cachep); } void __kmem_cache_release(struct kmem_cache *cachep) --- a/mm/slab.h +++ b/mm/slab.h @@ -146,7 +146,7 @@ static inline unsigned long kmem_cache_f int __kmem_cache_shutdown(struct kmem_cache *); void __kmem_cache_release(struct kmem_cache *); -int __kmem_cache_shrink(struct kmem_cache *, bool); +int __kmem_cache_shrink(struct kmem_cache *); void slab_kmem_cache_release(struct kmem_cache *); struct seq_file; --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -573,6 +573,29 @@ void memcg_deactivate_kmem_caches(struct get_online_cpus(); get_online_mems(); +#ifdef CONFIG_SLUB + /* + * In case of SLUB, we need to disable empty slab caching to + * avoid pinning the offline memory cgroup by freeable kmem + * pages charged to it. SLAB doesn't need this, as it + * periodically purges unused slabs. + */ + mutex_lock(&slab_mutex); + list_for_each_entry(s, &slab_caches, list) { + c = is_root_cache(s) ? cache_from_memcg_idx(s, idx) : NULL; + if (c) { + c->cpu_partial = 0; + c->min_partial = 0; + } + } + mutex_unlock(&slab_mutex); + /* + * kmem_cache->cpu_partial is checked locklessly (see + * put_cpu_partial()). Make sure the change is visible. + */ + synchronize_sched(); +#endif + mutex_lock(&slab_mutex); list_for_each_entry(s, &slab_caches, list) { if (!is_root_cache(s)) @@ -584,7 +607,7 @@ void memcg_deactivate_kmem_caches(struct if (!c) continue; - __kmem_cache_shrink(c, true); + __kmem_cache_shrink(c); arr->entries[idx] = NULL; } mutex_unlock(&slab_mutex); @@ -755,7 +778,7 @@ int kmem_cache_shrink(struct kmem_cache get_online_cpus(); get_online_mems(); kasan_cache_shrink(cachep); - ret = __kmem_cache_shrink(cachep, false); + ret = __kmem_cache_shrink(cachep); put_online_mems(); put_online_cpus(); return ret; --- a/mm/slob.c +++ b/mm/slob.c @@ -634,7 +634,7 @@ void __kmem_cache_release(struct kmem_ca { } -int __kmem_cache_shrink(struct kmem_cache *d, bool deactivate) +int __kmem_cache_shrink(struct kmem_cache *d) { return 0; } --- a/mm/slub.c +++ b/mm/slub.c @@ -3887,7 +3887,7 @@ EXPORT_SYMBOL(kfree); * being allocated from last increasing the chance that the last objects * are freed in them. */ -int __kmem_cache_shrink(struct kmem_cache *s, bool deactivate) +int __kmem_cache_shrink(struct kmem_cache *s) { int node; int i; @@ -3899,21 +3899,6 @@ int __kmem_cache_shrink(struct kmem_cach unsigned long flags; int ret = 0; - if (deactivate) { - /* - * Disable empty slabs caching. Used to avoid pinning offline - * memory cgroups by kmem pages that can be freed. - */ - s->cpu_partial = 0; - s->min_partial = 0; - - /* - * s->cpu_partial is checked locklessly (see put_cpu_partial), - * so we have to make sure the change is visible. - */ - synchronize_sched(); - } - flush_all(s); for_each_kmem_cache_node(s, node, n) { INIT_LIST_HEAD(&discard); @@ -3970,7 +3955,7 @@ static int slab_mem_going_offline_callba mutex_lock(&slab_mutex); list_for_each_entry(s, &slab_caches, list) - __kmem_cache_shrink(s, false); + __kmem_cache_shrink(s); mutex_unlock(&slab_mutex); return 0; Patches currently in stable-queue which might be from gregkh@xxxxxxxxxxxxxxxxxxx are queue-4.9/pci-add-comments-about-rom-bar-updating.patch queue-4.9/acpi-blacklist-make-dell-latitude-3350-ethernet-work.patch queue-4.9/s390-zcrypt-introduce-cex6-toleration.patch queue-4.9/dccp-tcp-fix-routing-redirect-race.patch queue-4.9/vrf-fix-use-after-free-in-vrf_xmit.patch queue-4.9/tcp-fix-various-issues-for-sockets-morphing-to-listen-state.patch queue-4.9/block-allow-write_same-commands-with-the-sg_io-ioctl.patch queue-4.9/strparser-destroy-workqueue-on-module-exit.patch queue-4.9/powerpc-mm-fix-build-break-when-cma-n-spapr_tce_iommu-y.patch queue-4.9/vfio-spapr-postpone-default-window-creation.patch queue-4.9/vfio-spapr-add-a-helper-to-create-default-dma-window.patch queue-4.9/pci-do-any-vf-bar-updates-before-enabling-the-bars.patch queue-4.9/usb-gadget-udc-atmel-remove-memory-leak.patch queue-4.9/x86-hyperv-handle-unknown-nmis-on-one-cpu-when-unknown_nmi_panic.patch queue-4.9/net-tunnel-set-inner-protocol-in-network-gro-hooks.patch queue-4.9/serial-8250_pci-detach-low-level-driver-during-pci-error-recovery.patch queue-4.9/powerpc-iommu-stop-using-current-in-mm_iommu_xxx.patch queue-4.9/tun-fix-premature-pollout-notification-on-tun-devices.patch queue-4.9/vxlan-correctly-validate-vxlan-id-against-vxlan_n_vid.patch queue-4.9/bpf-fix-regression-on-verifier-pruning-wrt-map-lookups.patch queue-4.9/tcp-dccp-block-bh-for-syn-processing.patch queue-4.9/net-sched-act_skbmod-remove-unneeded-rcu_read_unlock-in-tcf_skbmod_dump.patch queue-4.9/dccp-fix-memory-leak-during-tear-down-of-unsuccessful-connection-request.patch queue-4.9/xen-do-not-re-use-pirq-number-cached-in-pci-device-msi-msg-data.patch queue-4.9/vxlan-lock-rcu-on-tx-path.patch queue-4.9/mlxsw-spectrum_router-avoid-potential-packets-loss.patch queue-4.9/mpls-do-not-decrement-alive-counter-for-unregister-events.patch queue-4.9/net-phy-avoid-deadlock-during-phy_error.patch queue-4.9/uapi-fix-linux-packet_diag.h-userspace-compilation-error.patch queue-4.9/pci-separate-vf-bar-updates-from-standard-bar-updates.patch queue-4.9/pci-ignore-bar-updates-on-virtual-functions.patch queue-4.9/geneve-lock-rcu-on-tx-path.patch queue-4.9/dccp-fix-use-after-free-in-dccp_feat_activate_values.patch queue-4.9/l2tp-avoid-use-after-free-caused-by-l2tp_ip_backlog_recv.patch queue-4.9/powerpc-mm-iommu-vfio-spapr-put-pages-on-vfio-container-shutdown.patch queue-4.9/bpf-fix-state-equivalence.patch queue-4.9/scsi-ibmvscsis-clean-up-properly-if-target_submit_cmd-tmr-fails.patch queue-4.9/drm-nouveau-disp-gp102-fix-cursor-overlay-immediate-channel-indices.patch queue-4.9/pci-update-bars-using-property-bits-appropriate-for-type.patch queue-4.9/scsi-ibmvscsis-synchronize-cmds-at-remove-time.patch queue-4.9/vfio-spapr-postpone-allocation-of-userspace-version-of-tce-table.patch queue-4.9/ibmveth-calculate-gso_segs-for-large-packets.patch queue-4.9/net-mlx5e-do-not-reduce-lro-wqe-size-when-not-using-build_skb.patch queue-4.9/net-sched-actions-decrement-module-reference-count-after-table-flush.patch queue-4.9/pci-don-t-update-vf-bars-while-vf-memory-space-is-enabled.patch queue-4.9/ipv4-mask-tos-for-input-route.patch queue-4.9/net-fix-socket-refcounting-in-skb_complete_tx_timestamp.patch queue-4.9/net-bridge-allow-ipv6-when-multicast-flood-is-disabled.patch queue-4.9/net-mlx5e-fix-wrong-cqe-decompression.patch queue-4.9/net-net_enable_timestamp-can-be-called-from-irq-contexts.patch queue-4.9/igb-workaround-for-igb-i210-firmware-issue.patch queue-4.9/drivers-hv-ring_buffer-count-on-wrap-around-mappings-in-get_next_pkt_raw-v2.patch queue-4.9/drm-nouveau-disp-nv50-specify-ctrl-user-separately-when-constructing-classes.patch queue-4.9/ipv6-make-ecmp-route-replacement-less-greedy.patch queue-4.9/ipv6-avoid-write-to-a-possibly-cloned-skb.patch queue-4.9/pci-remove-pci_resource_bar-and-pci_iov_resource_bar.patch queue-4.9/mpls-send-route-delete-notifications-when-router-module-is-unloaded.patch queue-4.9/dmaengine-iota-ioat_alloc_chan_resources-should-not-perform-sleeping-allocations.patch queue-4.9/scsi-ibmvscsis-return-correct-partition-name-to-client.patch queue-4.9/vti6-return-gre_key-for-vti6.patch queue-4.9/vfio-spapr-reference-mm-in-tce_container.patch queue-4.9/scsi-ibmvscsis-rearrange-functions-for-future-patches.patch queue-4.9/dccp-unlock-sock-before-calling-sk_free.patch queue-4.9/bpf-fix-mark_reg_unknown_value-for-spilled-regs-on-map-value-marking.patch queue-4.9/powerpc-iommu-pass-mm_struct-to-init-cleanup-helpers.patch queue-4.9/slub-move-synchronize_sched-out-of-slab_mutex-on-shrink.patch queue-4.9/net-mlx5e-register-unregister-vport-representors-on-interface-attach-detach.patch queue-4.9/pci-decouple-ioresource_rom_enable-and-pci_rom_address_enable.patch queue-4.9/net-don-t-call-strlen-on-the-user-buffer-in-packet_bind_spkt.patch queue-4.9/bpf-detect-identical-ptr_to_map_value_or_null-registers.patch queue-4.9/scsi-ibmvscsis-issues-from-dan-carpenter-smatch.patch queue-4.9/vxlan-don-t-allow-overwrite-of-config-src-addr.patch queue-4.9/acpi-blacklist-add-_rev-quirks-for-dell-precision-5520-and-3520.patch queue-4.9/bridge-drop-netfilter-fake-rtable-unconditionally.patch queue-4.9/igb-add-i211-to-i210-phy-workaround.patch queue-4.9/drm-nouveau-disp-nv50-split-chid-into-chid.ctrl-and-chid.user.patch queue-4.9/net-fix-socket-refcounting-in-skb_complete_wifi_ack.patch queue-4.9/scsi-ibmvscsis-synchronize-cmds-at-tpg_enable_store-time.patch queue-4.9/ipv6-orphan-skbs-in-reassembly-unit.patch queue-4.9/act_connmark-avoid-crashing-on-malformed-nlattrs-with-null-parms.patch queue-4.9/uvcvideo-uvc_scan_fallback-for-webcams-with-broken-chain.patch