On Fri, Mar 24, 2023 at 03:13:16PM +0800, Feng Tang wrote:
> +There are many real-world cases of performance regressions caused by
> +false sharing, and one is a rw_semaphore 'mmap_lock' inside struct

"... . One of these is rw_semaphore 'mmap_lock' ..."

But I think in English we commonly name things as "foobar struct"
instead of "struct foobar" (that is, the common noun follows the proper
noun that names something).

> +* A global datum accessed (shared) by many CPUs

Global data?

> +Following 'mitigation' section provides real-world examples.

"The real-world examples are given in 'Possible mitigations' sections."

> +  #perf c2c record -ag sleep 3
> +  #perf c2c report --call-graph none -k vmlinux

Are these commands really run as root?

> +
> +Run it when testing will-it-scale's tlb_flush1 case, and the report
> +has pieces like::

"When running the above during testing ..., perf reports something like::"

> +False sharing hurting performance cases are seen more frequently with
> +core count increasing, and there have been many patches merged to
> +solve it, like in networking and memory management subsystems.  Some
> +common mitigations (with examples) are:

"... Because of these detrimental effects, many patches have been
proposed across a variety of subsystems (like networking and memory
management) and merged."

> +
> +* Separate hot global data in its own dedicated cache line, even if it
> +  is just a 'short' type. The downside is more consumption of memory,
> +  cache line and TLB entries.
> +
> +  Commit 91b6d3256356 ("net: cache align tcp_memory_allocated, tcp_sockets_allocated")
> +
> +* Reorganize the data structure, separate the interfering members to
> +  different cache lines.  One downside is it may introduce new false
> +  sharing of other members.
> +
> +  Commit 802f1d522d5f ("mm: page_counter: re-layout structure to reduce false sharing")
> +
> +* Replace 'write' with 'read' when possible, especially in loops.
> +  Like for some global variable, use compare(read)-then-write instead
> +  of unconditional write. For example, use:

"... For example, write::"

> +
> +        if (!test_bit(XXX))
> +                set_bit(XXX);
> +
> +  instead of directly "set_bit(XXX);", similarly for atomic_t data.
> +
> +  Commit 7b1002f7cfe5 ("bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing")
> +  Commit 292648ac5cf1 ("mm: gup: allow FOLL_PIN to scale in SMP")
> +
> +* Turn hot global data to 'per-cpu data + global data' when possible,
> +  or reasonably increase the threshold for syncing per-cpu data to
> +  global data, to reduce or postpone the 'write' to that global data.
> +
> +  Commit 520f897a3554 ("ext4: use percpu_counters for extent_status cache hits/misses")
> +  Commit 56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy")

IMO it's odd to jump to specifying example commits without some sort of
conjunction (e.g. "for example, see commit <commit>").

> +
> +Surely, all mitigations should be carefully verified to not cause side
> +effects.  And to avoid false sharing in advance during coding, it's
> +better to:
> +
> +* Be aware of cache line boundaries
> +* Group mostly read-only fields together
> +* Group things that are written at the same time together
> +* Separate known read-mostly and written-mostly fields

Proactively prevent false sharing with the above tips?

Thanks.

-- 
An old man doll... just what I always wanted! - Clara