Hi Tim, On Wed, Apr 05, 2023 at 03:38:20PM -0700, Tim Chen wrote: > Thanks for the update. Looks good. Some minor nits to improve readability. Many thanks for the review! > > On Tue, 2023-04-04 at 13:22 +0800, Feng Tang wrote: > > > > + > > +False sharing could easily happen unless they are intentionally > > +checked, and it is valuable to run specific tools for performance > > +critical workloads to detect false sharing affecting performance case > > +and optimize accordingly. > > + > > + > > +How to detect and analysis False Sharing > > s/analysis/analyse Will correct it. > > +======================================== > > +perf record/report/stat are widely used for performance tuning, and > > +once hotspots are detected, tools like 'perf-c2c' and 'pahole' can > > +be further used to detect and pinpoint the possible false sharing > > +data structures. 'addr2line' is also good at decoding instruction > > +pointer when there are multiple layers of inline functions. [...] > > +Possible Mitigations > > +==================== > > +False sharing does not always need to be mitigated. False sharing > > +mitigations need to balance performance gains with complexity and > > s/need to/should/ Will change. > > +space consumption. Sometimes, lower performance is OK, and it's > > +unnecessary to hyper-optimize every rarely used data structure or > > +a cold data path. > > + > > +False sharing hurting performance cases are seen more frequently with > > +core count increasing. Because of these detrimental effects, many > > +patches have been proposed across variety of subsystems (like > > +networking and memory management) and merged. Some common mitigations > > +(with examples) are: > > + > > +* Separate hot global data in its own dedicated cache line, even if it > > + is just a 'short' type. The downside is more consumption of memory, > > + cache line and TLB entries. > > + > > + - Commit 91b6d3256356 ("net: cache align tcp_memory_allocated, tcp_sockets_allocated") > > + > > +* Reorganize the data structure, separate the interfering members to > > + different cache lines. One downside is it may introduce new false > > + sharing of other members. > > + > > + - Commit 802f1d522d5f ("mm: page_counter: re-layout structure to reduce false sharing") > > + > > +* Replace 'write' with 'read' when possible, especially in loops. > > + Like for some global variable, use compare(read)-then-write instead > > + of unconditional write. For example, use:: > > + > > + if (!test_bit(XXX)) > > + set_bit(XXX); > > + > > + instead of directly "set_bit(XXX);", similarly for atomic_t data:: > > + > > + if (atomic_read(XXX) == AAA) > > + atomic_set(XXX, BBB); > > + > > + - Commit 7b1002f7cfe5 ("bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing") > > + - Commit 292648ac5cf1 ("mm: gup: allow FOLL_PIN to scale in SMP") > > + > > +* Turn hot global data to 'per-cpu data + global data' when possible, > > + or reasonably increase the threshold for syncing per-cpu data to > > + global data, to reduce or postpone the 'write' to that global data. > > + > > + - Commit 520f897a3554 ("ext4: use percpu_counters for extent_status cache hits/misses") > > + - Commit 56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy") > > + > > +Surely, all mitigations should be carefully verified to not cause side > > +effects. And to avoid false sharing in advance during coding, it's > > Maybe say > To avoid introducing false sharing when coding Yes, this is better. > > +better to: > > + > > +* Be aware of cache line boundaries > > +* Group mostly read-only fields together > > +* Group things that are written at the same time together > > +* Separate known read-mostly and written-mostly fields > > Separate frequently read and frequently written fields on different cache lines Will change. Thanks! - Feng > > + > > +and better add a comment stating the false sharing consideration. > > + > > +One note is, sometimes even after a severe false sharing is detected > > +and solved, the performance may still have no obvious improvement as > > +the hotspot switches to a new place. > > + > > + > > +Miscellaneous > > +============= > > +One open issue is that kernel has an optional data structure > > +randomization mechanism, which also randomizes the situation of cache > > +line sharing of data members. > > + > > + > > +.. [1] https://en.wikipedia.org/wiki/False_sharing > > +.. [2] https://lore.kernel.org/lkml/CAHk-=whoqV=cX5VC80mmR9rr+Z+yQ6fiQZm36Fb-izsanHg23w@xxxxxxxxxxxxxx/ > > +.. [3] https://joemario.github.io/blog/2016/09/01/c2c-blog/ > > diff --git a/Documentation/kernel-hacking/index.rst b/Documentation/kernel-hacking/index.rst > > index f53027652290..79c03bac99a2 100644 > > --- a/Documentation/kernel-hacking/index.rst > > +++ b/Documentation/kernel-hacking/index.rst > > @@ -9,3 +9,4 @@ Kernel Hacking Guides > > > > hacking > > locking > > + false-sharing > > Thanks. > > Tim >