On Thu, Jun 13, 2024 at 8:13 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote: > > On Thu, Jun 13, 2024 at 7:00 PM Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > On Thu, 13 Jun 2024 at 06:46, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > > > I've picked Linus patch and your for testing into the vfs.inode.rcu branch. > > > > Btw, if you added [patch 2/2] too, I think the exact byte counts in > > the comments are a bit misleading. > > > > The actual cacheline details will very much depend on 32-bit vs 64-bit > > builds, but also on things like the slab allocator debug settings. > > > > I indeed assumed "x86_64 production", with lines just copied from pahole. > > However, to the best of my understanding the counts are what one > should expect on a 64-bit kernel. > > That said: > > > I think the important part is to keep the d_lockref - that is often > > dirty and exclusive in the cache - away from the mostly RO parts of > > the dentry that can be shared across CPU's in the cache. > > > > So rather than talk about exact byte offsets, maybe just state that > > overall rule? > > > > I would assume the rule is pretty much well known and instead an > indicator where is what (as added in my comments) would be welcome. > > But if going that route, then perhaps: > "Make sure d_lockref does not share a cacheline with fields used by > RCU lookup, otherwise it can result in a signification throughput > reduction. You can use pahole(1) to check the layout." > [maybe a link to perfbook or something goes here?] > > Arguably a bunch of BUILD_BUG_ONs could be added to detect the overlap > (only active without whatever runtime debug messing with the layout). > So I tried to add the BUILD_BUG_ONs but I got some compilation errors immediately concerning the syntax, maybe I'm brainfarting here. I am not pasting that in case you want to take a stab yourself from scratch. I did however type up the following: /* * Note: dentries have a read-mostly area heavily used by RCU (denoted with * "RCU lookup touched fields") which must not share cachelines with a * frequently written-to field like d_lockref. * * If you are modifying the layout you can check with pahole(1) that there is * no overlap. */ could land just above the struct. That's my $0,03. I am not going to give further commentary on the matter, you guys touch it up however you see fit. -- Mateusz Guzik <mjguzik gmail.com>