On Fri, Feb 28, 2025 at 08:51:27PM +0100, Jan Kara wrote:
> On Fri 28-02-25 10:00:59, Pan Deng wrote:
> > When running the pread syscall on a high core count system, f_ref
> > contends with the reading of f_mode, f_op, f_mapping, f_inode and
> > f_flags in the same cache line.
>
> Well, but you have to have a multithreaded process using the same struct
> file for the IO, don't you? Otherwise f_ref is not touched...

Yes, it's specifically designed to scale better under high contention.

> > This change places f_ref in the 3rd cache line, where fields are not
> > updated as frequently as in the 1st cache line, and the contention is
> > greatly reduced according to tests. In addition, the size of the file
> > object is kept within 3 cache lines.
> >
> > This change has been tested with the rocksdb benchmark readwhilewriting
> > case on a 1-socket, 64-physical-core (128 logical cores) bare-metal
> > machine, with build config CONFIG_RANDSTRUCT_NONE=y
> > Command:
> > ./db_bench --benchmarks="readwhilewriting" --threads $cnt --duration 60
> > The throughput (ops/s) is improved by up to ~21%.
> > =====
> > thread  baseline  compare
> > 16      100%      +1.3%
> > 32      100%      +2.2%
> > 64      100%      +7.2%
> > 128     100%      +20.9%
> >
> > It was also tested with the UnixBench syscall, fsbuffer, fstime and
> > fsdisk cases that have been used for file struct layout tuning; no
> > regression was observed.
>
> So overall keeping the first cacheline read-mostly with important stuff
> makes sense to limit cache traffic.
But:

> > struct file {
> > -	file_ref_t			f_ref;
> >  	spinlock_t			f_lock;
> >  	fmode_t				f_mode;
> >  	const struct file_operations	*f_op;
> > @@ -1102,6 +1101,7 @@ struct file {
> >  	unsigned int			f_flags;
> >  	unsigned int			f_iocb_flags;
> >  	const struct cred		*f_cred;
> > +	u8				padding[8];
> >  	/* --- cacheline 1 boundary (64 bytes) --- */
> >  	struct path			f_path;
> >  	union {
> > @@ -1127,6 +1127,7 @@ struct file {
> >  		struct file_ra_state	f_ra;
> >  		freeptr_t		f_freeptr;
> >  	};
> > +	file_ref_t			f_ref;
> >  	/* --- cacheline 3 boundary (192 bytes) --- */
> > } __randomize_layout
>
> This keeps struct file within 3 cachelines but it actually grows it from
> 184 to 192 bytes (and yes, that changes how many file structs we can fit
> in a slab). So instead of adding 8 bytes of padding, just pick some
> read-mostly element and move it into the hole - f_owner looks like one
> possible candidate.

This is what I did. See vfs-6.15.misc! Thanks!

> Also did you test how moving f_ref to the second cache line instead of
> the third one behaves?
>
> 								Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR