On Fri 28-02-25 10:00:59, Pan Deng wrote:
> When running syscall pread in a high core count system, f_ref contends
> with the reading of f_mode, f_op, f_mapping, f_inode, f_flags in the
> same cache line.

Well, but you have to have a multithreaded process using the same struct
file for the IO, don't you? Otherwise f_ref is not touched...

> This change places f_ref in the 3rd cache line, where fields are not
> updated as frequently as in the 1st cache line, and the contention is
> greatly reduced according to tests. In addition, the size of the file
> object is kept at 3 cache lines.
>
> This change has been tested with the rocksdb benchmark readwhilewriting
> case on a 1-socket, 64-physical-core, 128-logical-core baremetal
> machine, with build config CONFIG_RANDSTRUCT_NONE=y.
> Command:
> ./db_bench --benchmarks="readwhilewriting" --threads $cnt --duration 60
> The throughput (ops/s) is improved by up to ~21%.
> =====
> thread	baseline	compare
> 16	100%		+1.3%
> 32	100%		+2.2%
> 64	100%		+7.2%
> 128	100%		+20.9%
>
> It was also tested with the UnixBench syscall, fsbuffer, fstime, and
> fsdisk cases that have been used for file struct layout tuning; no
> regression was observed.

So overall, keeping the first cacheline read-mostly with the important
stuff makes sense to limit cache traffic. But:

> struct file {
> -	file_ref_t			f_ref;
> 	spinlock_t			f_lock;
> 	fmode_t				f_mode;
> 	const struct file_operations	*f_op;
> @@ -1102,6 +1101,7 @@ struct file {
> 	unsigned int			f_flags;
> 	unsigned int			f_iocb_flags;
> 	const struct cred		*f_cred;
> +	u8				padding[8];
> 	/* --- cacheline 1 boundary (64 bytes) --- */
> 	struct path			f_path;
> 	union {
> @@ -1127,6 +1127,7 @@ struct file {
> 		struct file_ra_state	f_ra;
> 		freeptr_t		f_freeptr;
> 	};
> +	file_ref_t			f_ref;
> 	/* --- cacheline 3 boundary (192 bytes) --- */
> } __randomize_layout

This keeps struct file within 3 cachelines, but it actually grows it from
184 to 192 bytes (and yes, that changes how many file structs we can fit
in a slab).
So instead of adding 8 bytes of padding, just pick some read-mostly
element and move it into the hole - f_owner looks like one possible
candidate. Also, did you test how moving f_ref to the second cache line
instead of the third one behaves?

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR