On Fri, 7 Jan 2022, Coly Li wrote: > On 1/7/22 4:21 PM, mingzhe.zou@xxxxxxxxxxxx wrote: > > From: Zou Mingzhe <mingzhe.zou@xxxxxxxxxxxx> > > > > When attaching a cached device (a.k.a backing device) to a cache > > device, bch_sectors_dirty_init() is called to count dirty sectors > > and stripes (see what bcache_dev_sectors_dirty_add() does) on the > > cache device. > > > > When bcache_dev_sectors_dirty_add() is called, set_bit(stripe, > > d->full_dirty_stripes) or clear_bit(stripe, d->full_dirty_stripes) > > operation will always be performed. In full_dirty_stripes, each 1bit > > represents stripe_size (8192) sectors (512B), so 1bit=4MB (8192*512), > > and each CPU cache line=64B=512bit=2048MB. When 20 threads process > > a cached disk with 100G dirty data, a single thread processes about > > 23M at a time, and 20 threads total 460M. These full_dirty_stripes > > bits corresponding to the 460M data is likely to fall in the same CPU > > cache line. When one of these threads performs a set_bit or clear_bit > > operation, the same CPU cache line of other threads will become invalid > > and must read the full_dirty_stripes from the main memory again. Compared > > with single thread, the time of a bcache_dev_sectors_dirty_add() > > call is increased by about 50 times in our test (100G dirty data, > > 20 threads, bcache_dev_sectors_dirty_add() is called more than > > 20 million times). > > > > This patch tries to test_bit before set_bit or clear_bit operation. > > Therefore, a lot of force set and clear operations will be avoided, > > and most of bcache_dev_sectors_dirty_add() calls will only read CPU > > cache line. Hi Mingzhe, good catch on this. I'm curious: how much time does it save attaching a cache? -Eric > > > > Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx> > > Added into my testing queue. Thanks. > > Coly Li > > > --- > > drivers/md/bcache/writeback.c | 11 +++++++---- > > 1 file changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c > > index 68e75c692dd4..4afe22875d4f 100644 > > --- a/drivers/md/bcache/writeback.c > > +++ b/drivers/md/bcache/writeback.c > > @@ -596,10 +596,13 @@ void bcache_dev_sectors_dirty_add(struct cache_set *c, > > unsigned int inode, > > > > sectors_dirty = atomic_add_return(s, > > d->stripe_sectors_dirty + stripe); > > - if (sectors_dirty == d->stripe_size) > > - set_bit(stripe, d->full_dirty_stripes); > > - else > > - clear_bit(stripe, d->full_dirty_stripes); > > + if (sectors_dirty == d->stripe_size) { > > + if (!test_bit(stripe, d->full_dirty_stripes)) > > + set_bit(stripe, d->full_dirty_stripes); > > + } else { > > + if (test_bit(stripe, d->full_dirty_stripes)) > > + clear_bit(stripe, d->full_dirty_stripes); > > + } > > > > nr_sectors -= s; > > stripe_offset = 0; > > >