Re: [PATCH] bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 7 Jan 2022, Coly Li wrote:
> On 1/7/22 4:21 PM, mingzhe.zou@xxxxxxxxxxxx wrote:
> > From: Zou Mingzhe <mingzhe.zou@xxxxxxxxxxxx>
> >
> > When attaching a cached device (a.k.a backing device) to a cache
> > device, bch_sectors_dirty_init() is called to count dirty sectors
> > and stripes (see what bcache_dev_sectors_dirty_add() does) on the
> > cache device.
> >
> > When bcache_dev_sectors_dirty_add() is called, set_bit(stripe,
> > d->full_dirty_stripes) or clear_bit(stripe, d->full_dirty_stripes)
> > operation will always be performed. In full_dirty_stripes, each 1bit
> > represents stripe_size (8192) sectors (512B), so 1bit=4MB (8192*512),
> > and each CPU cache line=64B=512bit=2048MB. When 20 threads process
> > a cached disk with 100G dirty data, a single thread processes about
> > 23M at a time, and 20 threads total 460M. These full_dirty_stripes
> > bits corresponding to the 460M data is likely to fall in the same CPU
> > cache line. When one of these threads performs a set_bit or clear_bit
> > operation, the same CPU cache line of other threads will become invalid
> > and must read the full_dirty_stripes from the main memory again. Compared
> > with single thread, the time of a bcache_dev_sectors_dirty_add()
> > call is increased by about 50 times in our test (100G dirty data,
> > 20 threads, bcache_dev_sectors_dirty_add() is called more than
> > 20 million times).
> >
> > This patch tries to test_bit before set_bit or clear_bit operation.
> > Therefore, a lot of force set and clear operations will be avoided,
> > and most of bcache_dev_sectors_dirty_add() calls will only read CPU
> > cache line.

Hi Mingzhe, good catch on this.  I'm curious: how much time does it save 
attaching a cache?

-Eric

> >
> > Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
> 
> Added into my testing queue. Thanks.
> 
> Coly Li
> 
> > ---
> >   drivers/md/bcache/writeback.c | 11 +++++++----
> >   1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 68e75c692dd4..4afe22875d4f 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -596,10 +596,13 @@ void bcache_dev_sectors_dirty_add(struct cache_set *c,
> > unsigned int inode,
> >   
> >     sectors_dirty = atomic_add_return(s,
> >   					d->stripe_sectors_dirty + stripe);
> > -		if (sectors_dirty == d->stripe_size)
> > -			set_bit(stripe, d->full_dirty_stripes);
> > -		else
> > -			clear_bit(stripe, d->full_dirty_stripes);
> > +		if (sectors_dirty == d->stripe_size) {
> > +			if (!test_bit(stripe, d->full_dirty_stripes))
> > +				set_bit(stripe, d->full_dirty_stripes);
> > +		} else {
> > +			if (test_bit(stripe, d->full_dirty_stripes))
> > +				clear_bit(stripe, d->full_dirty_stripes);
> > +		}
> >   
> >     nr_sectors -= s;
> >     stripe_offset = 0;
> 
> 
> 



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux