On Thu, 28 Sep 2023, Coly Li wrote:
> The arrays bcache->stripe_sectors_dirty and bcache->full_dirty_stripes
> are used for dirty data writeback; their sizes are decided by the
> backing device capacity and the stripe size. A larger backing device
> capacity or a smaller stripe size makes these two arrays occupy more
> dynamic memory.
>
> Currently bcache->stripe_size is directly inherited from
> queue->limits.io_opt of the underlying storage device. For normal hard
> drives, limits.io_opt is 0, and bcache sets the corresponding
> stripe_size to 1TB (1<<31 sectors); this has worked fine for 10+ years.
> But for devices that do declare a value for queue->limits.io_opt, a
> small stripe_size (compared to 1TB) becomes an issue because of the
> oversized memory allocations for bcache->stripe_sectors_dirty and
> bcache->full_dirty_stripes, while the capacity of hard drives has grown
> much larger in the last decade.
>
> For example, for a raid5 array assembled from three 20TB hard drives,
> the raid device capacity is 40TB with a typical 512KB limits.io_opt.
> After the math in the bcache code, these two arrays will occupy 400MB
> of dynamic memory. Even worse, Andrea Tomassetti reports that a 4KB
> limits.io_opt is declared on a new 2TB hard drive, so these two arrays
> request 2GB and 512MB of dynamic memory from kzalloc(). The result is
> that the bcache device always fails to initialize on his system.
>
> To avoid the oversized memory allocation, bcache->stripe_size should
> not be directly inherited from queue->limits.io_opt of the underlying
> device. This patch defines BCH_MIN_STRIPE_SZ (4MB) as the minimal
> bcache stripe size and sets the bcache device's stripe size based on
> the declared limits.io_opt value of the underlying storage device:
> - If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, the bcache device
>   sets its stripe size directly from this limits.io_opt value.
> - If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, the bcache device
>   sets its stripe size to a multiple of limits.io_opt that is equal to
>   or larger than BCH_MIN_STRIPE_SZ.
>
> Then the minimal stripe size of a bcache device will always be >= 4MB.
> For a 40TB raid5 device with 512KB limits.io_opt, the memory occupied
> by bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be
> 50MB in total. For a 2TB hard drive with 4KB limits.io_opt, the memory
> occupied by these two arrays will be 2.5MB in total.
>
> Such an amount of memory allocated for bcache->stripe_sectors_dirty and
> bcache->full_dirty_stripes is reasonable for most storage devices.
>
> Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx>
> Signed-off-by: Coly Li <colyli@xxxxxxx>
> Cc: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>
> ---
> Changelog:
> v2: use roundup() to replace round_up() as Eric Wheeler suggested.
> v1: the initial version.
>
>  drivers/md/bcache/bcache.h | 1 +
>  drivers/md/bcache/super.c  | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 5a79bb3c272f..83eb7f27db3d 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -265,6 +265,7 @@ struct bcache_device {
>  #define BCACHE_DEV_WB_RUNNING		3
>  #define BCACHE_DEV_RATE_DW_RUNNING	4
>  	int			nr_stripes;
> +#define BCH_MIN_STRIPE_SZ	((4 << 20) >> SECTOR_SHIFT)
>  	unsigned int		stripe_size;
>  	atomic_t		*stripe_sectors_dirty;
>  	unsigned long		*full_dirty_stripes;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 0ae2b3676293..0eb71543d773 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -905,6 +905,8 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
>  
>  	if (!d->stripe_size)
>  		d->stripe_size = 1 << 31;
> +	else if (d->stripe_size < BCH_MIN_STRIPE_SZ)
> +		d->stripe_size = roundup(BCH_MIN_STRIPE_SZ, d->stripe_size);
>  
>  	n = DIV_ROUND_UP_ULL(sectors, d->stripe_size);
>  	if (!n || n > max_stripes) {
> -- 
> 2.35.3
>

Thanks Coly.

Reviewed-by: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>

--
Eric Wheeler
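
For anyone who wants to try the new clamp outside the kernel, below is a
minimal userspace C sketch of the stripe-size selection after this patch.
The helper name pick_stripe_size() and the sample io_opt values are made up
for illustration, roundup() is re-implemented locally with the usual
"round x up to a multiple of y" semantics, and 1U << 31 is used for the
no-io_opt default; it is a sketch of the logic, not the kernel code itself.

#include <stdio.h>

/* Local re-implementation of the kernel's roundup() semantics. */
#define roundup(x, y) ((((x) + (y) - 1) / (y)) * (y))

#define SECTOR_SHIFT 9
#define BCH_MIN_STRIPE_SZ ((4 << 20) >> SECTOR_SHIFT)	/* 4MB = 8192 sectors */

/* Mirrors the stripe_size selection in bcache_device_init() after this patch. */
static unsigned int pick_stripe_size(unsigned int io_opt_sectors)
{
	unsigned int stripe_size = io_opt_sectors;

	if (!stripe_size)
		stripe_size = 1U << 31;		/* no io_opt declared: keep the 1TB default */
	else if (stripe_size < BCH_MIN_STRIPE_SZ)
		stripe_size = roundup(BCH_MIN_STRIPE_SZ, stripe_size);

	return stripe_size;
}

int main(void)
{
	/* io_opt of 0, 4KB, 512KB and 8MB, expressed in 512-byte sectors */
	unsigned int samples[] = { 0, 8, 1024, 16384 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("io_opt %5u sectors -> stripe_size %u sectors\n",
		       samples[i], pick_stripe_size(samples[i]));
	return 0;
}

As a rough sanity check of the figures quoted above, assuming one 4-byte
atomic_t per stripe for stripe_sectors_dirty and one bit per stripe for
full_dirty_stripes: a 2TB backing device with a 4KB io_opt has about 512M
stripes before the patch (roughly 2GB of counters alone), and about 512K
stripes once the stripe size is clamped to 4MB (roughly 2MB of counters plus
a small bitmap), which is in line with the improvement described in the
commit message.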