Re: [PATCH v2] bcache: avoid oversize memory allocation by small stripe_size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 28 Sep 2023, Coly Li wrote:

> Arraies bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are
> used for dirty data writeback, their sizes are decided by backing device
> capacity and stripe size. Larger backing device capacity or smaller
> stripe size make these two arraies occupies more dynamic memory space.
> 
> Currently bcache->stripe_size is directly inherited from
> queue->limits.io_opt of underlying storage device. For normal hard
> drives, its limits.io_opt is 0, and bcache sets the corresponding
> stripe_size to 1TB (1<<31 sectors), it works fine 10+ years. But for
> devices do declare value for queue->limits.io_opt, small stripe_size
> (comparing to 1TB) becomes an issue for oversize memory allocations of
> bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, while the
> capacity of hard drives gets much larger in recent decade.
> 
> For example a raid5 array assembled by three 20TB hardrives, the raid
> device capacity is 40TB with typical 512KB limits.io_opt. After the math
> calculation in bcache code, these two arraies will occupy 400MB dynamic
> memory. Even worse Andrea Tomassetti reports that a 4KB limits.io_opt is
> declared on a new 2TB hard drive, then these two arraies request 2GB and
> 512MB dynamic memory from kzalloc(). The result is that bcache device
> always fails to initialize on his system.
> 
> To avoid the oversize memory allocation, bcache->stripe_size should not
> directly inherited by queue->limits.io_opt from the underlying device.
> This patch defines BCH_MIN_STRIPE_SZ (4MB) as minimal bcache stripe size
> and set bcache device's stripe size against the declared limits.io_opt
> value from the underlying storage device,
> - If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, bcache device will
>   set its stripe size directly by this limits.io_opt value.
> - If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, bcache device will
>   set its stripe size by a value multiplying limits.io_opt and euqal or
>   large than BCH_MIN_STRIPE_SZ.
> 
> Then the minimal stripe size of a bcache device will always be >= 4MB.
> For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by
> bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB
> in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied
> by these two arraies will be 2.5MB in total.
> 
> Such mount of memory allocated for bcache->stripe_sectors_dirty and
> bcache->full_dirty_stripes is reasonable for most of storage devices.
> 
> Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@xxxxxxxx>
> Signed-off-by: Coly Li <colyli@xxxxxxx>
> Cc: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>
> ---
> Changelog:
> v2: use roundup() to replace round_up() as Eric Wheeler suggested.
> v1: the initial version.
> 
>  drivers/md/bcache/bcache.h | 1 +
>  drivers/md/bcache/super.c  | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 5a79bb3c272f..83eb7f27db3d 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -265,6 +265,7 @@ struct bcache_device {
>  #define BCACHE_DEV_WB_RUNNING		3
>  #define BCACHE_DEV_RATE_DW_RUNNING	4
>  	int			nr_stripes;
> +#define BCH_MIN_STRIPE_SZ		((4 << 20) >> SECTOR_SHIFT)
>  	unsigned int		stripe_size;
>  	atomic_t		*stripe_sectors_dirty;
>  	unsigned long		*full_dirty_stripes;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 0ae2b3676293..0eb71543d773 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -905,6 +905,8 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
>  
>  	if (!d->stripe_size)
>  		d->stripe_size = 1 << 31;
> +	else if (d->stripe_size < BCH_MIN_STRIPE_SZ)
> +		d->stripe_size = roundup(BCH_MIN_STRIPE_SZ, d->stripe_size);
>  
>  	n = DIV_ROUND_UP_ULL(sectors, d->stripe_size);
>  	if (!n || n > max_stripes) {
> -- 
> 2.35.3
> 

Thanks Coly.  

Reviewed-by: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>


--
Eric Wheeler

> 



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux