Re: [RFC] improve malloc for large filesystems

Alex Zhuravlev <azhuravlev@xxxxxxxxxxxxx> · Mon, 2 Dec 2019 08:46:47 +0000



> On 26 Nov 2019, at 00:39, Andreas Dilger <adilger@xxxxxxxxx> wrote:
> 
> I think it is important to understand what the actual goal size is at this
> point.  The filesystems where we are seeing problems are _huge_ (650TiB and
> larger) and are relatively full (70% or more) but take tens of minutes to
> finish mounting.  Lustre does some small writes at mount time, but it shouldn't
> take so long to find some small allocations for the config log update.
> 
> The filesystems are automatically getting "s_stripe_size = 512" from mke2fs
> (presumably from the underlying RAID), and I _think_ this is causing mballoc
> to inflate the IO request to 8-16MB prealloc chunks, which would be much
> harder to find, and unnecessary for a small allocation.
> 
Yes, I agree. It makes sense to limit group preallocation in cases like this.
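
To put rough numbers on the sizes mentioned above, here is a small sketch of the
arithmetic only. It is not the actual mballoc code: the 4 KiB block size is an
assumption (the thread does not state it), and round_up_to_stripe() and
blocks_to_mib() are hypothetical helpers made up for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB filesystem blocks; the thread does not state the block size. */
#define BLOCK_SIZE_BYTES 4096u

/*
 * Hypothetical helper: round a request (in blocks) up to a whole number
 * of stripes. A stripe of 0 means "no stripe hint", so nothing is rounded.
 */
static uint64_t round_up_to_stripe(uint64_t blocks, uint64_t stripe)
{
    if (stripe == 0)
        return blocks;
    return ((blocks + stripe - 1) / stripe) * stripe;
}

/* Hypothetical helper: convert a block count to whole MiB. */
static uint64_t blocks_to_mib(uint64_t blocks)
{
    return blocks * BLOCK_SIZE_BYTES / (1024u * 1024u);
}
```

Under that assumption a 512-block stripe is 2 MiB, so even a one-block request
rounded up to a stripe boundary becomes 512 blocks, and an 8-16MB chunk is
2048-4096 blocks (4-8 stripes) that have to be found free.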

Thanks, Alex