> On 26 Nov 2019, at 00:39, Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> I think it is important to understand what the actual goal size is at this
> point. The filesystems where we are seeing problems are _huge_ (650TiB and
> larger) and are relatively full (70% or more) but take tens of minutes to
> finish mounting. Lustre does some small writes at mount time, but it shouldn't
> take so long to find some small allocations for the config log update.
>
> The filesystems are automatically getting "s_stripe_size = 512" from mke2fs
> (presumably from the underlying RAID), and I _think_ this is causing mballoc
> to inflate the IO request to 8-16MB prealloc chunks, which would be much
> harder to find, and unnecessary for a small allocation.
>

Yes, I agree. It makes sense to limit group preallocation in cases like this.

Thanks, Alex
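
For a sense of scale, here is a rough sketch of the arithmetic behind the sizes quoted above. It is not mballoc code. It assumes a 4 KiB filesystem block size (not stated in the thread) and that the stripe value of 512 is counted in filesystem blocks. The helper names are hypothetical, and rounding a request up to whole stripes is only an illustration of the suspected inflation, not a confirmed description of what the allocator does.

```python
# Assumption: 4 KiB filesystem block size (the thread does not say).
BLOCK_SIZE = 4096
# The "s_stripe_size = 512" value quoted above, assumed to be in blocks.
STRIPE_BLOCKS = 512


def blocks_to_mib(blocks, block_size=BLOCK_SIZE):
    """Convert a length in filesystem blocks to MiB."""
    return blocks * block_size / (1024 * 1024)


def mib_to_blocks(mib, block_size=BLOCK_SIZE):
    """Convert a whole number of MiB to filesystem blocks."""
    return mib * 1024 * 1024 // block_size


def round_up_to_stripe(blocks, stripe=STRIPE_BLOCKS):
    """Hypothetical helper: round a request up to a whole number of stripes."""
    return -(-blocks // stripe) * stripe


if __name__ == "__main__":
    print("one stripe:", blocks_to_mib(STRIPE_BLOCKS), "MiB")
    print("8 MiB chunk:", mib_to_blocks(8), "blocks")
    print("16 MiB chunk:", mib_to_blocks(16), "blocks")
    # A one-block write, if rounded up to a full stripe:
    print("1 block rounded up:", round_up_to_stripe(1), "blocks")
```

Under these assumptions one stripe is 2 MiB, and an 8-16 MiB chunk is 2048-4096 contiguous blocks (4-8 stripes), which is far harder to find on a 70% full filesystem than the handful of blocks a config log update needs.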