Re: [PATCH] ext4: add an interface to load block bitmaps

On May 23, 2018, at 9:30 AM, Jan Kara <jack@xxxxxxx> wrote:
> 
> On Wed 16-05-18 12:12:00, Andreas Dilger wrote:
>> On May 15, 2018, at 8:58 PM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
>>> 
>>> On Tue, May 15, 2018 at 10:06:23PM +0900, Wang Shilong wrote:
>>>> From: Wang Shilong <wshilong@xxxxxxx>
>>>> 
>>>> During our benchmarking, we found that write performance is
>>>> sometimes not stable enough, and that small reads issued
>>>> during writes can drop throughput (~30%).
>>> 
>>> Out of curiosity, what sort of benchmarks are you doing?
>> 
>> This is related to Lustre, though I can't comment on the specific
>> benchmarks.  We've had a variety of reports about this issue. The
>> current workaround (hack) is "dumpe2fs /dev/XXX > /dev/null" at
>> mount time and periodically at runtime via cron to keep the bitmaps
>> loaded, but we wanted to get a better solution in place that works
>> for everyone.  Being able to load all of the bitmaps at mount time
>> also helps get the server performance "up to speed" quickly, rather
>> than on-demand loading of a few hundred MB of bitmaps over time.
> 
> So what I don't like about the explicit pinning is that only a few admins will
> be able to get it right.

Sure, I agree it would be better to handle this more automatically.

>>>> It turned out that loading block bitmaps can add latency
>>>> here. Also, on a heavily fragmented filesystem, we might
>>>> need to load many bitmaps to find some free blocks.
>>>>
>>>> To improve this situation, we have a patch that loads block
>>>> bitmaps into memory and pins them there until umount or
>>>> until we release the memory on purpose. This can stabilize
>>>> write performance and improve performance on a heavily
>>>> fragmented filesystem.
>>> 
>>> This is true, but I wonder how realistic this is on real production
>>> systems.  For a 1 TiB file system, pinning all of the block bitmaps
>>> will require 32 megabytes of memory.  Is that really realistic for
>>> your use case?
>> 
>> In the case of Lustre servers, they typically have 128GB of RAM or
>> more, and are serving a few hundred TB of storage each these days,
>> but do not have any other local users, so the ~6GB RAM usage isn't a
>> huge problem if it improves the performance/consistency. The real
>> issue is that the bitmaps do not get referenced often enough compared
>> to the 10GB/s of data flowing through the server, so they are pushed
>> out of memory too quickly.
> 
> OK, and that 10GB/s is mostly use-once data?

Pretty much, yes.  There isn't much chance to re-use the data when it
can only fit into RAM for a few seconds.
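
For reference, the figures quoted above can be checked with some quick
arithmetic. This is only a rough sketch: the 4 KiB block size, the 200 TiB
filesystem size (standing in for "a few hundred TB"), and the function names
are my assumptions for illustration, not anything taken from the patch.

```python
# Assumption: 4 KiB blocks, so one bitmap block (one bit per block)
# covers 4096 * 8 = 32768 blocks, i.e. one 128 MiB block group.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = BLOCK_SIZE * 8

MIB = 1 << 20
GIB = 1 << 30
TIB = 1 << 40


def block_bitmap_bytes(fs_bytes):
    """Memory needed to hold every block bitmap of a filesystem."""
    group_bytes = BLOCKS_PER_GROUP * BLOCK_SIZE
    groups = -(-fs_bytes // group_bytes)  # ceiling division
    return groups * BLOCK_SIZE            # one bitmap block per group


def cache_residency_seconds(ram_bytes, throughput_bytes_per_sec):
    """Upper bound on how long use-once data can stay cached in RAM."""
    return ram_bytes / throughput_bytes_per_sec


print(block_bitmap_bytes(TIB) / MIB)          # 1 TiB filesystem, in MiB
print(block_bitmap_bytes(200 * TIB) / GIB)    # 200 TiB filesystem, in GiB
print(cache_residency_seconds(128e9, 10e9))   # 128GB RAM at 10GB/s, in seconds
```

Under these assumptions a 1 TiB filesystem needs 32 MiB of block bitmaps,
200 TiB needs about 6 GiB, and 128GB of RAM turns over in roughly 13 seconds
at 10GB/s, which is why rarely referenced bitmaps get evicted so quickly.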

> So I could imagine we cache free block information in a more efficient format
> (something like you or Ted describe), provide a proper shrinker for it
> (possibly biasing it to make reclaim less likely), and enable it always...
> That way users don't have to configure it and we don't have to be afraid of
> eating too much memory at the expense of something else.

One of the problems with this approach is that having compressed bitmaps
wouldn't necessarily help the performance issue.  This would allow the
initial block allocation to proceed without reading the bitmap from disk,
but then the bitmap still needs to be updated and written to disk in that
transaction.

I guess one possibility is to reconstruct a full bitmap from the compressed
in-memory bitmap for the write, but this also carries some risk if there
is an error - the reconstructed bitmap is much more likely to have a large
corruption because it is generated from a compressed version, compared to
a (likely) small corruption in the full bitmap.
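
To make the risk concrete, here is a toy sketch of what I mean. It assumes
the "compressed" form is just a list of (start, length) free extents, which
is only one possible format and not something anyone has implemented; the
function names are made up. Bit i lives in byte i // 8 at bit position
i % 8, and a set bit means "in use".

```python
def free_extents(bitmap, nbits):
    """Compress a block bitmap into a list of (start, length) free extents."""
    extents = []
    start = None
    for i in range(nbits):
        in_use = (bitmap[i >> 3] >> (i & 7)) & 1
        if not in_use:
            if start is None:
                start = i                      # a free run begins here
        elif start is not None:
            extents.append((start, i - start))  # a free run just ended
            start = None
    if start is not None:
        extents.append((start, nbits - start))
    return extents


def rebuild_bitmap(extents, nbits):
    """Reconstruct the full bitmap from the free-extent list."""
    bitmap = bytearray(b"\xff" * ((nbits + 7) // 8))  # start fully in use
    for start, length in extents:
        for i in range(start, start + length):
            bitmap[i >> 3] &= ~(1 << (i & 7)) & 0xFF  # clear bit: block is free
    return bytes(bitmap)


def differing_bits(a, b):
    """Count the bits that differ between two equal-length bitmaps."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = bytes([0x0F, 0x00, 0xF0, 0xFF])    # blocks 4..19 are free
extents = free_extents(original, 32)
assert rebuild_bitmap(extents, 32) == original

# A single bad length field in memory marks 8 in-use blocks as free.
damaged = rebuild_bitmap([(4, 24)], 32)
print(extents, differing_bits(original, damaged))
```

One wrong value in the extent list turns into a whole run of wrong bits in
the bitmap written to disk, whereas a flipped bit in a full in-memory bitmap
stays a single wrong bit.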

Of course, the other question is the complexity of implementing this.
Pinning the bitmaps is a trivial change that can be applied to a wide range
of kernel versions, while adding compressed bitmaps will add a lot more
code and complexity.  I'm not against that, but it would take longer to do.

Cheers, Andreas




