Re: [PATCH] mm: Drop INT_MAX limit from kvmalloc()

On 21/10/2024 at 10:46, Janpieter Sollie wrote:
On 20/10/2024 at 22:29, Kent Overstreet wrote:



I'm not going to run custom benchmarks just for a silly argument, sorry.
But on a fileserver with 128 GB of RAM and a 75 TB filesystem
(yes, that's likely a dedicated fileserver),
we can quite easily justify a btree node cache of perhaps 10GB,
and on random update workloads the journal does need to be that big -
otherwise our btree node write size goes down and throughput suffers.
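(Side note on how this ties back to the patch in the subject line: a journal in
that range implies correspondingly large single allocations, presumably things
like the in-memory journal key index built at replay, and kvmalloc() currently
refuses any single request above INT_MAX, roughly 2 GiB. A rough userspace
sketch of the arithmetic; the per-key sizes below are assumptions picked for
illustration, not bcachefs constants:

#include <stdio.h>
#include <limits.h>
#include <stdint.h>

/* assumed numbers, only to show the order of magnitude */
#define JOURNAL_SIZE		(10ULL << 30)	/* ~10 GB journal from the example above */
#define AVG_JOURNAL_KEY		64		/* assumed average journal key size, bytes */
#define INDEX_ENTRY_SIZE	32		/* assumed per-key index overhead, bytes */

int main(void)
{
	uint64_t nr_keys = JOURNAL_SIZE / AVG_JOURNAL_KEY;
	uint64_t index_bytes = nr_keys * INDEX_ENTRY_SIZE;

	/* kvmalloc() currently rejects single allocations above INT_MAX */
	printf("journal key index: %llu bytes (%s INT_MAX)\n",
	       (unsigned long long)index_bytes,
	       index_bytes > INT_MAX ? "exceeds" : "fits under");
	return 0;
}

With those assumptions the index alone lands around 5 GiB, which is why the
allocation limit and the journal sizing discussion are really the same
argument.)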

Is this idea based on the assumption that the user has only 1 FS per device?
I assume the setup you describe looks like mine (and it probably does).
But I have 3 bcachefs filesystems, each taking 10% of RAM,
so I end up with 30% of memory dedicated to bcachefs caching.
If I read your argument correctly, you are saying "I want a large btree node cache,
because that makes the fs more efficient". No doubts about that.

VFS-level caching may already save you most of the lookups you are
building that btree node cache for.
Theoretically there is a big difference in how the two work,
but in practice, which files will the fs look up most often?
Probably the few that are already sitting in your VFS cache.
The added value of keeping a large "metadata" cache on top of that seems doubtful.

I have my doubts about trading 15 GB of VFS cache for 15 GB of btree node cache:
you lose the opportunity to share those 15 GB of RAM between all filesystems.
On the other hand, on a workload that looks up many different files,
the btree node cache will shine with everything it has.

Maybe some tuning parameter could help here?
It would at least limit the "insane" journal size that is otherwise required.
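
To make the suggestion concrete, here is a purely hypothetical sketch of the
kind of knob I mean (this is not an existing bcachefs option): a cap on the
btree node cache expressed as a percentage of total RAM, so several
filesystems on one machine do not each claim the ~10% from the example above.

/* hypothetical sketch only -- not an existing bcachefs interface */
#include <stdio.h>
#include <stdint.h>

struct cache_tunables {
	unsigned int btree_cache_pct;	/* e.g. 3 instead of ~10 */
};

static uint64_t btree_cache_limit(uint64_t total_ram,
				  const struct cache_tunables *t)
{
	unsigned int pct = t->btree_cache_pct;

	/* clamp so a typo cannot eat all of RAM or starve the cache */
	if (pct < 1)
		pct = 1;
	if (pct > 50)
		pct = 50;

	return total_ram / 100 * pct;
}

int main(void)
{
	struct cache_tunables t = { .btree_cache_pct = 3 };

	/* 128 GB box from the example, capped at 3% instead of 10% */
	printf("btree node cache limit: %llu bytes\n",
	       (unsigned long long)btree_cache_limit(128ULL << 30, &t));
	return 0;
}

Exposed through sysfs or a mount option, an admin could then trade VFS cache
against btree node cache per filesystem instead of accepting a fixed fraction.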

Janpieter Sollie
And things quickly grow out of hand here:

A "bcachefs fs usage" report:

blablabla (other disks)

A device dedicated to metadata:

SSDM (device 6):                sdg1              rw
                               data         buckets    fragmented
 free:                  39254491136          149744
 sb:                        3149824              13        258048
 journal:                 312475648            1192
 btree:                   428605440            1635
 user:                            0               0
 cached:                          0               0
 parity:                          0               0
 stripe:                          0               0
 need_gc_gens:                    0               0
 need_discard:                    0               0
 unstriped:                       0               0
 capacity:              39998980096          152584

Oops ... the journal (312475648 bytes) is more than 70% of the size of the btree data (428605440 bytes) on this device!

Janpieter Sollie
