Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08.05.24 14:02, David Hildenbrand wrote:
On 08.05.24 11:02, Ryan Roberts wrote:
On 08/05/2024 08:12, David Hildenbrand wrote:
On 08.05.24 09:08, David Hildenbrand wrote:
On 08.05.24 06:45, Baolin Wang wrote:


On 2024/5/7 18:52, Ryan Roberts wrote:
On 06/05/2024 09:46, Baolin Wang wrote:
To support the use of mTHP with anonymous shmem, add a new sysfs interface
'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
directory for each mTHP to control whether shmem is enabled for that mTHP,
with a value similar to the top level 'shmem_enabled', which can be set to:
"always", "inherit (to inherit the top level setting)", "within_size",
"advise",
"never", "deny", "force". These values follow the same semantics as the top
level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
to 'always' to keep compatibility.

We decided at [1] to not allow 'force' for non-PMD-sizes.

[1]
https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@xxxxxxxxxx/

However, thinking about this a bit more, I wonder if the decision we made to
allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
for legacy back compat after all, I don't think there is any use case where
changing multiple mTHP size controls atomically is actually useful). Applying

Agree. This is also our usage of 'inherit'.

Missed that one: there might be use cases in the future once we would start
defaulting to "inherit" for all knobs (a distro might default to that) and
default-enable THP in the global knob. Then, it would be easy to disable any THP
by disabling the global knob. (I think that's the future we're heading to, where
we'd have an "auto" mode that can be set on the global toggle).

But I am just making up use cases ;) I think it will be valuable and just doing
it consistently now might be cleaner.

I agree that consistency between enabled and shmem_enabled is top priority. And
yes, I had forgotten about the glorious "auto" future. So probably continuing
all sizes to select "inherit" is best.

But for shmem_enabled, that means we need the following error checking:

   - It is an error to set "force" for any size except PMD-size

   - It is an error to set "force" for the global control if any size except PMD-
     size is set to "inherit"

   - It is an error to set "inherit" for any size except PMD-size if the global
     control is set to "force".

Certainly not too difficult to code and prove to be correct, but not the nicest
UX from the user's point of view when they start seeing errors.

I think we previously said this would likely be temporary, and if/when tmpfs
gets mTHP support, we could simplify and allow all sizes to be set to "force".
But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
more suited to the approach the page cache takes to transparently ramp up the
folio size as it faults more in. (Just saying there is a chance that this error
checking becomes permanent).

Note that with shmem you're inherently facing the same memory waste
issues etc as you would with anonymous memory. (sometimes even worse, if
you're running shmem that's configured to be unswappable!).

Also noting that memory waste is not really a problem when a write to a shmem file allocates a large folio that stays within boundaries of that write; issues only pop up if you end up over-allocating, especially, during page faults where you have not that much clue about what to do (single address, no real range provided).

There is the other issue that wasting large chunks of contiguous memory on stuff that barely benefits from it. With memory that maybe never gets evicted, there is no automatic "handing back" of that memory to the system to be used by something else. With ordinary files, that's a bit different. But I did not look closer into that issue yet, it's one of the reasons MADV_HUGEPAGE was added IIRC.

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux