On 19/12/24 1:29 pm, Dev Jain wrote:
On 19/12/24 9:10 am, John Hubbard wrote:
On 12/18/24 1:34 AM, Dev Jain wrote:
On 18/12/24 1:06 pm, Ryan Roberts wrote:
On 16/12/2024 16:51, Dev Jain wrote:
We may hit a situation wherein we have a larger folio mapped. It is
incorrect to go ahead with the collapse since some pages will be
unmapped, leading to the entire folio getting unmapped. Therefore, skip
the corresponding range.
...
It would be good if you could spell out the desired policy when
khugepaged hits partially unmapped large folios and unaligned large
folios. I think the simple approach is to always collapse them to fully
mapped, aligned folios, even if the resulting order is smaller than the
original. But I'm not sure that is always going to be the best thing.
Regardless, I'm struggling to understand the logic in this patch.
Taking the order of a folio based on having hit one of its pages says
nothing about whether the whole of that folio is mapped, or about its
alignment. And it's not clear to me how we would get to a situation
where we are scanning for a lower order and find a (fully mapped,
aligned) folio of higher order in the first place.
Let's assume the desired policy is that khugepaged should always
collapse to naturally aligned large folios. If there happens to be an
existing aligned order-4 folio that is fully mapped, we will identify
that for collapse as part of the scan for order-4. At that point, we
should just notice that it is already an aligned order-4 folio and
bypass collapse. Of course we may have already chosen to collapse it
into a higher order, but we should definitely not get to a lower order
before we notice it.
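To make the "notice it is already an aligned order-N folio and bypass collapse" check concrete, here is a rough Python model (not kernel code). Everything in it is hypothetical: `Folio`, the per-page `(pfn, folio)` list standing in for PTEs, and `is_aligned_folio_of_order()` are illustrative names only, and `vpn` is assumed to be the virtual address shifted right by the page shift.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Folio:
    pfn: int    # first page frame number of the folio (hypothetical model)
    order: int  # the folio spans 2**order base pages


# One entry per base page in the scanned range: (pfn, owning folio),
# or None where nothing is mapped. This stands in for the PTEs.
Pte = Optional[Tuple[int, Folio]]


def is_aligned_folio_of_order(ptes: List[Pte], vpn: int, order: int) -> bool:
    """True if the range maps exactly one fully mapped, naturally aligned
    folio of the given order, so collapse could be bypassed."""
    nr = 1 << order
    if len(ptes) != nr or vpn % nr != 0:
        return False  # wrong length, or virtual start not naturally aligned
    if any(p is None for p in ptes):
        return False  # partially mapped: at least one hole in the range
    first_pfn, folio = ptes[0]
    if folio.order != order or first_pfn != folio.pfn or folio.pfn % nr != 0:
        return False  # different order, or range does not start at the folio head
    # Every entry must map the next page of the same folio, in order.
    return all(pfn == first_pfn + i and f is folio
               for i, (pfn, f) in enumerate(ptes))
```

In this model an order-8 folio hit during an order-4 scan simply fails the check, so what to do with it is a policy question that the check itself does not answer.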
Hmm... I guess if the sysfs THP settings have been changed then things
could get spicy... if order-8 was previously enabled and we have an
order-8 folio, then it gets disabled and khugepaged, scanning for
order-4 (which is still enabled), hits the order-8 folio; what's the
expected policy? Rework it into 16 order-4 folios, or leave it as a
single order-8?
Exactly; sorry, I should have made it clear in the patch description
that I am handling the following scenario: there is a long-running
system on which we are using order-8 folios, and now we decide to
downgrade to order-4. Is it a good idea to take the pain of splitting
each order-8 folio into 16 order-4 folios? This should be a rare
situation in the first place, so I have currently decided to ignore the
folios set up by the previous sysfs setting and only focus on
collapsing fresh memory.
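For reference, the "16" is just 2^(8-4). A tiny Python sketch of that arithmetic, assuming 4 KiB base pages (the helper names are illustrative): an order-8 folio is 1 MiB and each order-4 piece is 64 KiB. On arm64 with a 4 KiB granule the contiguous bit spans 16 PTEs, i.e. 64 KiB, which is presumably why order-4 is the interesting target here.

```python
PAGE_SIZE = 4096  # assumed 4 KiB base pages


def folios_after_split(old_order: int, new_order: int) -> int:
    """Number of order-new_order folios obtained from one order-old_order folio."""
    if not 0 <= new_order <= old_order:
        raise ValueError("can only split into an equal or smaller order")
    return 1 << (old_order - new_order)


def folio_bytes(order: int) -> int:
    """Size in bytes of a folio of the given order."""
    return PAGE_SIZE << order


print(folios_after_split(8, 4))       # 16
print(folio_bytes(8) // 1024, "KiB")  # 1024 KiB
print(folio_bytes(4) // 1024, "KiB")  # 64 KiB
```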
Thinking again, a sysadmin who decides to downgrade the folio order
would do so in the hope of reducing internal fragmentation, increasing
swap speed, etc., so it makes sense to shatter large folios... maybe we
can have a sysfs tunable for this?
Maybe we should not support it (at runtime) at all. We are trying to
build systems that don't require incredibly detailed sysadmin
involvement, and this level of tweaking qualifies, thoroughly, as
"incredibly detailed sysadmin micromanagement", imho.
Ryan pointed out one thing: what about unaligned or partially mapped
large folios? With the previous sysfs settings, we may have an
unaligned order-8 folio; let us say it became unaligned due to
mremap(). Then it is a good idea to start from the first order-4
aligned page and collapse memory from there, so that we can take
advantage of the contig bit. Otherwise, if it is a fully mapped,
aligned order-8 folio, then we are already getting the contig bit
benefit, so collapsing is pointless.
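A rough sketch of "start from the order-4 aligned page": given the virtual page range of a folio that mremap() left unaligned, list the order-4-aligned windows that lie wholly inside it. Those are the collapse candidates, while the partial pieces at either end are left alone. This is a Python model with a hypothetical helper name, not khugepaged code.

```python
def aligned_windows(start_vpn: int, nr_pages: int, order: int) -> list:
    """Start VPNs of the naturally aligned 2**order-page windows that lie
    wholly inside [start_vpn, start_vpn + nr_pages)."""
    nr = 1 << order
    first = -(-start_vpn // nr) * nr  # round the start up to a multiple of nr
    end = start_vpn + nr_pages
    return list(range(first, end - nr + 1, nr))


# An order-8 folio (256 pages) whose mapping mremap() shifted by 3 pages.
windows = aligned_windows(start_vpn=3, nr_pages=256, order=4)
print(len(windows), windows[0], windows[-1])  # 15 16 240
```

The physical side does not move with mremap(): the window at VPN 16 maps page 13 of the folio, so it is not physically order-4 aligned either. That is why these windows would need a real collapse (copy) rather than just setting the contig bit.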
In fact, in the current code, we collapse an unaligned PMD-size folio
into an aligned PMD-mapped folio: we will not see a block mapping in
the PMD, so we go ahead with the scan. So the logic should be: skip the
scan if the VAs and PAs are already aligned.
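The proposed rule can be modelled roughly as follows: skip the scan only when the range is physically contiguous and both its virtual and physical start addresses are aligned to the target size. This is a hypothetical Python sketch, assuming 4 KiB pages and a PMD order of 9 (2 MiB). `should_scan()` and its parameters are illustrative and do not correspond to khugepaged functions.

```python
PAGE_SHIFT = 12  # assumed 4 KiB base pages
PMD_ORDER = 9    # assumed 512 PTEs per table, i.e. a 2 MiB PMD


def va_pa_aligned(va: int, pa: int, order: int) -> bool:
    """True if both addresses are aligned to 2**order pages."""
    mask = (1 << (order + PAGE_SHIFT)) - 1
    return (va & mask) == 0 and (pa & mask) == 0


def should_scan(va: int, pa: int, order: int, phys_contiguous: bool) -> bool:
    """Hypothetical rule: nothing to gain from a scan if the range already
    maps one contiguous block that is aligned on both the VA and PA side."""
    return not (phys_contiguous and va_pa_aligned(va, pa, order))


pa = 0x40000000  # PMD-size folio, physically 2 MiB aligned
print(should_scan(0x7f0000000000, pa, PMD_ORDER, True))  # False: already aligned
print(should_scan(0x7f0000001000, pa, PMD_ORDER, True))  # True: VA moved by one page
```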
Apologies for not having gone through the series in detail yet, but
this point jumped out at me.
thanks,