Re: + mm-introduce-reported-pages.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I definitely do not intent to nack this work, I just have maintainability
concerns and considering there is an alternative approach that does not
require to touch page allocator internals and which we need to compare
against then I do not really think there is any need to push something
in right away. Or is there any pressing reason to have this merged right
now?

The alternative approach doesn't touch the page allocator, however it
still has essentially the same changes to __free_one_page. I suspect the
performance issue seen is mostly due to the fact that because it doesn't
touch the page allocator it is taking the zone lock and probing the page
for each set bit to see if the page is still free. As such the performance
regression seen gets worse the lower the order used for reporting.


What confused me quite a lot is that this is enabled at compile time
and then incurs a performance hit whether there is a hypervisor that
even cares is involved or not. So I don't think the performance angle
justifying this approach is a good one because this implementation has
issues of its own. Previously I said

	I worry that poking too much into the internal state of the
	allocator will be fragile long-term. There is the arch alloc/free
	hooks but they are typically about protections only and does not
	interfere with the internal state of the allocator. Compaction
	pokes in as well but once the page is off the free list, the page
	allocator no longer cares so again there is on interference with
	the internal state. If the state is interefered with externally,
	it becomes unclear what happens if things like page merging is
	deferred in a way the allocator cannot control as high-order
	allocation requests may fail for example.

Adding an API for compaction does not get away from the problem that
it'll be fragile to depend on the internal state of the allocator for
correctness. Existing users that poke into the state do so as an
optimistic shortcut but if it fails, nothing actually breaks. The free
list reporting stuff might and will not be routinely tested.

Take compaction as an example, the initial implementation of it was dumb
as rocks and only started maintaining additional state and later poking
into the page allocator when there was empirical evidence it was necessary.

The initial implementation of page reporting should be the same, it
should do no harm at all to users that don't care (hiding behind
kconfig is not good enough, use static branches) and it should not
depend on the internal state of the page allocator and ordering of free
lists for correctness until it's shown it's absolutely necessary.

You say that the zone lock has to be taken in the alternative
implementation to check if it's still free and sure, that would cost
but unless you are isolating that page immediately then you are racing
once the lock is released. If you are isolating immediately, then isolate
pages in batches to amortise the loock costs.  The details of this could
be really hard but this approach is essentially saying "everything,
everywhere should take a small hit so the overhead is not noticeable for
virtio users" which is a curious choice for a new feature.

Regardless of the details of any implementation, the first one should be
basic, do no harm and be relatively simple giving just a bare interface
to virtio/qemu/etc. Then optimise it until such point as there is no
chance but to poke into the core.

I second that. If somebody would ask me, I'd want to see a simple, maintainable design that provides a net benefit, does not harm !virt and possibly reuses existing core functionality (e.g., page isolation). We can work from there to optimize.

--

Thanks,

David / dhildenb






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux