On 2/2/21 6:26 PM, David Hildenbrand wrote: > On 02.02.21 13:51, Will Deacon wrote: >> On Tue, Feb 02, 2021 at 01:39:29PM +0100, David Hildenbrand wrote: >>> On 02.02.21 13:35, Will Deacon wrote: >>>> On Tue, Feb 02, 2021 at 12:32:15PM +0000, Will Deacon wrote: >>>>> On Tue, Feb 02, 2021 at 09:41:53AM +0530, Anshuman Khandual wrote: >>>>>> pfn_valid() validates a pfn but basically it checks for a valid struct page >>>>>> backing for that pfn. It should always return positive for memory ranges >>>>>> backed with struct page mapping. But currently pfn_valid() fails for all >>>>>> ZONE_DEVICE based memory types even though they have struct page mapping. >>>>>> >>>>>> pfn_valid() asserts that there is a memblock entry for a given pfn without >>>>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is >>>>>> that they do not have memblock entries. Hence memblock_is_map_memory() will >>>>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This >>>>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs >>>>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged >>>>>> into the system via memremap_pages() called from a driver, their respective >>>>>> memory sections will not have SECTION_IS_EARLY set. >>>>>> >>>>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock >>>>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set >>>>>> for firmware reserved memory regions. memblock_is_map_memory() can just be >>>>>> skipped as its always going to be positive and that will be an optimization >>>>>> for the normal hotplug memory. Like ZONE_DEVICE based memory, all normal >>>>>> hotplugged memory too will not have SECTION_IS_EARLY set for their sections >>>>>> >>>>>> Skipping memblock_is_map_memory() for all non early memory sections would >>>>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its >>>>>> performance for normal hotplug memory as well. >>>>> >>>>> Hmm. Although I follow your logic, this does seem to rely on an awful lot of >>>>> assumptions to continue to hold true as the kernel evolves. In particular, >>>>> how do we ensure that early sections are always fully backed with >>>> >>>> Sorry, typo here: ^^^ should be *non-early* sections. >>> >>> It might be a good idea to have a look at generic >>> include/linux/mmzone.h:pfn_valid() >> >> The generic implementation already makes assumptions that aren't true on >> arm64, so that's why we've ended up with our own implementation. But the >> patches here put us in a position where I worry that pfn_valid() may return >> 'true' in future for cases where the underlying struct page is either >> non-existent or bogus, and debugging those failures really sucks. We had a >> raft of those back when NOMAP was introduced and I don't want to re-live >> that experience. > > Yeah, and I agree when it comes to boot mem. However, the way generic memory hotplug/memremap infrastructure (->!early sections) works does not allow for such special cases you mention and would break quite some other code if messed up. So I wouldn't worry about that part too much for now. Agreed. > >> >>> As I expressed already, long term we should really get rid of the arm64 >>> variant and rather special-case the generic one. Then we won't go out of >>> sync - just as it happened with ZONE_DEVICE handling here. >> >> Why does this have to be long term? This ZONE_DEVICE stuff could be the >> carrot on the stick :) > > Yes, I suggested to do it now, but Anshuman convinced me that doing a simple fix upfront might be cleaner --- for example when it comes to backporting :) Right. The current pfn_valid() breaks for ZONE_DEVICE memory and this fixes the problem in the present context which can be easily backported if required. Changing or rather overhauling the generic code with new configs as proposed earlier (which I am planning to work on subsequently) would definitely be an improvement for the current pfn_valid() situation in terms of maintainability but then it should not stop us from fixing the problem now.