On 27.10.16 17:01:36, Will Deacon wrote: > Hi Robert, > > On Mon, Oct 17, 2016 at 08:58:01PM +0200, Robert Richter wrote: > > Mark, Will, any opinion here? > > Having looking at this, I'm inclined to agree with you; pfn_valid() is > all about whether the underlying mem_map (struct page *) entry exists, > not about whether the page is mappable or not. > > That said, setting the zone for pages representing NOMAP memory feels > like a slippery slope to losing information about them being NOMAP in > the first place and the whole problem getting out-of-hand. Whilst I'm > happy for pfn_valid() to return true (in the sense that we're within > bounds of mem_map etc), I'm less happy that we're also saying that the > struct page contains useful information, such as the zone and the node > information, which is then subsequently used by the NUMA code. Let's see it in a different way, pfns and the struct page assigned to each of it is about *physical* memory. The system knows all the memory, some is free, some reserved and some marked NOMAP. Regardless of the mapping of the page the mm code maintains and uses that information. There are assumptions on validity and checks in the code that now cause problems due to partly or non-existing data about nomap pages. This inconsistency is dangerous since a problem may occur any time then the page area is accessed first, thus a system may crash randomly depending on the memory access. Luckily, in my case it triggered reproducible while loading initrd during boot. I also think that this is not only NUMA related. E.g. the following bug report is probably also related: https://bugzilla.redhat.com/show_bug.cgi?id=1387793 > On top of that, pfn_valid is used in other places as a coarse "is this > memory?" check, and will cause things like ioremap to fail whereas it > wouldn't at the moment. IMO this is a misuse of pfn_valid() that needs to be fixed with additional checks, e.g. traversing memblocks. > It feels to me like NOMAP memory is a new type > of memory where there *is* a struct page, but it shouldn't be used for > anything. IMO, a NOMAP page should just be handled like a reserved page except that the page is marked reserved. See free_low_memory_core_early(). Thus, NOMAP pages are not in the free pages list or set to reserved. It is simply not available for mapping at all. Isn't that exactly what it should be? I also did not yet understand the benefit of the differentiation between NOMAP and reserved and the original motivation for its implementation. I looked through the mail threads but could not find any hint. The only difference I see now is that it is not listed as a reserved page, but as long as it is not freed it should behave the same. I remember the case to handle memory different (coherency, etc.), but are not sure here. Ard, could you explain this? > I don't think pfn_valid can describe that, given the way it's > currently used, and flipping the logic is just likely to move the problem > elsewhere. > > What options do we have for fixing this in the NUMA code? Out of my mind: 1) Treat NOMAP pages same as reserved pages (my patch). 2) Change mm code to allow arch specific early_pfn_valid(). 3) Fix mm code to only access stuct page (of a zone) if pfn_valid() is true. There can be more alternatives. IMO: * We shouldn't touch generic mm code. * We should maintain a valid struct page for all pages in a sections. * We should only traverse memblock where really necessary (arm64 only). * I don't think this problem is numa specific. -Robert -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html