On Thu, Feb 11, 2021 at 05:40:35PM +0530, Anshuman Khandual wrote:
> On 2/11/21 5:23 PM, Will Deacon wrote:
> > On Fri, Feb 05, 2021 at 06:55:53PM +0000, Will Deacon wrote:
> >> On Wed, Feb 03, 2021 at 09:20:39AM +0530, Anshuman Khandual wrote:
> >>> On 2/2/21 6:26 PM, David Hildenbrand wrote:
> >>>> On 02.02.21 13:51, Will Deacon wrote:
> >>>>> On Tue, Feb 02, 2021 at 01:39:29PM +0100, David Hildenbrand wrote:
> >>>>>> As I expressed already, long term we should really get rid of the arm64
> >>>>>> variant and rather special-case the generic one. Then we won't go out of
> >>>>>> sync - just as it happened with ZONE_DEVICE handling here.
> >>>>>
> >>>>> Why does this have to be long term? This ZONE_DEVICE stuff could be the
> >>>>> carrot on the stick :)
> >>>>
> >>>> Yes, I suggested to do it now, but Anshuman convinced me that doing a
> >>>> simple fix upfront might be cleaner --- for example when it comes to
> >>>> backporting :)
> >>>
> >>> Right. The current pfn_valid() breaks for ZONE_DEVICE memory and this fixes
> >>> the problem in the present context which can be easily backported if required.
> >>>
> >>> Changing or rather overhauling the generic code with new configs as proposed
> >>> earlier (which I am planning to work on subsequently) would definitely be an
> >>> improvement for the current pfn_valid() situation in terms of maintainability
> >>> but then it should not stop us from fixing the problem now.
> >>
> >> Alright, I've mulled this over a bit. I don't agree that this patch helps
> >> with maintainability (quite the opposite, in fact), but perfection is the
> >> enemy of the good so I'll queue the series for 5.12. However, I'll revert
> >> the changes at the first sign of a problem, so please do work towards a
> >> generic solution which can replace this in the medium term.
> >
> > ... and dropped. These patches appear to be responsible for a boot
> > regression reported by CKI:
>
> Ahh, boot regression ? These patches only change the behaviour
> for non boot memory only.

Sure, but this thing is horribly fragile, which is why I was nervous about
touching it in the first place ;)

> > https://lore.kernel.org/r/cki.8D1CB60FEC.K6NJMEFQPV@xxxxxxxxxx
>
> Will look into the logs and see if there is something pointing to
> the problem.

We don't have a log yet, but I've asked whether earlycon works on the
problematic machine (the failure seems to be specific to a certain TX2).

Either way, this is too late for 5.12 now.

Will