On Thu, 14 Mar 2024 at 00:18, Steve Wahl <steve.wahl@xxxxxxx> wrote:
>
> On Wed, Mar 13, 2024 at 07:16:23AM -0500, Eric W. Biederman wrote:
> >
> > Kexec happens on identity mapped page tables.
> >
> > The files of interest are machine_kexec_64.c and relocate_kernel_64.S
> >
> > I suspect either the building of the identity mapped page table in
> > machine_kexec_prepare, or the switching to the page table in
> > identity_mapped in relocate_kernel_64.S is where something goes wrong.
> >
> > Probably in kernel_ident_mapping_init as that code is directly used
> > to build the identity mapped page tables.
> >
> > Hmm.
> >
> > Your change is commit d794734c9bbf ("x86/mm/ident_map: Use gbpages only
> > where full GB page should be mapped.")
>
> Yeah, sorry, I accidentally used the stable cherry-pick commit id that
> Pavin Joseph found with his bisect results.
>
> > Given the simplicity of that change itself my guess is that somewhere in
> > the first 1Gb there are pages that needed to be mapped like the idt at 0
> > that are not getting mapped.
>
> ...
>
> > It might be worth setting up early printk on some of these systems
> > and seeing if the failure is in early boot up of the new kernel (that is
> > using kexec supplied identity mapped pages) rather than in kexec per-se.
> >
> > But that is just my guess at the moment.
>
> Thanks for the input.  I was thinking in terms of running out of
> memory somewhere because we're using more page table entries than we
> used to.  But you've got me thinking that maybe some necessary region
> is not explicitly requested to be placed in the identity map, but is
> by luck included in the rounding errors when we use gbpages.

Yes, it is possible. Here is an example case:
http://lists.infradead.org/pipermail/kexec/2023-June/027301.html

The final change was to avoid doing AMD things on Intel platforms, but
the mapping code is still not fixed in a good way.
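
To make the "included by luck" idea concrete, here is a rough sketch of the
rounding effect. It is not kernel code and does not model
kernel_ident_mapping_init; the helper names, the requested range, and the
low-memory address are all made up for illustration. It only shows that
rounding a request out to 1 GiB boundaries can cover an address nobody asked
for, while rounding the same request to 2 MiB boundaries does not.

```python
# Illustrative sketch only, not kernel code. All names and addresses here
# are hypothetical; they are not taken from the kexec or x86/mm sources.
GIB = 1 << 30    # size of a 1 GiB page ("gbpage")
MIB2 = 2 << 20   # size of a 2 MiB page

def covered(start, end, page_size):
    """Return the [lo, hi) span actually mapped when the requested
    range [start, end) is rounded out to page_size boundaries."""
    lo = start & ~(page_size - 1)
    hi = (end + page_size - 1) & ~(page_size - 1)
    return lo, hi

def is_mapped(addr, requests, page_size):
    """True if addr falls inside any requested range after rounding."""
    for start, end in requests:
        lo, hi = covered(start, end, page_size)
        if lo <= addr < hi:
            return True
    return False

# Made-up layout: one explicitly requested region (16 MiB to 48 MiB) and
# one low-memory address that was never requested.
requests = [(0x1000000, 0x3000000)]
unrequested = 0x9F000

print(is_mapped(unrequested, requests, GIB))   # covered by 1 GiB rounding
print(is_mapped(unrequested, requests, MIB2))  # not covered by 2 MiB rounding
```

If something like this is what happens on the affected machines, the real fix
would be to request the missing region explicitly rather than to rely on the
rounding.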
>
> At any rate, since I am still unable to reproduce this for myself, I
> am going to contact Pavin Joseph off-list and see if he's willing to
> do a few debugging kernel steps for me and send me the results, to see
> if I can get this figured out.  (I believe trimming the CC list and/or
> going private is usually frowned upon for the LKML, but I think this
> is appropriate as it only adds noise for the rest.  Let me know if I'm
> wrong.)
>
> Thank you.
>
> --> Steve Wahl
>
> --
> Steve Wahl, Hewlett Packard Enterprise