On Wed, Jun 14, 2023 at 10:43:30PM -0700, Hugh Dickins wrote:
> On Wed, 14 Jun 2023, Hugh Dickins wrote:
> > On Wed, 14 Jun 2023, Nathan Chancellor wrote:
> > >
> > > I just bisected a crash while powering down a MIPS machine in QEMU to
> > > this change as commit 8044511d3893 ("mips: update_mmu_cache() can
> > > replace __update_tlb()") in linux-next.
> >
> > Thank you, Nathan, that's very helpful indeed. This patch certainly knew
> > that it wanted testing, and I'm glad to hear that it is now seeing some.
> >
> > While powering down? The messages below look like it was just coming up,
> > but no doubt that's because you were bisecting (or because I'm unfamiliar
> > with what messages to expect there). It's probably irrelevant information,
> > but I wonder whether the (V)machine worked well enough for a while before
> > you first powered down and spotted the problem, or whether it's never got
> > much further than trying to run init (busybox)? I'm trying to get a feel
> > for whether the problem occurs under common or uncommon conditions.

Ugh sorry, I have been looking into too many bugs lately and got my
wires crossed :) this is indeed a problem when running init (which is
busybox, this is a simple Buildroot file system).

> > > Unfortunately, I can still
> > > reproduce it with the existing fix you have for this change on the
> > > mailing list, which is present in next-20230614.
> >
> > Right, that later fix was only for a build warning, nothing functional
> > (or at least I hoped that it wasn't making any functional difference).
> >
> > Thanks a lot for the detailed instructions below: unfortunately, those
> > would draw me into a realm of testing I've never needed to enter before,
> > so a lot of time spent on setup and learning. Usually, I just stare at
> > the source.
> >
> > What this probably says is that I should revert most of my cleanup there,
> > and keep as close to the existing code as possible. But some change is
> > needed, and I may need to understand (or have a good guess at) what was
> > going wrong, to decide what kind of retreat will be successful.
> >
> > Back to the source for a while: I hope I'll find examples in nearby MIPS
> > kernel source (and git history), which will hint at the right way forward.
> > Then send you a patch against next-20230614 to try, when I'm reasonably
> > confident that it's enough to satisfy my purpose, but likely not to waste
> > your time.
>
> I'm going to take advantage of your good nature by attaching
> two alternative patches, either to go on top of next-20230614.
>
> mips1.patch,
>  arch/mips/mm/tlb-r4k.c | 12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
> is by far my favourite. I couldn't see anything wrong with what's
> already there for mips, but it seems possible that (though I didn't
> find it) somewhere calls update_mmu_cache_pmd() on a page table. So
> mips1.patch restores the pmd_huge() check, and cleans up further by
> removing the silly pgdp, p4dp, pudp, pmdp stuff: the pointer has now
> been passed in by the caller, why walk the tree again? I should have
> done it this way before.
>
> But if that doesn't work, then I'm afraid it will have to be
> mips2.patch,
>  arch/mips/include/asm/pgtable.h | 15 ++++++++++++---
>  arch/mips/mm/tlb-r3k.c          |  5 ++---
>  arch/mips/mm/tlb-r4k.c          | 27 ++++++++++++++++++---------
>  3 files changed, 32 insertions(+), 15 deletions(-)
> which reverts all of the original patch and its build warning fix,
> and does a pte_unmap() to balance the silly pte_offset_map() there;
> with an apologetic comment for this being about the only place in
> the tree where I have no idea what to do if ptep were NULL.
>
> I do hope that you find the first fixes the breakage; but if not, then

I hate to be the bearer of bad news but the first patch did not fix the
breakage, I see the same issue.

> I even more fervently hope that the second will, despite my hating it.
> Touch wood for the first, fingers crossed for the second, thanks,

Thankfully, the second one does. Thanks for the quick and thoughtful
responses!

Cheers,
Nathan