On Mon, 18 Jul 2022, Liam Howlett wrote:

> * Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [220718 08:56]:
> > * Hugh Dickins <hughd@xxxxxxxxxx> [220718 00:28]:
> > > On Mon, 18 Jul 2022, Liam Howlett wrote:
> > > > * Hugh Dickins <hughd@xxxxxxxxxx> [220717 16:58]:
> > > > > On Fri, 15 Jul 2022, Liam Howlett wrote:
> > > > > >
> > > > > > Please find attached the last outstanding fix for this series.
> > > > > > It covers a rare failure scenario that one of the build bots
> > > > > > reported.  Ironically, it changes __vma_adjust() more than the
> > > > > > rest of the series, but leaves the locking in the existing
> > > > > > order.
> > > > >
> > > > > Thanks, I folded that into my testing on next-20220715, along
> > > > > with others you posted on Friday (mas_dead_leaves() walk fix);
> > > >
> > > > Please drop that patch, it needs more testing.
> > >
> > > Drop the mas_dead_leaves() walk fix, or the __vma_adjust() changes
> > > which you attached in that mail to me?  Please let me know a.s.a.p,
> > > since I cannot proceed without knowing which.
> >
> > Drop the mas_dead_leaves() walk fix please.  I responded to the patch
> > that it's not tested enough.  I'll respond to the rest of this email
> > soon.

Your mail situation gets even stranger.  You sent the mas_dead_leaves()
walk fix on 12th July, followed quickly by that response that it's not
tested enough, so I ignored it completely then.  But you sent the same
fix (I've only compared visually, it looks the same) again on 15th July,
so then I took it in.  I wonder whether Oracle's mail server decided to
send out a repeat of that patch in place of the missing all-important
stale data copy one!

> So the mas_dead_leaves() patch will most likely produce memory leaks
> on big-endian systems.

I'll take out the mas_dead_leaves() walk patch, but since I was only
testing on x86, it won't have mattered.

...

> I expect it is because your search skipped the badness inserted by the
> bug from #6.  I would think the workload that this was crashing on
> would split the nodes in a certain way that bad VMAs ended up ahead of
> the correct data?

Maybe; I can't afford to speculate further on it.

...

> So the only time I've ever seen __vma_adjust() fail is with a fault
> injector failing mas_preallocate() allocations.  If it's safe not to
> unwind, I'm happy to drop both unwinds, but I was concerned about the
> path of a vma_merge() calling __vma_adjust(), failing out on
> allocations, then OOM recovering, leaving a half-merged VMA with anon
> incorrectly set... which is an even more unlikely scenario.

It's not half-merged, it is correctly set up (just like if a write fault
had occurred somewhere in that extent before the merge), so no need to
unwind.

...

> Right, the __split_vma() never adjusts anything but one side of the
> 'vma' VMA, by inserting the 'insert' VMA.  This will result in two
> writes to the tree - but one will exactly fit in an existing range,
> which will be placed without an allocation via the mas_wr_slot_store()
> function in the maple tree.  Exact fits are nice - they are fast.

I'll have to come back and think about this again later on: "Exact fits
are nice" may answer my concern in the end, but I still have the worry
that the first store destroys the prealloc, when it might be the second
store which needs the prealloc.
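To make that worry concrete, here is a minimal sketch of the flow as I
understand it (the function names are from your series, but the shape
is from memory and much simplified - treat it as illustration, not as
the actual __vma_adjust() code):

	/* Sketch only: one preallocation, but two stores. */
	if (mas_preallocate(&mas, vma, GFP_KERNEL))
		return -ENOMEM;
	...
	/* First write: if this is not the exact-fit case, it may
	 * consume the preallocated nodes. */
	vma_mas_store(vma, &mas);
	...
	/* Second write: if an allocation is needed here, is the
	 * prealloc already gone? */
	vma_mas_store(insert, &mas);

If the exact-fit guarantee means that one of the two writes can never
allocate, then the single prealloc suffices and my worry evaporates;
I just haven't convinced myself of that yet.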
...

> > > > Do you have the patch
> > > > "maple_tree-Fix-stale-data-copy-in-mas_wr_node_store.patch"?  It
> > > > sounds like your issue fits this fix exactly.  I was seeing the
> > > > same issue with gcc 9.3.1 20200408, and this bug doesn't happen
> > > > for me now.  The logs you sent also fit the situation.  I went
> > > > through the same exercise (exorcism?) of debugging the various
> > > > additions and removals of the VMA, only to find the issue in the
> > > > tree itself.  The fix also modified the test code to detect the
> > > > issue - which was actually hit, but not detected, in the existing
> > > > test cases from a live capture of VMA activities.  It is difficult
> > > > to spot in the tree dump as well.  I am sure I sent this to
> > > > Andrew, as it is included in v11 and did not show up in his diff,
> > > > but I cannot find it on lore; perhaps I forgot to CC you?  I've
> > > > attached it here for you in case you missed it.
> > >
> > > Thanks!  No, I never received that patch, nor can I see it on lore
> > > or marc.info; but I (still) haven't looked at v11, and don't know
> > > about Andrew's diff.  Anyway, sounds exciting, I'm eager to stop
> > > writing this mail and get to testing with that in - but please let
> > > me know whether it's the mas_dead_leaves() or the __vma_adjust()
> > > mods you attached previously, which you want me to leave out.

The overnight test run ended in an unexpected way, but I believe we can
count it as a success - a big success for your stale data copy fix.
(If only that fix had got through the mail system on Friday, my report
on Sunday would have been much more optimistic.)

I said before that I expected the test run to hit the swapops.h
migration entry !PageLocked BUG, but it did not.  It ran for nearly
7 hours, and then one of its builds terminated with

{standard input}: Assembler messages:
{standard input}: Error: open CFI at the end of file; missing .cfi_endproc directive
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.

which I've never seen before.  Usually I'd put something like that down
to an error in swap, or a TLB flushing error (but I include Nadav's fix
in my kernel, and wouldn't get very far without it): neither related to
the maple tree patchset.

But on this occasion, my guess is that it's actually an example of what
the swapops.h migration entry !PageLocked BUG is trying to alert us to.
Imagine when such a "stale" migration entry is found, but the page it
points to (now reused for something else) just happens to be PageLocked
at that instant.  Then the BUG won't fire, and we proceed to use the
page as if it's ours, but it's not.  I think that's what happened.

I must get on with the day: more testing, and thinking.

Hugh
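PS  Since I keep referring to it, the check I mean is the one in
include/linux/swapops.h - quoting pfn_swap_entry_to_page() from memory,
so the details may be slightly off:

	static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
	{
		struct page *p = pfn_to_page(swp_offset(entry));

		/*
		 * Any use of migration entries may only occur while the
		 * corresponding page is locked.
		 */
		BUG_ON(is_migration_entry(entry) && !PageLocked(p));

		return p;
	}

The BUG_ON() can only catch a stale migration entry whose page is
unlocked at the instant of the check: a stale entry whose reused page
happens to be locked by its new owner sails straight through, which is
the window I'm suggesting this run fell into.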