On 05/03/2024 09:54, Barry Song wrote:
> On Tue, Mar 5, 2024 at 10:00 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>>
>> Hi Barry,
>>
>> On 18/02/2024 23:40, Barry Song wrote:
>>> On Tue, Feb 6, 2024 at 1:14 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>>>>
>>>> On 05/02/2024 09:51, Barry Song wrote:
>>>>> +Chris, Suren and Chuanhua
>>>>>
>>>>> Hi Ryan,
>> [...]
>>>>
>>>
>>> Hi Ryan,
>>> I am running into some races especially while enabling large folio swap-out and
>>> swap-in both. some of them, i am still struggling with the detailed timing how
>>> they are happening.
>>> but the below change can help remove those bugs which cause corrupted data.
>>
>> I'm getting quite confused with all the emails flying around on this topic. Here
>> you were reporting a data corruption bug and your suggested fix below is the one
>> you have now posted at [1]. But in the thread at [1] we concluded that it is not
>> fixing a functional correctness issue, but is just an optimization in some
>> corner cases. So does the corruption issue still manifest? Did you manage to
>> root cause it? Is it a problem with my swap-out series or your swap-in series,
>> or pre-existing?
>
> Hi Ryan,
>
> It is not a problem of your swap-out series, but a problem of my swap-in
> series. The bug in swap-in series is triggered by the skipped PTEs in the
> thread[1], but my swap-in code should still be able to cope with this situation
> and survive it - a large folio might be partially but not completely unmapped
> after try_to_unmap_one().

Ahh, understood, thanks!

> I actually replied to you and explained all
> the details here[2], but guess you missed it :-)

I did read that mail, but the first line "They are the same" made me think this
was solving a functional problem. And I still have a very shaky understanding of
parts of the code that I haven't directly worked on, so sometimes some of the
details go over my head - I'll get there eventually!

>
> [1] https://lore.kernel.org/linux-mm/20240304103757.235352-1-21cnbao@xxxxxxxxx/
> [2] https://lore.kernel.org/linux-mm/CAGsJ_4zdh5kOG7QP4UDaE-wmLFiTEJC2PX-_LxtOj=QrZSvkCA@xxxxxxxxxxxxxx/
>
> apology this makes you confused.

No need to apologise - I appreciate your taking the time to write it all down in
detail. It helps me to learn these areas of the code.

>
>>
>> [1] https://lore.kernel.org/linux-mm/20240304103757.235352-1-21cnbao@xxxxxxxxx/
>>
>> Thanks,
>> Ryan
>>
>
> Thanks
> Barry