On Wed, Mar 17, 2021 at 05:23:04PM -0700, Shakeel Butt wrote: > CC: Minchan > > On Wed, Mar 17, 2021 at 2:39 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > > > On Wed, Mar 17, 2021 at 2:11 PM Heiko Carstens <hca@xxxxxxxxxxxxx> wrote: > > > > > > On Wed, Mar 17, 2021 at 09:44:21PM +0100, Heiko Carstens wrote: > > > > On Wed, Mar 17, 2021 at 08:44:14AM -0700, Shakeel Butt wrote: > > > > > > Config below. And the fun thing is that I cannot reproduce it today > > > > > > anymore with the elfutils test case - what _seems_ to be different is > > > > > > that the test suite runs much faster than yesterday evening. Usually > > > > > > an indication that there is no steal time (other guests which steal > > > > > > cpu time), which again _could_ indicate a race / lack of locking > > > > > > somewhere. > > > > > > This is kind of odd, since yesterday evening it was very reliable to > > > > > > trigger the bug :/ > > > > > > > > > > > > > > > > Thanks for the config. One question regarding swap, is it disk based > > > > > swap or zram? > > > > > > > > Swap device is a real disk. > > > > > > > > > By guests, do you mean there was another significant workload running > > > > > on the machine in parallel to the tests? > > > > > > > > That I don't know. I didn't check. I still can't reproduce with > > > > elfutils anymore, however... > > > > > > > > > If you don't mind can you try swapping01 as well. > > > > > > > > ltp's swapping01 test triggers immediately random processes being > > > > killed with SIGSEGV. I also tested with linux-next 20210316 and _only_ > > > > "memcg: charge before adding to swapcache on swapin" being reverted on > > > > top, and the problem is away - so it looks like the result of > > > > yesterday's bisect is indeed valid. > > > > > > I have to correct myself, actually the system has both: a real disk > > > _and_ zram as swap devices: > > > > > > # swapon -s > > > Filename Type Size Used Priority > > > /dev/dasdb1 partition 21635084 0 -2 > > > /dev/zram0 partition 1014780 0 100 > > > > > > When I disable /dev/zram with "swapoff /dev/zram0" the problem is away > > > as well, even with your patch applied. > > > > Thanks a lot. This was really helpful. I will try with zram on my setup. > > > > Can you also try with just one type of swap at the time for both? I > > really appreciate your help. > > Never mind I think I found the issue. Can you please add > set_page_private(page, entry.val) before swap_readpage(page, true) in > function do_swap_page() in mm/memory.c and try the swapping01 test > again? > > Michan, for SWP_SYNCHRONOUS_IO swap, do we ever reset page->private? > Normally for swapcache pages, it gets reset on delete from swap cache > but these types of swap skips swapcache, so, I think we never reset > page->private. Yub, you are correct. > > The simplest solution I can think of is to do set_page_private(page, > entry.val) before swap_readpage(page, true) and set_page_private(page, > 0) after. Since I did't read the bug in detail, I couldn't come up with how the missing reset is connected the problem while missing set_page_private with entry.val is clear. Anyway, your point is correct and I cannot think better way. Thanks.