On Tue, Mar 16, 2021 at 5:46 PM Heiko Carstens <hca@xxxxxxxxxxxxx> wrote:
>
> Hi Shakeel,
>
> > > your commit 3a9ca1b0ac0f ("memcg: charge before adding to swapcache on
> > > swapin") in linux-next 20210316 appears to cause user process faults /
> > > crashes on s390 like:
> > >
> > > User process fault: interruption code 003b ilc:3 in sshd[2aa15280000+df000]
> > > Failing address: 0000000000000000 TEID: 0000000000000800
> > > Fault in primary space mode while using user ASCE.
> > > AS:00000000966b41c7 R3:0000000000000024
> > > CPU: 0 PID: 401 Comm: sshd Not tainted 5.12.0-rc3-00048-geba7667a8534 #10
> > > Hardware name: IBM 8561 T01 703 (z/VM 7.2.0)
> > > User PSW : 0705000180000000 0000000000000000
> > >            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:0 RI:0 EA:3
> > > User GPRS: 0000000000000000 fffffffffffff000 0000000000000001 000002aa157b88f0
> > >            000002aa157c43c0 0000000000000000 0000000000000000 0000000000000000
> > >            0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >            0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > User Code: Bad PSW.
> >
> > Thanks for the report. Can you please explain a bit what the above report tells?
>
> Ah, sorry. This is the s390 output for exception-traces. That is if
> /proc/sys/debug/exception-trace is set to one, and a process gets
> killed because of an unhandled signal.
>
> In this particular case sshd was killed because it tried to access
> address zero, where nothing is mapped.
>
> Given that all higher registers are zero in the register dump above my
> guess would be this happened because a stack page got unmapped, and
> when it got accessed to restore register contents a zero page was
> mapped in instead of the real old page contents.
>
> We have also all other sorts of crashes in our CI with linux-next
> currently, e.g. LTP's testcase "swapping01" seems to be able to make
> (more or less) sure that the init process get's killed (-> panic).

I have tried the elfutils selftests and swapping01 on an x86_64 VM and I
am not able to reproduce the issue. Can you give a bit more detail about
the setup, along with the config file? I am assuming you are not
creating cgroups, as these tests do not manipulate cgroups. Also, is the
memory controller on your system on v1 or v2?

I am fine with dropping the patch from mm-tree until we know more about
this issue.
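
In case it helps with the v1/v2 question, here is a rough sketch of one way to check, assuming the usual /proc/mounts layout. The helper name memcg_hierarchy is made up for illustration, and the result is only a heuristic: a cgroup2 mount without a v1 memory hierarchy suggests v2, but the controller could also be disabled altogether.

```python
def memcg_hierarchy(mounts_text):
    """Guess whether the memory controller is bound to cgroup v1 or v2.

    mounts_text is the content of /proc/mounts.
    Returns "v1", "v2", or None if no cgroup mount was found.
    """
    has_v2 = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount entry
        fstype = fields[2]
        options = fields[3].split(",")
        if fstype == "cgroup" and "memory" in options:
            # A v1 hierarchy with the memory controller attached wins,
            # even if cgroup2 is also mounted (hybrid setups).
            return "v1"
        if fstype == "cgroup2":
            has_v2 = True
    # Heuristic only: cgroup2 is mounted and memory is not on v1.
    return "v2" if has_v2 else None


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            print(memcg_hierarchy(f.read()))
    except OSError as err:
        print(f"cannot read /proc/mounts: {err}")
```

On a v2 system, the cgroup.controllers file at the cgroup2 mount point should additionally list "memory" if the controller is available there.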