Re: oops when swapping on latest kernel git 3.15-rc5

Michal Hocko <mhocko@xxxxxxx> · Mon, 19 May 2014 10:23:31 +0200

On Sun 18-05-14 12:15:48, Hugh Dickins wrote:
> On Sun, 18 May 2014, Branimir Maksimovic wrote:
> 
> > I discovered this accidentally when I tried to see how the OOM killer
> > works. The program is this:
> > 
> > #include <unistd.h>
> > #include <cstring>
> > #include <exception>
> > #include <iostream>
> > 
> > int counter=0;
> > int main()
> > try
> > {
> >   for(;;++counter)
> >   {
> >     char* p = new char[1024*1024];
> >     memset(p,1,1024*1024);
> >     std::cout<<counter<<'\n';
> > //    if(counter > 24000)sleep(100);
> > 
> >   }
> > }catch(const std::exception& e)
> > {
> >   std::cout<<"exception:"<<e.what()<<" count:"<<counter<<std::endl;
> > }
> > 
> > After running this program, the system froze after some time. Programs could
> > be started but would not finish.
> > Fortunately, I could paste the dmesg output:
> > 
> > [  388.522421] BUG: unable to handle kernel NULL pointer dereference at 0000000000000340
> > [  388.522427] IP: [<ffffffff81185b0b>] get_mem_cgroup_from_mm.isra.42+0x2b/0x60
> 
> Thank you very much for reporting.  That BUG is a 3.15-rc regression.
> 3.14's try_get_mem_cgroup_from_mm() had protection against a NULL mm,
> such as while exiting.  That protection was correctly removed as
> unnecessary by one 3.15 commit, but a later commit added a new caller
> which made it necessary again, as you have now found.

Good timing. I had a similar report on Friday from our internal testing
and was waiting for the results of the testing over the weekend. I will
post the patch in a minute.

> Easily fixable, but opinions will differ on the right way to write it
> (and I'm rather out of touch with the current flux in css_tryget and
> root_mem_cgroup), so Cc'ing Hannes and Michal for the definitive fix.

Yes, I went the get_mem_cgroup_from_mm way. Johannes is on vacation
AFAIK, so I would rather go with this more conservative approach and
do some additional cleanup later if necessary.

> > [  388.522435] PGD 3f233c067 PUD 3f20f7067 PMD 0
> > [  388.522439] Oops: 0000 [#1] SMP
> > [  388.522441] Modules linked in: snd_hrtimer pci_stub vboxpci(OE)
> > vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse rfcomm bnep bluetooth
> > binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp crct10dif_pclmul
> > crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_hdmi aes_x86_64
> > lrw gf128mul glue_helper snd_hda_codec_realtek snd_hda_codec_generic
> > ablk_helper cryptd gspca_spca561 gspca_main videodev mxm_wmi snd_hda_intel
> > snd_hda_controller snd_hda_codec microcode snd_hwdep joydev snd_pcm
> > snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq dm_multipath scsi_dh
> > snd_seq_device snd_timer mei_me snd mei lpc_ich wmi soundcore video mac_hid
> > serio_raw parport_pc ppdev nct6775 hwmon_vid coretemp nvidia(POE) drm lp
> > parport btrfs xor raid6_pq hid_generic usbhid hid psmouse e1000e ahci libahci
> > ptp pps_core
> > [  388.522494] CPU: 1 PID: 160 Comm: kworker/u8:5 Tainted: P           OE 3.15.0-rc5-core2-custom #159
> > [  388.522496] Hardware name: System manufacturer System Product Name/MAXIMUS V GENE, BIOS 1903 08/19/2013
> > [  388.522498] task: ffff880404e349b0 ti: ffff88040486a000 task.ti: ffff88040486a000
> > [  388.522500] RIP: 0010:[<ffffffff81185b0b>] [<ffffffff81185b0b>] get_mem_cgroup_from_mm.isra.42+0x2b/0x60
> > [  388.522504] RSP: 0000:ffff88040486bab8  EFLAGS: 00010246
> > [  388.522506] RAX: 0000000000000000 RBX: ffffea000a416340 RCX: 0000000000000a40
> > [  388.522508] RDX: ffff88041efe8a40 RSI: ffffea000a416340 RDI: 0000000000000340
> > [  388.522509] RBP: ffff88040486bab8 R08: 000000000001cb56 R09: 0000000000072d5a
> > [  388.522511] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88040486bb00
> > [  388.522512] R13: 00000000000000d0 R14: 0000000000000000 R15: ffff8803f3fe82f8
> > [  388.522515] FS:  0000000000000000(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
> > [  388.522517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  388.522518] CR2: 0000000000000340 CR3: 00000003ee44d000 CR4: 00000000001407e0
> > [  388.522520] Stack:
> > [  388.522521]  ffff88040486baf0 ffffffff8118abf5 ffffffff8112ce1a 0000000000000000
> > [  388.522524]  ffffea000a416340 0000000000000003 00000000ffffffef ffff88040486bb18
> > [  388.522527]  ffffffff8118b1cc ffff88040486baf8 000000000001cb56 0000000000000000
> > [  388.522530] Call Trace:
> > [  388.522536]  [<ffffffff8118abf5>] __mem_cgroup_try_charge_swapin+0x45/0xf0
> > [  388.522539]  [<ffffffff8112ce1a>] ? __lock_page+0x6a/0x70
> > [  388.522543]  [<ffffffff8118b1cc>] mem_cgroup_charge_file+0x9c/0xe0
> > [  388.522548]  [<ffffffff8114599c>] shmem_getpage_gfp+0x62c/0x770
> > [  388.522552]  [<ffffffff81145b18>] shmem_write_begin+0x38/0x40
> > [  388.522555]  [<ffffffff8112d1c5>] generic_perform_write+0xc5/0x1c0
> > [  388.522559]  [<ffffffff811ad53a>] ? file_update_time+0x8a/0xd0
> > [  388.522563]  [<ffffffff8112f211>] __generic_file_aio_write+0x1d1/0x3f0
> > [  388.522567]  [<ffffffff81084fc1>] ? enqueue_entity+0x291/0xb90
> > [  388.522570]  [<ffffffff8112f47f>] generic_file_aio_write+0x4f/0xc0
> > [  388.522574]  [<ffffffff81192eaa>] do_sync_write+0x5a/0x90
> > [  388.522578]  [<ffffffff810c53c1>] do_acct_process+0x4b1/0x550
> > [  388.522582]  [<ffffffff810c5acd>] acct_process+0x6d/0xa0
> > [  388.522587]  [<ffffffff810667d0>] ? manage_workers.isra.25+0x2a0/0x2a0
> > [  388.522590]  [<ffffffff8104d937>] do_exit+0x827/0xa70
> > [  388.522594]  [<ffffffff8106699e>] ? worker_thread+0x1ce/0x3a0
> > [  388.522597]  [<ffffffff810667d0>] ? manage_workers.isra.25+0x2a0/0x2a0
> > [  388.522600]  [<ffffffff8106cad3>] kthread+0xc3/0xf0
> > [  388.522604]  [<ffffffff8106ca10>] ? kthread_create_on_node+0x180/0x180
> > [  388.522608]  [<ffffffff816bfe6c>] ret_from_fork+0x7c/0xb0
> > [  388.522611]  [<ffffffff8106ca10>] ? kthread_create_on_node+0x180/0x180

Hmm, this is slightly different from what I saw. The kernel thread is
common, as is the swapcache mem_cgroup_charge_file path, but we got
there from a different path (shmem_file_splice_read). So it looks like
the accounting is done on tmpfs?

> Or does that backtrace say that it's a kernel thread that was exiting
> (and being accounted)?  A kernel thread would not have had an mm in the
> first place.
> 
> I know very little about accounting (acct_process etc).  You said above
> "Programs could be started but they will not finish": I'll assume that
> hitting such a BUG inside acct_process() led to that.

That sounds possible.

[...]
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .