Re: [uml-devel] BUG: soft lockup for a user mode linux image

Toralf Förster <toralf.foerster@xxxxxx> · Sun, 06 Oct 2013 16:17:45 +0200

On 10/04/2013 07:52 AM, Richard Weinberger wrote:
> On 03.10.2013 21:33, Toralf Förster wrote:
>> On 10/03/2013 09:20 PM, Richard Weinberger wrote:
>>> Am 03.10.2013 21:16, schrieb Toralf Förster:
>>>> On 10/03/2013 09:04 PM, Richard Weinberger wrote:
>>>>> On 03.10.2013 20:54, Toralf Förster wrote:
>>>>>> On 10/02/2013 09:55 PM, Richard Weinberger wrote:
>>>>>>> On Wed, Oct 2, 2013 at 8:30 PM, Toralf Förster <toralf.foerster@xxxxxx> wrote:
>>>>>>>> Running trinity (1 process, no victim files, just "$> trinity -C1") for a longer time
>>>>>>>> within a 32 bit user mode linux image with a recent git kernel (host: 3.11.3 guest 3.12-rc3-g...)
>>>>>>>> yields this console message:
>>>>>>>>
>>>>>>>>  * Starting local
>>>>>>>> net.core.warnings = 0                                                                         [ ok ]
>>>>>>>> BUG: soft lockup - CPU#0 stuck for 23s! [trinity-child0:2031]
>>>>>>>>
>>>>>>>>
>>>>>>>> and on the host 1 of the "linux" processes eats all CPU cycles of 1 CPU core.
>>>>>>>> 2 consecutive backtraces taken with
>>>>>>>>
>>>>>>>> $> sudo gdb /home/tfoerste/devel/linux/linux 28144 -n -batch -ex bt
>>>>>>>>
>>>>>>>> show nearly the same position around __get_user_pages() - both are attached.
>>>>>>>>
>>>>>>>> I'm not surprised that trinity harms a system - I'm just wondering whether this particular picture is
>>>>>>>> expected or if it points to an issue.
>>>>>>>>
>>>>>>>>
>>>>>>>> FWIW the last lines of trinity log were :
>>>>>>>>
>>>>>>>>
>>>>>>>> [2031] [94] setsid() = 2031
>>>>>>>> [2031] [95] setresgid(rgid=0xffff33e3, egid=0xffffff93, sgid=0x22000040) = -1 (Operation not permitted)
>>>>>>>> [2031] [96] vmsplice(fd=5, iov=0x85501e0, nr_segs=300, flags=9) = 0x3000
>>>>>>>> [2031] [97] setresuid(ruid=0x80549193, euid=0xc61041e0, suid=0xff19b6fa) = -1 (Operation not permitted)
>>>>>>>> [2031] [98] setpriority(which=0xff010000, who=0xf3737373, niceval=0x8088960c) = -1 (Invalid argument)
>>>>>>>> [2031] [99] socketcall(call=1, args=0x8550200) = -1 (Address family not supported by protocol)
>>>>>>>> [2031] [100] access(filename="�", mode=2017) = -1 (Invalid argument)
>>>>>>>> [2031] [101] getgroups(gidsetsize=0, grouplist=0x80d0000[page_rand]) = 3
>>>>>>>> [2031] [102] msync(start=0xc0100220, len=0, flags=3) = -1 (Invalid argument)
>>>>>>>> [2031] [103] sigpending(set=0x40025000) = 0
>>>>>>>> [2031] [104] signalfd4(ufd=383, user_mask=1, sizemask=0xa4200000, flags=0x80800) = -1 (Invalid argument)
>>>>>>>> [2031] [105] sendfile(out_fd=383, in_fd=382, offset=0, count=4096) = -1 (Invalid argument)
>>>>>>>> [2031] [106] fanotify_mark(fanotify_fd=382, flags=5, mask=0x8000023, dfd=382, pathname="/proc/1092/task/1092/fdinfo/68") = -1 (Invalid argument)
>>>>>>>> [2031] [107] wait4(upid=1, stat_addr=4, options=0xd761979b, ru=8) = -1 (Invalid argument)
>>>>>>>> [2031] [108] sigpending(set=0x80ca000[page_zeros]) = 0
>>>>>>>> [2031] [109] setresuid(ruid=0xefffd6fc, euid=0x1bf4c92f, suid=0xffff2e33) = -1 (Operation not permitted)
>>>>>>>> [2031] [110] munlock(addr=0x40025000, len=34) = 0
>>>>>>>> [2031] [111] timer_delete(timer_id=0xffffffdc) = -1 (Invalid argument)
>>>>>>>> [2031] [112] sched_get_priority_max(policy=0x10000040) = -1 (Invalid argument)
>>>>>>>> [2031] [113] syslog(type=0xc1000000, buf=1, len=0x82a5) = -1 (Operation not permitted)
>>>>>>>> [2031] [114] setpriority(which=0xc4c806c6, who=0xffffff01, niceval=0xffff0682) = -1 (Invalid argument)
>>>>>>>> [2031] [115] getgroups16(gidsetsize=0xfffe, grouplist=1) = -1 (Bad address)
>>>>>>>> [2031] [116] rename(oldname=4, newname=8) = -1 (Bad address)
>>>>>>>> [2031] [117] inotify_init() = 654
>>>>>>>> [2031] [118] getgid() = 100
>>>>>>>> [2031] [119] fstatat64(dfd=382, filename="/sys/devices/virtual/net/sit0/duplex", statbuf=0, flag=0xb545d727) = -1 (Invalid argument)
>>>>>>>> [2031] [120] unlinkat(dfd=382, pathname="/proc/sys/net/ipv4/neigh/default/retrans_time", flag=0xc00ef76) = -1 (Invalid argument)
>>>>>>>> [2031] [121] timerfd_create(clockid=0, flags=0) = 655
>>>>>>>> [2031] [122] munlock(addr=4, len=0x3fff) = -1 (Cannot allocate memory)
>>>>>>>> [2031] [123] fremovexattr(fd=382, name=0) = -1 (Bad address)
>>>>>>>> [2031] [124] sched_get_priority_min(policy=0xff58bfef) = -1 (Invalid argument)
>>>>>>>> [2031] [125] mq_timedreceive(mqdes=397, u_msg_ptr=4, msg_len=5245, u_msg_prio=0xc0100220, u_abs_timeout=0xc0100220) = -1 (Bad address)
>>>>>>>> [2031] [126] chdir(filename="/proc/116/net/ptype") = -1 (Not a directory)
>>>>>>>> [2031] [127] ssetmask(newmask=0x88000092) = 0
>>>>>>>> [2031] [128] statfs(pathname="/proc/6/mounts", buf=0) = -1 (Bad address)
>>>>>>>> [2031] [129] fchown16(fd=397, user=104, group=0x94100000) = -1 (Operation not permitted)
>>>>>>>> [2031] [130] fchdir(fd=397) = -1 (Not a directory)
>>>>>>>> [2031] [131] mkdir(pathname="/proc/1092/task/1092/fdinfo/316", mode=525) = -1 (File exists)
>>>>>>>> [2031] [132] fsetxattr(fd=386, name=0x856f158, value=0x8571160, size=0, flags=0) = -1 (Numerical result out of range)
>>>>>>>> [2031] [133] io_setup(nr_events=4095, ctxp=0x40266000) ^CKilled by signal 2.
>>>>>>>
>>>>>>> Your gdb backtraces show that schedule_timeout() got called
>>>>>>> with a negative value.
>>>>>>> Looks like an integer overflow.
>>>>>>> The soft lockup might also originate from that (a very big integer which
>>>>>>> did not overflow yet)
>>>>>>>
>>>>>>
>>>>>> If the problem is solved by this patch I'd like to send it out. But I'm
>>>>>> unsure whether it fixes the root cause or just masks it.
>>>>>
>>>>> I fear your patch will not fix the issue.
>>>>>
>>>>> Does the issue only trigger on 32bit UMLs?
>>>> No idea, I only have a 32 bit system here (both host and client).
>>>>
>>>>> How long does it take till trinity hits it?
>>>> a command like
>>>>
>>>> $> ssh tfoerste@trinity "rm -rf t3; mkdir t3; cd t3; trinity -C4"
>>>>
>>>> usually needs 10 to 15 min to trigger the issue. With just 1 trinity
>>>> task (-C1), however, it often needs an hour or more.
>>>
>>> That's good. :-)
>>> You can place some printk()s into balance_dirty_pages() and observe the values
>>> of period, max_pause, min_pause, etc...
>>> Maybe this will give us a clue.
>>
>> Will do it again. I did it before already, but period was the only thing
>> which otherwise became negative -
>> and because pages_dirtied was often in the range of 20 - 30 and period
>> around -3000
> 
> Did you print it before or after
>                 if (current->dirty_paused_when)
>                         pause -= now - current->dirty_paused_when;
> 
> I'm also wondering why
> 		if (pause < min_pause) {
> did not trigger or did it?
> 
>> I just wondered whether HZ is cast to a short int in the calculation?
>> BTW printk didn't show anything in the syslog (ok, I used
>> printk(KERN_ERR ...))
> 
> syslog or dmesg?
> 
> Thanks,
> //richard
> 

Back from a mushroom foray - I wrapped nearly all "pause=" statements
in BUG_ON checks and downgraded trinity (b/c its latest enhancements made trinity too weak)

The UML stopped here:
...
		if (unlikely(task_ratelimit == 0)) {
			period = max_pause;
			pause = max_pause;
			BUG_ON(pause < 0);
			goto pause;
		}
		BUG_ON(pages_dirtied < 0);
		BUG_ON(task_ratelimit < 0);
		period = HZ * pages_dirtied / task_ratelimit;
		BUG_ON(period < 0);         <----------------------here
		pause = period;
		BUG_ON(pause < 0);
		if (current->dirty_paused_when)
			pause -= now - current->dirty_paused_when;
		BUG_ON(pause < 0);
...
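
For illustration only, a minimal userspace sketch of how that division can
give a negative period although both operands pass the BUG_ON checks. It
assumes (not verified here) that pages_dirtied and task_ratelimit are
unsigned long, that period is a signed long, and that HZ is 100. uint32_t
stands in for a 32 bit unsigned long; the helper names and the input values
are made up for this example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: HZ is 100 in the guest; not taken from the actual config. */
static const uint32_t hz = 100;

/*
 * Reinterpret a 32 bit unsigned value as signed, which is what effectively
 * happens when an unsigned long result is stored in a long on 32 bit.
 */
static int32_t as_signed32(uint32_t u)
{
	int32_t s;

	memcpy(&s, &u, sizeof(s));
	return s;
}

/*
 * Mirrors "period = HZ * pages_dirtied / task_ratelimit" with 32 bit
 * unsigned operands. The multiplication wraps modulo 2^32 and any result
 * of 2^31 or more turns negative once it is stored in a signed variable.
 */
static int32_t period_32bit(uint32_t pages_dirtied, uint32_t task_ratelimit)
{
	uint32_t period;

	/* task_ratelimit == 0 is handled before the division in the snippet */
	assert(task_ratelimit != 0);
	period = hz * pages_dirtied / task_ratelimit;
	return as_signed32(period);
}
```

Under these assumptions period_32bit(9, 1) stays small and positive, while a
hypothetical pages_dirtied of 30000000 pushes HZ * pages_dirtied past 2^31
and the stored period becomes negative. If the operands really are unsigned,
the two BUG_ON(... < 0) checks in front of the division can never fire, so
only the check on period would catch it.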

The backtrace is:

tfoerste@n22 ~/devel/trinity $ gdb --core=/mnt/ramdisk/core /home/tfoerste/devel/linux/linux -batch -ex bt
[New LWP 2522]
Core was generated by `/home/tfoerste/devel/linux/linux earlyprintk ubda=/home/tfoerste/virtual/uml/tr'.
Program terminated with signal 6, Aborted.
#0  0xb777a424 in __kernel_vsyscall ()
#0  0xb777a424 in __kernel_vsyscall ()
#1  0x083b33b5 in kill ()
#2  0x0807190d in uml_abort () at arch/um/os-Linux/util.c:93
#3  0x08071c45 in os_dump_core () at arch/um/os-Linux/util.c:148
#4  0x08061417 in panic_exit (self=0x85b9558 <panic_exit_notifier>, unused1=0, unused2=0x85ef720 <buf.16084>) at arch/um/kernel/um_arch.c:240
#5  0x0809a7d8 in notifier_call_chain (nl=0x0, val=0, v=0x85ef720 <buf.16084>, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
#6  0x0809a923 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:182
#7  atomic_notifier_call_chain (nh=0x85ef704 <panic_notifier_list>, val=0, v=0x85ef720 <buf.16084>) at kernel/notifier.c:191
#8  0x08410d1c in panic (fmt=0x0) at kernel/panic.c:130
#9  0x08411c64 in balance_dirty_pages (pages_dirtied=9, mapping=<optimized out>) at mm/page-writeback.c:1471
#10 0x080d1ce4 in balance_dirty_pages_ratelimited (mapping=0x6) at mm/page-writeback.c:1663
#11 0x080e2d0c in __do_fault (mm=0x4870da00, vma=0x487484c8, address=1095131136, pmd=0x0, pgoff=0, flags=1212323992, orig_pte=<incomplete type>) at mm/memory.c:3452
#12 0x080e5286 in do_nonlinear_fault (orig_pte=..., flags=<optimized out>, pmd=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>, page_table=<optimized out>) at mm/memory.c:3518
#13 handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3717
#14 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3845
#15 handle_mm_fault (mm=0x4870da00, vma=0x487484c8, address=1095131136, flags=1) at mm/memory.c:3868
#16 0x080e5a07 in __get_user_pages (tsk=0x47d5a400, mm=0x4870da00, start=1095131136, nr_pages=1025, gup_flags=519, pages=0x47c50000, vmas=0x0, nonblocking=0x0) at mm/memory.c:1822
#17 0x080e5cc3 in get_user_pages (tsk=0x0, mm=0x0, start=0, nr_pages=0, write=1, force=0, pages=0x47fafb98, vmas=0x6) at mm/memory.c:2019
#18 0x08140d0e in aio_setup_ring (ctx=<optimized out>) at fs/aio.c:340
#19 ioctx_alloc (nr_events=<optimized out>) at fs/aio.c:605
#20 SYSC_io_setup (ctxp=<optimized out>, nr_events=<optimized out>) at fs/aio.c:1122
#21 SyS_io_setup (nr_events=65535, ctxp=135045120) at fs/aio.c:1105
#22 0x080619c2 in handle_syscall (r=0x47d5a5d4) at arch/um/kernel/skas/syscall.c:35
#23 0x08073f2d in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:198
#24 userspace (regs=0x47d5a5d4) at arch/um/os-Linux/skas/process.c:431
#25 0x0805e6ac in fork_handler () at arch/um/kernel/process.c:160
#26 0x00000000 in ?? ()

I'll try again to see whether printk()s help - how do I force a sync after a printk()?

-- 
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3