On Friday, October 21, 2016, Markus Blank-Burian <burian@xxxxxxxxxxx> wrote:
Hi,
is there any update regarding this bug?
I did send a patch, and I believe it should find its way into upstream releases rather soon.
I can easily reproduce this issue on our cluster with the following
scenario:
- Start a few hundred processes on different nodes, each process slowly
writing some text into its own output file
- Call: watch -n1 'grep mycustomerrorstring *.out'
- Hit CTRL+C (this does not crash the machine every time, but it does so on a regular basis)
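The writer half of this scenario could be approximated with a script like the following. It is a hypothetical single-node sketch, not the script used on the cluster: it runs threads inside one Python process instead of separate processes, and the file names (jobN.out), line count and interval are arbitrary choices. To mirror the report, one copy would run per node inside a CephFS directory, with the watch/grep command started from that directory and then interrupted.

```python
import threading
import time
from pathlib import Path
from typing import List


def slow_writer(path: Path, lines: int, interval: float) -> None:
    # Append one short line at a time and flush after each write, so the
    # data reaches the kernel while the file is still growing.
    with path.open("a") as f:
        for i in range(lines):
            f.write(f"{path.stem} line {i}\n")
            f.flush()
            time.sleep(interval)


def start_writers(directory: Path, count: int, lines: int,
                  interval: float) -> List[threading.Thread]:
    """Start `count` writers, each owning its own jobN.out file."""
    threads = []
    for n in range(count):
        t = threading.Thread(
            target=slow_writer,
            args=(directory / f"job{n}.out", lines, interval),
            daemon=True,
        )
        t.start()
        threads.append(t)
    return threads
```

For example, start_writers(Path.cwd(), count=200, lines=720, interval=5.0) followed by joining the returned threads keeps 200 files growing for about an hour.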
We are using a 4.4.25 kernel with some additional ceph patches borrowed from
newer kernel releases.
Thanks,
Markus
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Nikolay Borisov
Sent: Monday, October 10, 2016 12:36
To: Ilya Dryomov <idryomov@xxxxxxxxx>
Cc: Yan, Zheng <zyan@xxxxxxxxxx>; ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Crash in ceph_read_iter->__free_pages due to null page
On 10/10/2016 12:22 PM, Ilya Dryomov wrote:
> On Fri, Oct 7, 2016 at 1:40 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
>> Hello,
>>
>> I've encountered yet another cephfs crash:
>>
>> [990188.822271] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
>> [990188.822790] IP: [<ffffffff81130515>] __free_pages+0x5/0x30
>> [990188.823090] PGD 180dd8f067 PUD 1bf2722067 PMD 0
>> [990188.823506] Oops: 0002 [#1] SMP
>> [990188.831274] CPU: 25 PID: 18418 Comm: php-fpm Tainted: G O 4.4.20-clouder2 #6
>> [990188.831650] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015
>> [990188.831876] task: ffff8822a3b7b700 ti: ffff88022427c000 task.ti: ffff88022427c000
>> [990188.832249] RIP: 0010:[<ffffffff81130515>] [<ffffffff81130515>] __free_pages+0x5/0x30
>> [990188.832691] RSP: 0000:ffff88022427fda8 EFLAGS: 00010246
>> [990188.832914] RAX: 00000000fffffe00 RBX: 0000000000000f3d RCX: 00000000c0000100
>> [990188.833292] RDX: 00000000000047f2 RSI: 0000000000000000 RDI: 0000000000000000
>> [990188.833670] RBP: ffff88022427fe50 R08: ffff88022427c000 R09: 00038459d3aa3ee4
>> [990188.834049] R10: 000000013b00e4b8 R11: 0000000000000000 R12: 0000000000000000
>> [990188.834429] R13: ffff8802c5189f88 R14: ffff881091270ca8 R15: ffff88022427fe70
>> [990188.838820] FS: 00007fc8ff5cb7c0(0000) GS:ffff881fffba0000(0000) knlGS:0000000000000000
>> [990188.839197] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [990188.839420] CR2: 000000000000001c CR3: 0000000405f7e000 CR4: 00000000001406e0
>> [990188.839797] Stack:
>> [990188.840013] ffffffffa044a1bc ffff880600000000 0000000000000000 ffff88022427fe70
>> [990188.840639] ffff8802c5189f88 ffff88189297b6a0 ffffffff00000f3d ffff8810fffffe00
>> [990188.841263] ffff88022427fe98 00000000ffffffff 0000000000002000 ffff8802c5189c20
>> [990188.841886] Call Trace:
>> [990188.842115] [<ffffffffa044a1bc>] ? ceph_read_iter+0x19c/0x5f0 [ceph]
>> [990188.842345] [<ffffffff81198c27>] __vfs_read+0xa7/0xd0
>> [990188.842568] [<ffffffff81199216>] vfs_read+0x86/0x130
>> [990188.842792] [<ffffffff81199fb6>] SyS_read+0x46/0xa0
>> [990188.843018] [<ffffffff81614f5b>] entry_SYSCALL_64_fastpath+0x16/0x6e
>> [990188.843243] Code: e2 48 89 de ff d1 49 8b 0f 48 85 c9 75 e8 65 ff 0d 99 a7 ed 7e eb 85 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <f0> ff 4f 1c 74 01 c3 55 85 f6 48 89 e5 74 07 e8 f7 f5 ff ff 5d
>> [990188.847887] RIP [<ffffffff81130515>] __free_pages+0x5/0x30
>> [990188.848183] RSP <ffff88022427fda8>
>> [990188.848404] CR2: 000000000000001c
>>
>> The problem is that the page (%RDI) passed to __free_pages is NULL.
>> Also, retry_op is CHECK_EOF (1), so the page allocation was never executed,
>> which leads to the NULL page. statret is fffffe00, which appears to be
>> -ERESTARTSYS.
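The step from fffffe00 to -ERESTARTSYS is a two's-complement reading of a 32-bit value: 0xfffffe00 - 2^32 = -512, and ERESTARTSYS is 512 in the kernel's internal errno list. A minimal Python sketch of that conversion follows; the helper name to_signed is made up for illustration.

```python
ERESTARTSYS = 512  # kernel-internal errno, not exposed to userspace


def to_signed(value: int, bits: int = 32) -> int:
    """Interpret the low `bits` bits of a register value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


statret = to_signed(0xfffffe00)
print(statret, statret == -ERESTARTSYS)  # prints "-512 True"
```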
>
> Looks like this one exists upstream: -ERESTARTSYS is returned from
> __ceph_do_getattr() if the process is killed while waiting for the
> reply from the MDS. At first sight it's just a busted error path, but
> it could use more testing. Zheng?
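The suspected busted error path can be illustrated with a small userspace model. This is a simplified Python sketch, not the fs/ceph code: only __free_pages, __ceph_do_getattr, CHECK_EOF (1) and ERESTARTSYS come from this thread, while the function names, the value READ_INLINE = 2, and the rule that only that retry path allocates a page are assumptions for illustration. The guarded variant shows one plausible fix (free the page only if one was allocated); it is not necessarily the patch that was submitted.

```python
from typing import Optional

ERESTARTSYS = 512  # kernel-internal errno value
CHECK_EOF = 1      # retry_op value seen in the crashed task
READ_INLINE = 2    # assumption: the retry path that allocates a page


class NullPageFree(Exception):
    """Stands in for the NULL dereference inside __free_pages."""


def free_pages(page: Optional[bytearray]) -> None:
    # Model of __free_pages(): it dereferences the page unconditionally.
    if page is None:
        raise NullPageFree("__free_pages() called with a NULL page")
    page.clear()


def do_getattr(killed: bool) -> int:
    # Model of __ceph_do_getattr(): being killed while waiting for the
    # MDS reply aborts the wait with -ERESTARTSYS.
    return -ERESTARTSYS if killed else 0


def read_retry(retry_op: int, killed: bool, guarded: bool) -> int:
    """Hypothetical model of the post-read retry step."""
    # Assumption: only the inline-data retry allocates a page;
    # CHECK_EOF leaves it NULL, as in the crashed task.
    page = bytearray(4096) if retry_op == READ_INLINE else None
    statret = do_getattr(killed)
    if statret < 0:
        if guarded:
            if page is not None:  # free only what was allocated
                free_pages(page)
        else:
            free_pages(page)  # busted path: assumes a page exists
        return statret
    if page is not None:
        free_pages(page)
    return 0
```

In this model, read_retry(CHECK_EOF, True, guarded=False) raises NullPageFree, mirroring the oops, while the guarded variant returns -ERESTARTSYS to its caller.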
Checking the thread_info struct of the task in question, it does have
TIF_SIGPENDING set, and indeed the output of crash's "sig" command (if I'm
reading it correctly) indicates that signal 15 (SIGTERM) is pending:
SHARED_PENDING
SIGNAL: 0000000000004000
SIGQUEUE: SIG SIGINFO
15 ffff8801439a5d78
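In the SIGNAL word above, bit n-1 stands for signal n, so 0x4000 (bit 14) is signal 15. A small Python helper to decode such a mask might look like this; it assumes a single 64-bit sigset word as printed by crash and uses Python's signal module only to look up names on the machine running it (on Linux, 15 is SIGTERM).

```python
import signal
from typing import List


def pending_signals(mask: int) -> List[int]:
    """Bit n-1 of a Linux sigset word corresponds to signal n."""
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


def signal_name(n: int) -> str:
    # Fall back to the number for signals this platform does not name.
    try:
        return signal.Signals(n).name
    except ValueError:
        return f"signal {n}"


for n in pending_signals(0x0000000000004000):
    print(n, signal_name(n))
```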
>
> Thanks,
>
> Ilya
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com