On 05/10/2014 05:18 PM, Hannes Landeholm wrote:
> Hello,
>
> I have a development machine that I have been running stress tests on
> for a week, as I'm trying to reproduce some hard-to-reproduce failures.
> I've mentioned the same machine previously in the thread "rbd unmap
> deadlock". I just now noticed that some processes had completely
> stalled. I looked in the system log and saw this crash about 9 hours
> ago:

Are you still running kernel rbd as a client of ceph services running
on the same physical machine?  I personally believe that scenario is at
risk of deadlock in any case--we haven't taken great care to avoid it.

Anyway...

I can build v3.14.1, but I don't know what kernel configuration you are
using; knowing that would be helpful.

I built it using a config I have, though, and it's *possible* you
crashed on this line, in rbd_segment_name():

        ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
                       rbd_dev->header.object_prefix, segment);

If so, the only reason I can think of for this failing is that
rbd_dev->header.object_prefix was null (or an otherwise bad pointer
value).
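For what it's worth, here is a minimal, self-contained userspace sketch
of that suspected failure mode; the format_segment_name() helper, the
OID_NAME_MAX constant, and the NULL guard are illustrative stand-ins of
mine, not the actual drivers/block/rbd.c code:

    /*
     * Rough sketch of the formatting step described above.  The helper
     * name, OID_NAME_MAX, and the NULL guard are hypothetical; the
     * "%s.%012llx" shape matches how format-1 rbd data objects are
     * named, but nothing here is the real kernel code.
     */
    #include <inttypes.h>
    #include <stdio.h>

    #define OID_NAME_MAX 100        /* stand-in for CEPH_MAX_OID_NAME_LEN */

    static int format_segment_name(char *name, size_t size,
                                   const char *object_prefix, uint64_t segment)
    {
        if (object_prefix == NULL)
            return -1;              /* refuse to format rather than fault */

        /* "<object_prefix>.<12 hex digit segment number>" */
        return snprintf(name, size, "%s.%012" PRIx64, object_prefix, segment);
    }

    int main(void)
    {
        char name[OID_NAME_MAX + 1];

        /* prints "rb.0.1234.6b8b4567.000000000003" */
        if (format_segment_name(name, sizeof(name), "rb.0.1234.6b8b4567", 3) > 0)
            printf("%s\n", name);
        return 0;
    }

Of course, a guard like that only catches a NULL prefix, not a pointer
that is non-NULL but already freed, and it would only paper over the
real question of how object_prefix went bad while a request was in
flight.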
But at this point it's a lot of speculation.  Depending on what your
stress tests were doing, I suppose it could be that you unmapped an
in-use rbd image and there was some sort of insufficient locking.

Can you also give a little insight into what your stress tests were
doing?

Thanks.

					-Alex

> kernel: BUG: unable to handle kernel paging request at ffff87ff3fbcdc58
> kernel: IP: [<ffffffffa0357203>] rbd_img_request_fill+0x123/0x6d0 [rbd]
> kernel: PGD 0
> kernel: Oops: 0000 [#1] PREEMPT SMP
> kernel: Modules linked in: xt_recent xt_conntrack ipt_REJECT xt_limit xt_tcpudp iptable_filter veth ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables cbc bridge stp llc coretemp x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm cr
> kernel: crc32c libcrc32c ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom crc_t10dif crct10dif_common atkbd libps2 ahci libahci libata ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common i8042 serio
> kernel: CPU: 4 PID: 3015 Comm: mysqld Tainted: P O 3.14.1-1-js #1
> kernel: Hardware name: ASUSTeK COMPUTER INC. RS100-E8-PI2/P9D-M Series, BIOS 0302 05/10/2013
> kernel: task: ffff88003f046220 ti: ffff88011d3d2000 task.ti: ffff88011d3d2000
> kernel: RIP: 0010:[<ffffffffa0357203>] [<ffffffffa0357203>] rbd_img_request_fill+0x123/0x6d0 [rbd]
> kernel: RSP: 0018:ffff88011d3d3ac0 EFLAGS: 00010286
> kernel: RAX: ffff87ff3fbcdc00 RBX: 0000000008814000 RCX: 00000000011bcf84
> kernel: RDX: ffffffffa035c867 RSI: 0000000000000065 RDI: ffff8800b338f000
> kernel: RBP: ffff88011d3d3b78 R08: 000000000001abe0 R09: ffffffffa03571e0
> kernel: R10: 772d736a2f73656e R11: 6e61682d637a762f R12: ffff8800b338f000
> kernel: R13: ffff88025609d100 R14: 0000000000000000 R15: 0000000000000001
> kernel: FS: 00007fffe17fb700(0000) GS:ffff88042fd00000(0000) knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: ffff87ff3fbcdc58 CR3: 0000000126e0e000 CR4: 00000000001407e0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> kernel: Stack:
> kernel: ffff880128ad0d98 0000000000000000 000022011d3d3bb8 ffff87ff3fbcdc20
> kernel: ffff87ff3fbcdcc8 ffff8803b6459c90 682d637a762fea80 0000000000000001
> kernel: 0000000000000000 ffff87ff3fbcdc00 ffff8803b6459c30 0000000000004000
> kernel: Call Trace:
> kernel: [<ffffffffa03554d5>] ? rbd_img_request_create+0x155/0x220 [rbd]
> kernel: [<ffffffff8125cab9>] ? blk_add_timer+0x19/0x20
> kernel: [<ffffffffa035aa1d>] rbd_request_fn+0x1ed/0x330 [rbd]
> kernel: [<ffffffff81252f13>] __blk_run_queue+0x33/0x40
> kernel: [<ffffffff8127a4dd>] cfq_insert_request+0x34d/0x560
> kernel: [<ffffffff8124fa1c>] __elv_add_request+0x1bc/0x300
> kernel: [<ffffffff81256cd0>] blk_flush_plug_list+0x1d0/0x230
> kernel: [<ffffffff812570a4>] blk_finish_plug+0x14/0x40
> kernel: [<ffffffffa027fd6e>] ext4_writepages+0x48e/0xd50 [ext4]
> kernel: [<ffffffff811417ae>] do_writepages+0x1e/0x40
> kernel: [<ffffffff811363d9>] __filemap_fdatawrite_range+0x59/0x60
> kernel: [<ffffffff811364da>] filemap_write_and_wait_range+0x2a/0x70
> kernel: [<ffffffffa027749a>] ext4_sync_file+0xba/0x360 [ext4]
> kernel: [<ffffffff811d50ce>] do_fsync+0x4e/0x80
> kernel: [<ffffffff811d5350>] SyS_fsync+0x10/0x20
> kernel: [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
> kernel: Code: 00 00 00 e8 a0 25 e3 e0 48 85 c0 49 89 c4 0f 84 0c 04 00 00 48 8b 45 90 48 8b 5d b0 48 c7 c2 67 c8 35 a0 be 65 00 00 00 4c 89 e7 <0f> b6 48 58 48 d3 eb 83 78 18 02 48 89 c1 48 8b 49 50 48 c7 c0
> kernel: RIP [<ffffffffa0357203>] rbd_img_request_fill+0x123/0x6d0 [rbd]
> kernel: RSP <ffff88011d3d3ac0>
> kernel: CR2: ffff87ff3fbcdc58
> kernel: ---[ end trace bebc1d7ea3182129 ]---
>
> uname: Linux localhost 3.14.1-1-js #1 SMP PREEMPT Tue Apr 15 17:59:05 CEST 2014 x86_64 GNU/Linux
>
> This is a "stock" Arch 3.14.1 kernel with no custom patches.
>
> For some reason the rest of the system still works fine, but trying to
> clean up with SIGKILL fills the system with unkillable deferred zombie
> processes.
>
> The Ceph cluster looks fine; I ran a successful deep scrub as well.
> It still uses the same machine, but it runs a new cluster now:
>
>     cluster 32c6af82-73ff-4ea8-9220-cd47c6976ecb
>      health HEALTH_WARN
>      monmap e1: 1 mons at {margarina=192.168.0.215:6789/0}, election epoch 1, quorum 0 margarina
>      osdmap e54: 2 osds: 2 up, 2 in
>       pgmap v62043: 492 pgs, 6 pools, 4240 MB data, 1182 objects
>             18810 MB used, 7083 GB / 7101 GB avail
>                  492 active+clean
>
> 2014-05-11 00:03:00.551688 mon.0 [INF] pgmap v62043: 492 pgs: 492 active+clean; 4240 MB data, 18810 MB used, 7083 GB / 7101 GB avail
>
> Trying to unmap the related rbd volume goes horribly wrong. "rbd unmap"
> waits for a child process (wait4) with an empty cmdline that has
> deadlocked with the following stack:
>
> [<ffffffff811e83b3>] fsnotify_clear_marks_by_group_flags+0x33/0xb0
> [<ffffffff811e8443>] fsnotify_clear_marks_by_group+0x13/0x20
> [<ffffffff811e75c2>] fsnotify_destroy_group+0x12/0x50
> [<ffffffff811e96a2>] inotify_release+0x22/0x50
> [<ffffffff811a811c>] __fput+0x9c/0x220
> [<ffffffff811a82ee>] ____fput+0xe/0x10
> [<ffffffff810848ec>] task_work_run+0xbc/0xe0
> [<ffffffff81067556>] do_exit+0x2a6/0xa70
> [<ffffffff814df85b>] oops_end+0x9b/0xe0
> [<ffffffff814d5f8a>] no_context+0x296/0x2a3
> [<ffffffff814d601d>] __bad_area_nosemaphore+0x86/0x1dc
> [<ffffffff814d6186>] bad_area_nosemaphore+0x13/0x15
> [<ffffffff814e1e4e>] __do_page_fault+0x3ce/0x5a0
> [<ffffffff814e2042>] do_page_fault+0x22/0x30
> [<ffffffff814ded38>] page_fault+0x28/0x30
> [<ffffffff811ea249>] SyS_inotify_add_watch+0x219/0x360
> [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> As before, rbd likely still doesn't contain any debug symbols, as we
> haven't recompiled anything yet. I should really get that done. I could
> double-check, though, if that would really, really help you.
>
> I will probably hard-reboot this machine soon so I can continue my
> stress tests, so if you want me to pull some other data out of the
> runtime state, you should reply immediately.
>
> Thank you for your time,
> --
> Hannes Landeholm
> Co-founder & CTO
> Jumpstarter - www.jumpstarter.io
>
> ☎ +46 72 301 35 62
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html