Re: kernel crash from RBD in Ubuntu 12.04

Awesome.  Thanks Alex.  I'll eagerly await 0.48 once it has finished QA.

 - Travis

On Tue, Jun 19, 2012 at 2:45 PM, Alex Elder <elder@xxxxxxxxxxxxx> wrote:
> On 06/19/2012 01:32 PM, Travis Rhoden wrote:
>> Hey folks,
>>
>> Ran into this today.  Not sure what I did wrong.  =)
>
> It appears you are running Linux 3.2.0.  These symptoms could be
> explained by a bug that has since been fixed in newer Ceph code.
> Specifically, I think the following fix is the relevant one; without
> it, you might see a crash like yours:
>
>    rbd: don't drop the rbd_id too early
>
> https://github.com/ceph/ceph-client/commit/32eec68d2f233e8a6ae1cd326022f6862e2b9ce3
>
>
>                                        -Alex
>
>> I had an RBD successfully mounted and was done with it.  Proceeded to
>> do the following:
>>
>> root@spcnode2:~# ls /sys/bus/rbd/devices/
>> 0
>> root@spcnode2:~# echo 0 > /sys/bus/rbd/remove
>> root@spcnode2:~# ls /sys/bus/rbd/devices/
>>   <--- At this point, I believe the RBD has been successfully removed
>>
>> ----  About an hour passes while I am working on my Ceph cluster.
>> No other commands are run on this machine.  ----
>> ----  New cluster is up.  Time to mount my new RBD.  ----
>>
>> root@spcnode2:~# echo "10.55.30.0,10.55.30.1,10.55.30.2
>> name=admin,secret=AQCNv+BPoPQENBAAxlm39kJ5XteNxg2S/dulXw== rbd
>> perftest" | tee /sys/bus/rbd/add
>> 10.55.30.0,10.55.30.1,10.55.30.2
>> name=admin,secret=AQCNv+BPoPQENBAAxlm39kJ5XteNxg2S/dulXw== rbd
>> perftest
>> Segmentation fault
>>
>> Well that's ugly.  What's in syslog?
>>
>> Jun 19 11:16:56 spcnode2 kernel: [76564.387890] ------------[ cut here
>> ]------------
>> Jun 19 11:16:56 spcnode2 kernel: [76564.392569] WARNING: at
>> /build/buildd/linux-3.2.0/fs/sysfs/inode.c:324
>> sysfs_hash_and_remove+0xa9/0xb0()
>> Jun 19 11:16:56 spcnode2 kernel: [76564.402233] Hardware name: Relion 1702
>> Jun 19 11:16:56 spcnode2 kernel: [76564.406079] sysfs: can not remove
>> 'bdi', no directory
>> Jun 19 11:16:56 spcnode2 kernel: [76564.411268] Modules linked in: rbd
>> libceph ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
>> xt_state ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp xt_conntrack
>> iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
>> ipmi_devintf ipmi_si iptable_filter ipmi_msghandler ip_tables x_tables
>> kvm_intel kvm bnep rfcomm bluetooth parport_pc ppdev nfsd nfs lockd
>> fscache auth_rpcgss nfs_acl sunrpc ext2 xfs vesafb ib_iser rdma_cm
>> ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp
>> libiscsi scsi_transport_iscsi bridge mtdchar i7core_edac psmouse 8021q
>> garp stp lp parport dm_multipath mac_hid serio_raw edac_core ioatdma
>> usbhid hid sfc mtd i2c_algo_bit igb mdio dca btrfs zlib_deflate
>> libcrc32c
>> Jun 19 11:16:56 spcnode2 kernel: [76564.477972] Pid: 6924, comm: bash
>> Tainted: G      D W    3.2.0-25-generic #40-Ubuntu
>> Jun 19 11:16:56 spcnode2 kernel: [76564.485837] Call Trace:
>> Jun 19 11:16:56 spcnode2 kernel: [76564.488394]  [<ffffffff810672af>]
>> warn_slowpath_common+0x7f/0xc0
>> Jun 19 11:16:56 spcnode2 kernel: [76564.494511]  [<ffffffff810673a6>]
>> warn_slowpath_fmt+0x46/0x50
>> Jun 19 11:16:56 spcnode2 kernel: [76564.500348]  [<ffffffff81192958>]
>> ? iput_final+0xe8/0x210
>> Jun 19 11:16:56 spcnode2 kernel: [76564.505888]  [<ffffffff811ebc59>]
>> sysfs_hash_and_remove+0xa9/0xb0
>> Jun 19 11:16:56 spcnode2 kernel: [76564.512082]  [<ffffffff811ee356>]
>> sysfs_remove_link+0x26/0x30
>> Jun 19 11:16:56 spcnode2 kernel: [76564.517959]  [<ffffffff812fb960>]
>> del_gendisk+0x100/0x260
>> Jun 19 11:16:56 spcnode2 kernel: [76564.523448]  [<ffffffffa0623868>]
>> rbd_dev_release+0x108/0x110 [rbd]
>> Jun 19 11:16:56 spcnode2 kernel: [76564.529861]  [<ffffffff813f1407>]
>> device_release+0x27/0xa0
>> Jun 19 11:16:56 spcnode2 kernel: [76564.535432]  [<ffffffff8130cfdc>]
>> kobject_release+0x4c/0xa0
>> Jun 19 11:16:56 spcnode2 kernel: [76564.541163]  [<ffffffff8130cf90>]
>> ? kobject_del+0x40/0x40
>> Jun 19 11:16:56 spcnode2 kernel: [76564.546694]  [<ffffffff8130e686>]
>> kref_put+0x36/0x70
>> Jun 19 11:16:56 spcnode2 kernel: [76564.551764]  [<ffffffff8130ce97>]
>> kobject_put+0x27/0x60
>> Jun 19 11:16:56 spcnode2 kernel: [76564.557126]  [<ffffffff8131d33c>]
>> ? _kstrtoull+0x2c/0x90
>> Jun 19 11:16:56 spcnode2 kernel: [76564.562523]  [<ffffffff813f1167>]
>> put_device+0x17/0x20
>> Jun 19 11:16:56 spcnode2 kernel: [76564.567808]  [<ffffffff813f225e>]
>> device_unregister+0x1e/0x30
>> Jun 19 11:16:56 spcnode2 kernel: [76564.573647]  [<ffffffffa06211ea>]
>> rbd_remove+0x15a/0x160 [rbd]
>> Jun 19 11:16:56 spcnode2 kernel: [76564.579594]  [<ffffffff813f3c47>]
>> bus_attr_store+0x27/0x30
>> Jun 19 11:16:56 spcnode2 kernel: [76564.585113]  [<ffffffff811ebebf>]
>> sysfs_write_file+0xef/0x170
>> Jun 19 11:16:56 spcnode2 kernel: [76564.590907]  [<ffffffff81177f23>]
>> vfs_write+0xb3/0x180
>> Jun 19 11:16:56 spcnode2 kernel: [76564.596158]  [<ffffffff8117824a>]
>> sys_write+0x4a/0x90
>> Jun 19 11:16:56 spcnode2 kernel: [76564.601258]  [<ffffffff81665c42>]
>> system_call_fastpath+0x16/0x1b
>> Jun 19 11:16:56 spcnode2 kernel: [76564.607321] ---[ end trace
>> ace27f1cbf93eeaa ]---
>> Jun 19 11:16:57 spcnode2 kernel: [76564.612447] BUG: unable to handle
>> kernel NULL pointer dereference at 0000000000000079
>> Jun 19 11:16:57 spcnode2 kernel: [76564.620374] IP:
>> [<ffffffff811ed770>] sysfs_find_dirent+0x10/0x110
>> Jun 19 11:16:57 spcnode2 kernel: [76564.626475] PGD 404514067 PUD
>> 5f89cc067 PMD 0
>> Jun 19 11:16:57 spcnode2 kernel: [76564.630958] Oops: 0000 [#2] SMP
>> Jun 19 11:16:57 spcnode2 kernel: [76564.634254] CPU 5
>> Jun 19 11:16:57 spcnode2 kernel: [76564.636113] Modules linked in: rbd
>> libceph ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
>> xt_state ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp xt_conntrack
>> iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
>> ipmi_devintf ipmi_si iptable_filter ipmi_msghandler ip_tables x_tables
>> kvm_intel kvm bnep rfcomm bluetooth parport_pc ppdev nfsd nfs lockd
>> fscache auth_rpcgss nfs_acl sunrpc ext2 xfs vesafb ib_iser rdma_cm
>> ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp
>> libiscsi scsi_transport_iscsi bridge mtdchar i7core_edac psmouse 8021q
>> garp stp lp parport dm_multipath mac_hid serio_raw edac_core ioatdma
>> usbhid hid sfc mtd i2c_algo_bit igb mdio dca btrfs zlib_deflate
>> libcrc32c
>> Jun 19 11:16:57 spcnode2 kernel: [76564.701251]
>> Jun 19 11:16:57 spcnode2 kernel: [76564.702740] Pid: 6924, comm: bash
>> Tainted: G      D W    3.2.0-25-generic #40-Ubuntu Penguin Computing
>> Relion 1702/X8DTT
>> Jun 19 11:16:57 spcnode2 kernel: [76564.713752] RIP:
>> 0010:[<ffffffff811ed770>]  [<ffffffff811ed770>]
>> sysfs_find_dirent+0x10/0x110
>> Jun 19 11:16:57 spcnode2 kernel: [76564.722319] RSP:
>> 0018:ffff8805f8f9bc58  EFLAGS: 00010246
>> Jun 19 11:16:57 spcnode2 kernel: [76564.727719] RAX: ffff8806186edbc0
>> RBX: 0000000000000000 RCX: 00000000000988e6
>> Jun 19 11:16:57 spcnode2 kernel: [76564.734892] RDX: ffffffff81a0158d
>> RSI: 0000000000000000 RDI: 0000000000000000
>> Jun 19 11:16:57 spcnode2 kernel: [76564.742083] RBP: ffff8805f8f9bc78
>> R08: ffffea00303f6580 R09: ffffffff8130cfe9
>> Jun 19 11:16:57 spcnode2 kernel: [76564.749221] R10: ffff880c0fe5de28
>> R11: 0000000000000000 R12: 0000000000000000
>> Jun 19 11:16:57 spcnode2 kernel: [76564.756437] R13: ffffffff81a0158d
>> R14: ffff880bf45a5a50 R15: ffff880c0fd1de18
>> Jun 19 11:16:57 spcnode2 kernel: [76564.763630] FS:
>> 00007fe308eb7700(0000) GS:ffff880c3fc20000(0000)
>> knlGS:0000000000000000
>> Jun 19 11:16:57 spcnode2 kernel: [76564.771717] CS:  0010 DS: 0000 ES:
>> 0000 CR0: 0000000080050033
>> Jun 19 11:16:57 spcnode2 kernel: [76564.777549] CR2: 0000000000000079
>> CR3: 00000005f89cd000 CR4: 00000000000006e0
>> Jun 19 11:16:57 spcnode2 kernel: [76564.784738] DR0: 0000000000000000
>> DR1: 0000000000000000 DR2: 0000000000000000
>> Jun 19 11:16:57 spcnode2 kernel: [76564.791877] DR3: 0000000000000000
>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Jun 19 11:16:57 spcnode2 kernel: [76564.798991] Process bash (pid:
>> 6924, threadinfo ffff8805f8f9a000, task ffff8806186edbc0)
>> Jun 19 11:16:57 spcnode2 kernel: [76564.807295] Stack:
>> Jun 19 11:16:57 spcnode2 kernel: [76564.809302]  0000000000000000
>> 0000000000000000 ffffffff81a0158d ffff880bf45a5a50
>> Jun 19 11:16:57 spcnode2 kernel: [76564.816832]  ffff8805f8f9bca8
>> ffffffff811ed9bc ffff8805f8f9bcd8 ffffffff81c34b00
>> Jun 19 11:16:57 spcnode2 kernel: [76564.824341]  ffff880605b36878
>> 0000000000000000 ffff8805f8f9bce8 ffffffff811efa15
>> Jun 19 11:16:57 spcnode2 kernel: [76564.831894] Call Trace:
>> Jun 19 11:16:57 spcnode2 kernel: [76564.834337]  [<ffffffff811ed9bc>]
>> sysfs_get_dirent+0x3c/0x80
>> Jun 19 11:16:57 spcnode2 kernel: [76564.840041]  [<ffffffff811efa15>]
>> sysfs_remove_group+0x35/0x100
>> Jun 19 11:16:57 spcnode2 kernel: [76564.846029]  [<ffffffff810fee24>]
>> blk_trace_remove_sysfs+0x14/0x20
>> Jun 19 11:16:57 spcnode2 kernel: [76564.852195]  [<ffffffff812f50d9>]
>> blk_unregister_queue+0x59/0x80
>> Jun 19 11:16:57 spcnode2 kernel: [76564.858270]  [<ffffffff812fb97b>]
>> del_gendisk+0x11b/0x260
>> Jun 19 11:16:57 spcnode2 kernel: [76564.863661]  [<ffffffffa0623868>]
>> rbd_dev_release+0x108/0x110 [rbd]
>> Jun 19 11:16:57 spcnode2 kernel: [76564.869962]  [<ffffffff813f1407>]
>> device_release+0x27/0xa0
>> Jun 19 11:16:57 spcnode2 kernel: [76564.875448]  [<ffffffff8130cfdc>]
>> kobject_release+0x4c/0xa0
>> Jun 19 11:16:57 spcnode2 kernel: [76564.881061]  [<ffffffff8130cf90>]
>> ? kobject_del+0x40/0x40
>> Jun 19 11:16:57 spcnode2 kernel: [76564.886502]  [<ffffffff8130e686>]
>> kref_put+0x36/0x70
>> Jun 19 11:16:57 spcnode2 kernel: [76564.891521]  [<ffffffff8130ce97>]
>> kobject_put+0x27/0x60
>> Jun 19 11:16:57 spcnode2 kernel: [76564.896739]  [<ffffffff8131d33c>]
>> ? _kstrtoull+0x2c/0x90
>> Jun 19 11:16:57 spcnode2 kernel: [76564.902043]  [<ffffffff813f1167>]
>> put_device+0x17/0x20
>> Jun 19 11:16:57 spcnode2 kernel: [76564.907226]  [<ffffffff813f225e>]
>> device_unregister+0x1e/0x30
>> Jun 19 11:16:57 spcnode2 kernel: [76564.913057]  [<ffffffffa06211ea>]
>> rbd_remove+0x15a/0x160 [rbd]
>> Jun 19 11:16:57 spcnode2 kernel: [76564.918881]  [<ffffffff813f3c47>]
>> bus_attr_store+0x27/0x30
>> Jun 19 11:16:57 spcnode2 kernel: [76564.924436]  [<ffffffff811ebebf>]
>> sysfs_write_file+0xef/0x170
>> Jun 19 11:16:57 spcnode2 kernel: [76564.930174]  [<ffffffff81177f23>]
>> vfs_write+0xb3/0x180
>> Jun 19 11:16:57 spcnode2 kernel: [76564.935450]  [<ffffffff8117824a>]
>> sys_write+0x4a/0x90
>> Jun 19 11:16:57 spcnode2 kernel: [76564.940497]  [<ffffffff81665c42>]
>> system_call_fastpath+0x16/0x1b
>> Jun 19 11:16:57 spcnode2 kernel: [76564.946488] Code: 41 5c 41 5d 41
>> 5e 41 5f 5d c3 90 4c 89 f7 e8 68 df 46 00 eb c3 0f 0b 0f 1f 40 00 55
>> 48 89 e5 41 56 41 55 41 54 53 66 66 66 66 90 <80> 7f 79 00 4c 8b 67 70
>> 49 89 d6 48 89 f3 0f 95 c0 48 85 f6 0f
>> Jun 19 11:16:57 spcnode2 kernel: [76564.966571] RIP
>> [<ffffffff811ed770>] sysfs_find_dirent+0x10/0x110
>> Jun 19 11:16:57 spcnode2 kernel: [76564.972826]  RSP <ffff8805f8f9bc58>
>> Jun 19 11:16:57 spcnode2 kernel: [76564.976331] CR2: 0000000000000079
>> Jun 19 11:16:57 spcnode2 kernel: [76564.979725] ---[ end trace
>> ace27f1cbf93eeab ]---
>>
>>
>> Had to do a hard reset on the machine afterwards.
>>
>> The machine mounting the RBD is running Ubuntu 12.04, and is not
>> hosting any OSDs or MONs.
>> root@spcnode2:~# uname -a
>> Linux spcnode2 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC
>> 2012 x86_64 x86_64 x86_64 GNU/Linux
>> root@spcnode2:~# ceph --version
>> ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
>>
>> - Travis
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>
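
For the removal step quoted above (listing /sys/bus/rbd/devices/ and then
writing the id to /sys/bus/rbd/remove), here is a minimal shell sketch of a
guarded removal. The helper name rbd_remove_checked and the RBD_SYSFS
variable are hypothetical and introduced only for illustration; the sysfs
paths are the ones shown in the thread. This does not work around the kernel
bug Alex points to. It only refuses to write an id that is not currently
listed under devices/, so a stale or mistyped id is never handed to the
driver.

```shell
#!/bin/sh
# Assumption: RBD_SYSFS is a hypothetical override, handy for dry runs.
# It defaults to the sysfs directory used in the thread.
RBD_SYSFS="${RBD_SYSFS:-/sys/bus/rbd}"

# rbd_remove_checked ID
# Write ID to the "remove" file only if ID is listed under devices/.
# Returns non-zero without writing anything when the device is absent.
rbd_remove_checked() {
    id="$1"
    if [ -z "$id" ]; then
        echo "usage: rbd_remove_checked ID" >&2
        return 2
    fi
    if [ ! -e "$RBD_SYSFS/devices/$id" ]; then
        echo "rbd device $id is not listed under $RBD_SYSFS/devices" >&2
        return 1
    fi
    echo "$id" > "$RBD_SYSFS/remove"
}
```

Usage, as root, would be "rbd_remove_checked 0" in place of the bare
"echo 0 > /sys/bus/rbd/remove".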