Hi,

The newer kernel rbd driver throws some quite strange messages on map/unmap compared to the 3.2 branch:

rbd map 'path'        # device appears as /dev/rbd1 instead of rbd0
rbd unmap /dev/rbd1   # causes the following trace, with vanilla 3.4.0 from kernel.org:

[ 99.700802] BUG: scheduling while atomic: rbd/3846/0x00000002
[ 99.700857] Modules linked in: btrfs ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse nfsd nfs nfs_acl auth_rpcgss lockd sunrpc kvm_intel kvm bridge stp llc ipv6 rbd libceph loop 8250_pnp pcspkr firewire_ohci coretemp firewire_core hwmon 8250 serial_core
[ 99.700899] Pid: 3846, comm: rbd Not tainted 3.4.0 #3
[ 99.700902] Call Trace:
[ 99.700910] [<ffffffff81464d68>] ? __schedule+0x96/0x625
[ 99.700916] [<ffffffff8105f98a>] ? __queue_work+0x254/0x27c
[ 99.700921] [<ffffffff81465d39>] ? _raw_spin_lock_irqsave+0x2a/0x32
[ 99.700926] [<ffffffff81069b6d>] ? complete+0x31/0x40
[ 99.700931] [<ffffffff8105f10a>] ? flush_workqueue_prep_cwqs+0x16e/0x180
[ 99.700947] [<ffffffff81463bd8>] ? schedule_timeout+0x21/0x1af
[ 99.700951] [<ffffffff8107165d>] ? enqueue_entity+0x67/0x13d
[ 99.700955] [<ffffffff81464ad9>] ? wait_for_common+0xc5/0x143
[ 99.700959] [<ffffffff8106d5fc>] ? try_to_wake_up+0x217/0x217
[ 99.700963] [<ffffffff81063952>] ? kthread_stop+0x30/0x50
[ 99.700967] [<ffffffff81060979>] ? destroy_workqueue+0x148/0x16b
[ 99.700977] [<ffffffffa004ce07>] ? ceph_osdc_stop+0x1f/0xaa [libceph]
[ 99.700984] [<ffffffffa00463b4>] ? ceph_destroy_client+0x10/0x44 [libceph]
[ 99.700989] [<ffffffffa00652ae>] ? rbd_client_release+0x38/0x4b [rbd]
[ 99.700993] [<ffffffffa0065719>] ? rbd_put_client.isra.10+0x28/0x3d [rbd]
[ 99.700998] [<ffffffffa006609d>] ? rbd_dev_release+0xc3/0x157 [rbd]
[ 99.701003] [<ffffffff81287387>] ? device_release+0x41/0x72
[ 99.701007] [<ffffffff81202b95>] ? kobject_release+0x4e/0x6a
[ 99.701025] [<ffffffffa0065156>] ? rbd_remove+0x102/0x11e [rbd]
[ 99.701035] [<ffffffff8114b058>] ? sysfs_write_file+0xd3/0x10f
[ 99.701044] [<ffffffff810f8796>] ? vfs_write+0xaa/0x136
[ 99.701052] [<ffffffff810f8a07>] ? sys_write+0x45/0x6e
[ 99.701062] [<ffffffff8146a839>] ? system_call_fastpath+0x16/0x1b
[ 99.701170] BUG: scheduling while atomic: rbd/3846/0x00000002
[ 99.701220] Modules linked in: btrfs ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse nfsd nfs nfs_acl auth_rpcgss lockd sunrpc kvm_intel kvm bridge stp llc ipv6 rbd libceph loop 8250_pnp pcspkr firewire_ohci coretemp firewire_core hwmon 8250 serial_core
[ 99.701251] Pid: 3846, comm: rbd Not tainted 3.4.0 #3
[ 99.701253] Call Trace:
[ 99.701257] [<ffffffff81464d68>] ? __schedule+0x96/0x625
[ 99.701261] [<ffffffff81465ef9>] ? _raw_spin_unlock_irq+0x5/0x2e
[ 99.701265] [<ffffffff81069f92>] ? finish_task_switch+0x4c/0xc1
[ 99.701268] [<ffffffff8146525b>] ? __schedule+0x589/0x625
[ 99.701272] [<ffffffff812084b2>] ? ip4_string+0x5a/0xc8
[ 99.701276] [<ffffffff81208cbd>] ? string.isra.3+0x39/0x9f
[ 99.701281] [<ffffffff81208e33>] ? ip4_addr_string.isra.5+0x5a/0x76
[ 99.701285] [<ffffffff81208b7a>] ? number.isra.1+0x10e/0x218
[ 99.701290] [<ffffffff81463bd8>] ? schedule_timeout+0x21/0x1af
[ 99.701294] [<ffffffff81464ad9>] ? wait_for_common+0xc5/0x143
[ 99.701298] [<ffffffff8106d5fc>] ? try_to_wake_up+0x217/0x217
[ 99.701303] [<ffffffff8105f24c>] ? flush_workqueue+0x130/0x2a5
[ 99.701309] [<ffffffffa00463b9>] ? ceph_destroy_client+0x15/0x44 [libceph]
[ 99.701314] [<ffffffffa00652ae>] ? rbd_client_release+0x38/0x4b [rbd]
[ 99.701319] [<ffffffffa0065719>] ? rbd_put_client.isra.10+0x28/0x3d [rbd]
[ 99.701324] [<ffffffffa006609d>] ? rbd_dev_release+0xc3/0x157 [rbd]
[ 99.701328] [<ffffffff81287387>] ? device_release+0x41/0x72
[ 99.701334] [<ffffffff81202b95>] ? kobject_release+0x4e/0x6a
[ 99.701343] [<ffffffffa0065156>] ? rbd_remove+0x102/0x11e [rbd]
[ 99.701352] [<ffffffff8114b058>] ? sysfs_write_file+0xd3/0x10f
[ 99.701361] [<ffffffff810f8796>] ? vfs_write+0xaa/0x136
[ 99.701369] [<ffffffff810f8a07>] ? sys_write+0x45/0x6e
[ 99.701377] [<ffffffff8146a839>] ? system_call_fastpath+0x16/0x1b

On Wed, May 16, 2012 at 12:24 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>> This is most likely due to a recently-fixed problem.
>> The fix is found in this commit, although there were
>> other changes that led up to it:
>>     32eec68d2f rbd: don't drop the rbd_id too early
>> It is present starting in Linux kernel 3.3; it appears
>> you are running 2.6?
>
> Nope, it's just Debian kernel naming - they continue to name 3.x as
> 2.6, and I follow them in my own build naming. I tried this on 3.2
> the first time, and just a couple of minutes ago on my notebook with
> 3.3.4 over a relatively slow VPN connection - rbd failed with almost the
> same backtrace (I removed the sleep from the loop and the bug was
> reproduced immediately after the first map-unmap), and the kernel panicked
> after approx. four minutes when I stopped the 'for ...' execution;
> unfortunately there is no backtrace of the panic because of X and the
> lack of a configured netconsole :) The symptoms are the same - 'rbd
> showmapped' shows the latest volume, but unmap fails with 'xxx is not a
> block device; remove failed: (22) Invalid argument' and a couple of
> 'null pointer dereference' messages in dmesg. I used /dev/rbd0 instead
> of the symlinks to reduce the probability of hitting a udev-related
> timeout on symlink creation.
>
> On Tue, May 15, 2012 at 7:40 PM, Josh Durgin <josh.durgin@xxxxxxxxxxx> wrote:
>> On 05/15/2012 04:49 AM, Andrey Korolyov wrote:
>>>
>>> Hi,
>>>
>>> There is a strange bug when I try to map an excessive number of block
>>> devices inside the pool, like the following:
>>>
>>> for vol in $(rbd ls); do rbd map $vol; [some-microsleep]; [some
>>> operation or nothing, I have stubbed a guestfs mount here];
>>> [some-microsleep]; rbd unmap /dev/rbd/rbd/$vol; [some-microsleep]; done
>>>
>>> udev or rbd seems to be somehow late and the mapping fails. There is no
>>> real-world harm at all, and such a case can easily be avoided, but on a
>>> busy cluster the timeout increases, and I was able to catch the same
>>> thing on a two-OSD config in a recovering state. With a 0.1-second sleep
>>> on a healthy cluster all works okay; with 0.05 it may fail with the
>>> following trace (just for me, because I am testing on relatively old and
>>> crappy hardware, so others may catch it at smaller intervals):
>>
>>
>> udev is asynchronous by nature. The rbd tool itself doesn't wait for
>> /dev to be populated because you may not be using the default udev rule
>> (or not using udev at all). Our test framework polls for the device to
>> make sure 'rbd map' and udev completed:
>>
>> https://github.com/ceph/teuthology/blob/d6b9bd8b63c8c6c1181ece1f6941829d8d1d5152/teuthology/task/rbd.py#L190
>>
>> Josh
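For reference, below is a rough shell sketch of the polling approach Josh describes, not the actual teuthology code: after 'rbd map', poll for the udev-created device node instead of sleeping for a fixed interval. The 30-second timeout and the default /dev/rbd/rbd/<image> path are assumptions based on the stock udev rule; adjust them for your setup.

#!/bin/bash
# Sketch only: map each image, wait for udev to create the device node,
# do some work, then unmap. Avoids guessing a fixed sleep interval.

TIMEOUT=30   # seconds to wait for udev before giving up (arbitrary choice)

wait_for_dev() {
    # Poll until $1 exists as a block device, or fail after TIMEOUT seconds.
    local dev=$1 i
    for i in $(seq $((TIMEOUT * 10))); do
        [ -b "$dev" ] && return 0
        sleep 0.1
    done
    echo "timed out waiting for $dev" >&2
    return 1
}

for vol in $(rbd ls); do
    rbd map "$vol" || exit 1
    wait_for_dev "/dev/rbd/rbd/$vol" || exit 1
    # ... operate on the mapped device here (e.g. the guestfs mount) ...
    rbd unmap "/dev/rbd/rbd/$vol" || exit 1
done

Polling removes the guesswork of picking a sleep interval: the loop continues as soon as udev has created the node, and fails loudly if it never does.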