Re: Kernel's rbd in 3.10.1


 



On 07/24/2013 09:37 PM, Mikaël Cluseau wrote:
Hi,

I'm hitting a bug in the 3.10 kernel under Debian, whether it's a
self-compiled linux-stable from git (built with make-kpkg) or sid's package.

I'm using format-2 images (ceph version 0.61.6
(59ddece17e36fef69ecf40e239aeffad33c9db35)) to take snapshots and clones
of a database for development purposes. I replay the database's logs
onto a ceph volume and take a snapshot at fixed points in time:
mount -> recover the database until a given time -> umount ->
snapshot -> go back to step 1.
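
Roughly, the cycle looks like this (a sketch only; the pool, image and
snapshot names are placeholders, not my actual setup):

  rbd map rbd/dbclone                # map the format-2 clone (appears as e.g. /dev/rbd0)
  mount /dev/rbd0 /mnt/db            # 1. mount
  # ... replay the database logs up to the chosen recovery point ...
  umount /mnt/db                     # 2. umount
  rbd snap create rbd/dbclone@pit1   # 3. snapshot, then back to 1.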

In both cases it works for a while (a number of mount/umount cycles),
then at some point the mount fails with the following error:

Jul 25 15:20:46 **host** kernel: [14623.808604] ------------[ cut here
]------------
Jul 25 15:20:46 **host** kernel: [14623.808622] kernel BUG at
/build/linux-dT6LW0/linux-3.10.1/net/ceph/osd_client.c:2103!
Jul 25 15:20:46 **host** kernel: [14623.808641] invalid opcode: 0000
[#1] SMP
Jul 25 15:20:46 **host** kernel: [14623.808657] Modules linked in: cbc
rbd libceph nfsd auth_rpcgss oid_registry nfs_acl nfs lockd sunrpc
sha256_generic hmac nls_utf8 cifs dns_resolver fscache bridge stp llc
xfs loop coretemp kvm_intel kvm crc32c_intel psmouse serio_raw snd_pcm
snd_page_alloc snd_timer snd soundcore iTCO_wdt iTCO_vendor_support
i2c_i801 i7core_edac microcode pcspkr lpc_ich mfd_core joydev ioatdma
evdev edac_core acpi_cpufreq mperf button processor thermal_sys ext4
crc16 jbd2 mbcache btrfs xor zlib_deflate raid6_pq crc32c libcrc32c
raid1 ohci_hcd hid_generic usbhid hid sr_mod sg cdrom sd_mod crc_t10dif
dm_mod md_mod ata_generic ata_piix libata uhci_hcd ehci_pci ehci_hcd
scsi_mod usbcore usb_common igb i2c_algo_bit i2c_core dca ptp pps_core
Jul 25 15:20:46 **host** kernel: [14623.809005] CPU: 6 PID: 9583 Comm:
mount Not tainted 3.10-1-amd64 #1 Debian 3.10.1-1
Jul 25 15:20:46 **host** kernel: [14623.809024] Hardware name:
Supermicro X8DTU/X8DTU, BIOS 2.1b       12/30/2011
Jul 25 15:20:46 **host** kernel: [14623.809041] task: ffff88082dfa2840
ti: ffff88080e2c2000 task.ti: ffff88080e2c2000
Jul 25 15:20:46 **host** kernel: [14623.809059] RIP:
0010:[<ffffffffa05d08ff>]  [<ffffffffa05d08ff>]
ceph_osdc_build_request+0x370/0x3e9 [libceph]
Jul 25 15:20:46 **host** kernel: [14623.809092] RSP:
0018:ffff88080e2c39b8  EFLAGS: 00010216
Jul 25 15:20:46 **host** kernel: [14623.809120] RAX: ffff88082e589a80
RBX: ffff88082e589b72 RCX: 0000000000000007
Jul 25 15:20:46 **host** kernel: [14623.809151] RDX: ffff88082e589b6f
RSI: ffff88082afd9078 RDI: ffff88082b308258
Jul 25 15:20:46 **host** kernel: [14623.809182] RBP: 0000000000001000
R08: ffff88082e10a400 R09: ffff88082afd9000
Jul 25 15:20:46 **host** kernel: [14623.809213] R10: ffff8806bfb1cd60
R11: ffff88082d153c01 R12: ffff88080e88e988
Jul 25 15:20:46 **host** kernel: [14623.809243] R13: 0000000000000001
R14: ffff88080eb874d8 R15: ffff88080eb875b8
Jul 25 15:20:46 **host** kernel: [14623.809275] FS:
00007f2c893b77e0(0000) GS:ffff88083fc40000(0000) knlGS:0000000000000000
Jul 25 15:20:46 **host** kernel: [14623.809322] CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Jul 25 15:20:46 **host** kernel: [14623.809350] CR2: ffffffffff600400
CR3: 00000006bfbc6000 CR4: 00000000000007e0
Jul 25 15:20:46 **host** kernel: [14623.809381] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 15:20:46 **host** kernel: [14623.809413] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 25 15:20:46 **host** kernel: [14623.809442] Stack:
Jul 25 15:20:46 **host** kernel: [14623.814598]  0000000000002201
ffff88080e2c3a30 0000000000001000 ffff88042edf2240
Jul 25 15:20:46 **host** kernel: [14623.814656]  00000024a05cbb01
0000000000000000 ffff88082e84f348 ffff88080e2c3a58
Jul 25 15:20:46 **host** kernel: [14623.814710]  ffff88080eb874d8
ffff88080e9aa290 ffff88027abc6000 0000000000001000
Jul 25 15:20:46 **host** kernel: [14623.814765] Call Trace:
Jul 25 15:20:46 **host** kernel: [14623.814793]  [<ffffffffa05bb7f3>] ?
rbd_osd_req_format_write+0x81/0x8c [rbd]
Jul 25 15:20:46 **host** kernel: [14623.814827]  [<ffffffffa05bea1c>] ?
rbd_img_request_fill+0x679/0x74f [rbd]
Jul 25 15:20:46 **host** kernel: [14623.814865]  [<ffffffff8105f670>] ?
should_resched+0x5/0x23
Jul 25 15:20:46 **host** kernel: [14623.814896]  [<ffffffffa05bf3d1>] ?
rbd_request_fn+0x180/0x226 [rbd]
Jul 25 15:20:46 **host** kernel: [14623.814929]  [<ffffffff811a819c>] ?
__blk_run_queue_uncond+0x1e/0x26
Jul 25 15:20:46 **host** kernel: [14623.814960]  [<ffffffff811a905f>] ?
blk_queue_bio+0x299/0x2e8
Jul 25 15:20:46 **host** kernel: [14623.814990]  [<ffffffff811a7523>] ?
generic_make_request+0x96/0xd5
Jul 25 15:20:46 **host** kernel: [14623.815021]  [<ffffffff811a810f>] ?
submit_bio+0x10a/0x13b
Jul 25 15:20:46 **host** kernel: [14623.815053]  [<ffffffff8112fe3d>] ?
bio_alloc_bioset+0xd0/0x172
Jul 25 15:20:46 **host** kernel: [14623.815083]  [<ffffffff8112d36a>] ?
_submit_bh+0x1b7/0x1d4
Jul 25 15:20:46 **host** kernel: [14623.815117]  [<ffffffff8112d4e9>] ?
__sync_dirty_buffer+0x4e/0x7b
Jul 25 15:20:46 **host** kernel: [14623.815164]  [<ffffffffa03053b6>] ?
ext4_commit_super+0x192/0x1db [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815206]  [<ffffffffa0306cfe>] ?
ext4_setup_super+0xff/0x146 [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815248]  [<ffffffffa03094e2>] ?
ext4_fill_super+0x1c55/0x2500 [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815282]  [<ffffffff811c7194>] ?
string.isra.3+0x36/0x99
Jul 25 15:20:46 **host** kernel: [14623.815322]  [<ffffffffa030788d>] ?
ext4_calculate_overhead+0x2a5/0x2a5 [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815371]  [<ffffffff8110b721>] ?
sget+0x460/0x478
Jul 25 15:20:46 **host** kernel: [14623.815410]  [<ffffffffa030788d>] ?
ext4_calculate_overhead+0x2a5/0x2a5 [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815457]  [<ffffffff8110b8ed>] ?
mount_bdev+0x143/0x1a5
Jul 25 15:20:46 **host** kernel: [14623.815490]  [<ffffffff810f9857>] ?
__kmalloc_track_caller+0xd5/0xe5
Jul 25 15:20:46 **host** kernel: [14623.815522]  [<ffffffff8110c08d>] ?
mount_fs+0x5f/0x140
Jul 25 15:20:46 **host** kernel: [14623.815554]  [<ffffffff8111e70f>] ?
vfs_kern_mount+0x60/0xe1
Jul 25 15:20:46 **host** kernel: [14623.815585]  [<ffffffff8112078b>] ?
do_mount+0x678/0x7f2
Jul 25 15:20:46 **host** kernel: [14623.815615]  [<ffffffff810d47be>] ?
memdup_user+0x36/0x5b
Jul 25 15:20:46 **host** kernel: [14623.815645]  [<ffffffff81120983>] ?
SyS_mount+0x7e/0xb7
Jul 25 15:20:46 **host** kernel: [14623.815676]  [<ffffffff813938e9>] ?
system_call_fastpath+0x16/0x1b
Jul 25 15:20:46 **host** kernel: [14623.815705] Code: 00 00 00 8b 54 24
28 66 89 50 22 49 8b 86 c0 00 00 00 8b 54 24 10 89 50 1e 49 8b 44 24 48
48 89 c2 49 03 54 24 50 48 39 d3 76 02 <0f> 0b 48 29 c3 49 89 5c 24 50
41 89 5c 24 16 eb 59 66 81 fd 01
Jul 25 15:20:46 **host** kernel: [14623.815934] RIP [<ffffffffa05d08ff>]
ceph_osdc_build_request+0x370/0x3e9 [libceph]
Jul 25 15:20:46 **host** kernel: [14623.815987]  RSP <ffff88080e2c39b8>
Jul 25 15:20:46 **host** kernel: [14623.816398] ---[ end trace
556a473d0b86002e ]---

It seems that if I roll back to the previous snapshot I can mount the
image again, but I have to reboot the machine every time :'(
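
For reference, the rollback step is roughly the following (again with
placeholder names; this is a sketch of what I do, not exact commands):

  rbd snap rollback rbd/dbclone@pit1   # revert the image to the last good snapshot
  rbd unmap /dev/rbd0                  # unmap the device; the reboot is still needed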

Thanks for the report; I don't think we've seen that before.
I filed http://tracker.ceph.com/issues/5760 to track it.

Josh
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




