On 07/24/2013 09:37 PM, Mikaël Cluseau wrote:
Hi,

I have hit a bug in the 3.10 kernel under Debian, both with a self-compiled linux-stable from git (built with make-kpkg) and with the sid package. I'm using format-2 images (ceph version 0.61.6 (59ddece17e36fef69ecf40e239aeffad33c9db35)) to make snapshots and clones of a database for development purposes. I replay the database's logs on a Ceph volume and take snapshots at fixed points in time:

1. mount
2. recover the database until a given time
3. umount
4. snapshot
5. go back to 1.

With both kernels it works for a while (several mount/umount cycles), but after some time mount fails with the following error:

Jul 25 15:20:46 **host** kernel: [14623.808604] ------------[ cut here ]------------
Jul 25 15:20:46 **host** kernel: [14623.808622] kernel BUG at /build/linux-dT6LW0/linux-3.10.1/net/ceph/osd_client.c:2103!
Jul 25 15:20:46 **host** kernel: [14623.808641] invalid opcode: 0000 [#1] SMP
Jul 25 15:20:46 **host** kernel: [14623.808657] Modules linked in: cbc rbd libceph nfsd auth_rpcgss oid_registry nfs_acl nfs lockd sunrpc sha256_generic hmac nls_utf8 cifs dns_resolver fscache bridge stp llc xfs loop coretemp kvm_intel kvm crc32c_intel psmouse serio_raw snd_pcm snd_page_alloc snd_timer snd soundcore iTCO_wdt iTCO_vendor_support i2c_i801 i7core_edac microcode pcspkr lpc_ich mfd_core joydev ioatdma evdev edac_core acpi_cpufreq mperf button processor thermal_sys ext4 crc16 jbd2 mbcache btrfs xor zlib_deflate raid6_pq crc32c libcrc32c raid1 ohci_hcd hid_generic usbhid hid sr_mod sg cdrom sd_mod crc_t10dif dm_mod md_mod ata_generic ata_piix libata uhci_hcd ehci_pci ehci_hcd scsi_mod usbcore usb_common igb i2c_algo_bit i2c_core dca ptp pps_core
Jul 25 15:20:46 **host** kernel: [14623.809005] CPU: 6 PID: 9583 Comm: mount Not tainted 3.10-1-amd64 #1 Debian 3.10.1-1
Jul 25 15:20:46 **host** kernel: [14623.809024] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1b 12/30/2011
Jul 25 15:20:46 **host** kernel: [14623.809041] task: ffff88082dfa2840 ti: ffff88080e2c2000 task.ti: ffff88080e2c2000
Jul 25 15:20:46 **host** kernel: [14623.809059] RIP: 0010:[<ffffffffa05d08ff>] [<ffffffffa05d08ff>] ceph_osdc_build_request+0x370/0x3e9 [libceph]
Jul 25 15:20:46 **host** kernel: [14623.809092] RSP: 0018:ffff88080e2c39b8 EFLAGS: 00010216
Jul 25 15:20:46 **host** kernel: [14623.809120] RAX: ffff88082e589a80 RBX: ffff88082e589b72 RCX: 0000000000000007
Jul 25 15:20:46 **host** kernel: [14623.809151] RDX: ffff88082e589b6f RSI: ffff88082afd9078 RDI: ffff88082b308258
Jul 25 15:20:46 **host** kernel: [14623.809182] RBP: 0000000000001000 R08: ffff88082e10a400 R09: ffff88082afd9000
Jul 25 15:20:46 **host** kernel: [14623.809213] R10: ffff8806bfb1cd60 R11: ffff88082d153c01 R12: ffff88080e88e988
Jul 25 15:20:46 **host** kernel: [14623.809243] R13: 0000000000000001 R14: ffff88080eb874d8 R15: ffff88080eb875b8
Jul 25 15:20:46 **host** kernel: [14623.809275] FS: 00007f2c893b77e0(0000) GS:ffff88083fc40000(0000) knlGS:0000000000000000
Jul 25 15:20:46 **host** kernel: [14623.809322] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 25 15:20:46 **host** kernel: [14623.809350] CR2: ffffffffff600400 CR3: 00000006bfbc6000 CR4: 00000000000007e0
Jul 25 15:20:46 **host** kernel: [14623.809381] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 15:20:46 **host** kernel: [14623.809413] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 25 15:20:46 **host** kernel: [14623.809442] Stack:
Jul 25 15:20:46 **host** kernel: [14623.814598] 0000000000002201 ffff88080e2c3a30 0000000000001000 ffff88042edf2240
Jul 25 15:20:46 **host** kernel: [14623.814656] 00000024a05cbb01 0000000000000000 ffff88082e84f348 ffff88080e2c3a58
Jul 25 15:20:46 **host** kernel: [14623.814710] ffff88080eb874d8 ffff88080e9aa290 ffff88027abc6000 0000000000001000
Jul 25 15:20:46 **host** kernel: [14623.814765] Call Trace:
Jul 25 15:20:46 **host** kernel: [14623.814793] [<ffffffffa05bb7f3>] ? rbd_osd_req_format_write+0x81/0x8c [rbd]
Jul 25 15:20:46 **host** kernel: [14623.814827] [<ffffffffa05bea1c>] ? rbd_img_request_fill+0x679/0x74f [rbd]
Jul 25 15:20:46 **host** kernel: [14623.814865] [<ffffffff8105f670>] ? should_resched+0x5/0x23
Jul 25 15:20:46 **host** kernel: [14623.814896] [<ffffffffa05bf3d1>] ? rbd_request_fn+0x180/0x226 [rbd]
Jul 25 15:20:46 **host** kernel: [14623.814929] [<ffffffff811a819c>] ? __blk_run_queue_uncond+0x1e/0x26
Jul 25 15:20:46 **host** kernel: [14623.814960] [<ffffffff811a905f>] ? blk_queue_bio+0x299/0x2e8
Jul 25 15:20:46 **host** kernel: [14623.814990] [<ffffffff811a7523>] ? generic_make_request+0x96/0xd5
Jul 25 15:20:46 **host** kernel: [14623.815021] [<ffffffff811a810f>] ? submit_bio+0x10a/0x13b
Jul 25 15:20:46 **host** kernel: [14623.815053] [<ffffffff8112fe3d>] ? bio_alloc_bioset+0xd0/0x172
Jul 25 15:20:46 **host** kernel: [14623.815083] [<ffffffff8112d36a>] ? _submit_bh+0x1b7/0x1d4
Jul 25 15:20:46 **host** kernel: [14623.815117] [<ffffffff8112d4e9>] ? __sync_dirty_buffer+0x4e/0x7b
Jul 25 15:20:46 **host** kernel: [14623.815164] [<ffffffffa03053b6>] ? ext4_commit_super+0x192/0x1db [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815206] [<ffffffffa0306cfe>] ? ext4_setup_super+0xff/0x146 [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815248] [<ffffffffa03094e2>] ? ext4_fill_super+0x1c55/0x2500 [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815282] [<ffffffff811c7194>] ? string.isra.3+0x36/0x99
Jul 25 15:20:46 **host** kernel: [14623.815322] [<ffffffffa030788d>] ? ext4_calculate_overhead+0x2a5/0x2a5 [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815371] [<ffffffff8110b721>] ? sget+0x460/0x478
Jul 25 15:20:46 **host** kernel: [14623.815410] [<ffffffffa030788d>] ? ext4_calculate_overhead+0x2a5/0x2a5 [ext4]
Jul 25 15:20:46 **host** kernel: [14623.815457] [<ffffffff8110b8ed>] ? mount_bdev+0x143/0x1a5
Jul 25 15:20:46 **host** kernel: [14623.815490] [<ffffffff810f9857>] ? __kmalloc_track_caller+0xd5/0xe5
Jul 25 15:20:46 **host** kernel: [14623.815522] [<ffffffff8110c08d>] ? mount_fs+0x5f/0x140
Jul 25 15:20:46 **host** kernel: [14623.815554] [<ffffffff8111e70f>] ? vfs_kern_mount+0x60/0xe1
Jul 25 15:20:46 **host** kernel: [14623.815585] [<ffffffff8112078b>] ? do_mount+0x678/0x7f2
Jul 25 15:20:46 **host** kernel: [14623.815615] [<ffffffff810d47be>] ? memdup_user+0x36/0x5b
Jul 25 15:20:46 **host** kernel: [14623.815645] [<ffffffff81120983>] ? SyS_mount+0x7e/0xb7
Jul 25 15:20:46 **host** kernel: [14623.815676] [<ffffffff813938e9>] ? system_call_fastpath+0x16/0x1b
Jul 25 15:20:46 **host** kernel: [14623.815705] Code: 00 00 00 8b 54 24 28 66 89 50 22 49 8b 86 c0 00 00 00 8b 54 24 10 89 50 1e 49 8b 44 24 48 48 89 c2 49 03 54 24 50 48 39 d3 76 02 <0f> 0b 48 29 c3 49 89 5c 24 50 41 89 5c 24 16 eb 59 66 81 fd 01
Jul 25 15:20:46 **host** kernel: [14623.815934] RIP [<ffffffffa05d08ff>] ceph_osdc_build_request+0x370/0x3e9 [libceph]
Jul 25 15:20:46 **host** kernel: [14623.815987] RSP <ffff88080e2c39b8>
Jul 25 15:20:46 **host** kernel: [14623.816398] ---[ end trace 556a473d0b86002e ]---

If I roll back to the previous snapshot, I can apparently mount the image again, but I have to reboot the machine every time :'(
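For anyone trying to reproduce the crash, the snapshot cycle described in the report can be sketched as a small Python driver. This is a minimal sketch under stated assumptions, not the reporter's actual script: the device path `/dev/rbd0`, the mount point `/mnt/db`, the image spec `rbd/dbimage`, the `snap-NNNN` naming and the `recover_until` callable are all hypothetical. It assumes the format-2 image is already mapped with the kernel rbd client; only `mount`, `umount` and `rbd snap create` are real commands, and by default it runs in dry-run mode and only collects them.

```python
import subprocess
from typing import Callable, List

# Hypothetical values: none of these come from the report.
DEVICE = "/dev/rbd0"        # kernel rbd device the image is mapped to (assumed)
MOUNTPOINT = "/mnt/db"      # where the ext4 filesystem gets mounted (assumed)
IMAGE_SPEC = "rbd/dbimage"  # pool/image of the format-2 image (assumed)


def cycle_commands(snap_name: str) -> List[List[str]]:
    """Return the mount, umount and snapshot commands for one cycle."""
    return [
        ["mount", DEVICE, MOUNTPOINT],
        ["umount", MOUNTPOINT],
        ["rbd", "snap", "create", f"{IMAGE_SPEC}@{snap_name}"],
    ]


def _run(cmd: List[str], dry_run: bool, log: List[List[str]]) -> None:
    """Record a command and execute it only when dry_run is False."""
    log.append(cmd)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


def run_cycles(
    stop_times: List[str],
    recover_until: Callable[[str], None],
    dry_run: bool = True,
) -> List[List[str]]:
    """Run mount -> recover -> umount -> snapshot once per target time."""
    executed: List[List[str]] = []
    for i, stop_time in enumerate(stop_times):
        mount, umount, snap = cycle_commands(f"snap-{i:04d}")
        _run(mount, dry_run, executed)
        # Hypothetical, application-specific step: replay the database
        # logs up to stop_time while the filesystem is mounted.
        recover_until(stop_time)
        _run(umount, dry_run, executed)
        _run(snap, dry_run, executed)
    return executed


if __name__ == "__main__":
    targets = ["2013-07-25 12:00", "2013-07-25 13:00"]
    for cmd in run_cycles(targets, recover_until=lambda t: print("recover until", t)):
        print(" ".join(cmd))
```

In the report, the BUG fires inside the `mount` step at the start of a later cycle, so with `dry_run=False` a crash would surface as the `mount` call hanging or failing after some number of iterations.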
Thanks for the report. I don't think we've seen that before; I filed http://tracker.ceph.com/issues/5760 to track it.

Josh

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com