Hi,

On Wed, 2011-01-26 at 10:59 -0700, Jim Schutt wrote:
> Hi,
>
> I got this kernel BUG on a server running multiple Ceph
> cosd instances, during a heavy write load generated by
> multiple Ceph clients.
>
> The server was running the current ceph unstable kernel
> (a3f5274e535 in git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git).
>
> Please let me know what other information you need to
> make this report useful.
>
> -- Jim
>

Here's another example. Again, please let me know what
other information you need to make this report useful.

-- Jim

[11199.532483] ------------[ cut here ]------------
[11199.536292] kernel BUG at fs/btrfs/extent-tree.c:2198!
[11199.536292] invalid opcode: 0000 [#1] SMP
[11199.536292] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[11199.536292] CPU 3
[11199.536292] Modules linked in: loop btrfs zlib_deflate ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge ]
[11199.536292]
[11199.536292] Pid: 1664, comm: cosd Not tainted 2.6.37-00017-ga3f5274 #4 0DT097/PowerEdge 1950
[11199.536292] RIP: 0010:[<ffffffffa0774081>] [<ffffffffa0774081>] run_clustered_refs+0x71e/0x76b [btrfs]
[11199.536292] RSP: 0018:ffff8801c90abb58 EFLAGS: 00010282
[11199.536292] RAX: 00000000fffffffb RBX: 0000000000000000 RCX: ffff8802262c5000
[11199.536292] RDX: ffff88017921e2d0 RSI: ffffea000527f690 RDI: 0000000000000001
[11199.536292] RBP: ffff8801c90abc28 R08: ffffe8ffffccefe8 R09: 0000000000000000
[11199.536292] R10: 0000000000000003 R11: ffff880227549e98 R12: ffff880140bb8f00
[11199.536292] R13: 0000000000000000 R14: ffff880181eff378 R15: ffff8802262c5000
[11199.536292] FS: 00007f5e680fc940(0000) GS:ffff8800cfcc0000(0000) knlGS:0000000000000000
[11199.536292] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[11199.536292] CR2: 00007f0e1a476260 CR3: 0000000173aa0000 CR4: 00000000000006e0
[11199.536292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11199.536292] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11199.536292] Process cosd (pid: 1664, threadinfo ffff8801c90aa000, task ffff8801df12d840)
[11199.536292] Stack:
[11199.536292] 0000000000000000 0000000000000000 0000000000000001 0000000000000000
[11199.536292] ffff8801c90abc48 ffff8802262c5000 ffff8801e0a9c600 ffff880181eff378
[11199.536292] 0000000000000000 0000002600000206 ffff880181eff380 000000007921e750
[11199.536292] Call Trace:
[11199.536292] [<ffffffffa0785be0>] ? btrfs_update_inode+0xc3/0xd3 [btrfs]
[11199.536292] [<ffffffffa07741bc>] btrfs_run_delayed_refs+0xee/0x15e [btrfs]
[11199.536292] [<ffffffff810fa54d>] ? __fsnotify_update_dcache_flags+0x22/0x56
[11199.536292] [<ffffffffa07801d0>] __btrfs_end_transaction+0x6d/0x1e3 [btrfs]
[11199.536292] [<ffffffffa0780372>] btrfs_end_transaction_throttle+0x18/0x1a [btrfs]
[11199.536292] [<ffffffffa07872e1>] btrfs_create+0x1a0/0x1fa [btrfs]
[11199.536292] [<ffffffff810f49e2>] vfs_create+0x76/0x96
[11199.536292] [<ffffffff810f56af>] do_last+0x24d/0x4d3
[11199.536292] [<ffffffff810f5b16>] do_filp_open+0x1e1/0x4c5
[11199.536292] [<ffffffff81031061>] ? should_resched+0xe/0x2f
[11199.536292] [<ffffffff8136a638>] ? _cond_resched+0xe/0x22
[11199.536292] [<ffffffff811aa669>] ? might_fault+0xe/0x10
[11199.536292] [<ffffffff811aa753>] ? __strncpy_from_user+0x20/0x4a
[11199.536292] [<ffffffff810e9023>] do_sys_open+0x62/0xeb
[11199.536292] [<ffffffff810e90df>] sys_open+0x20/0x22
[11199.536292] [<ffffffff81002c2b>] system_call_fastpath+0x16/0x1b
[11199.536292] Code: 24 08 48 8b 46 40 48 89 04 24 48 8b b5 58 ff ff ff 48 8b bd 60 ff ff ff e8 61 e7 ff ff eb 08 0f 0b eb fe 0f 0b eb fe 85 c0 74 04 <0f> 0b eb fe 4c 89 e7 e8 65 ae ff ff 48 8b bd 70 ff ff ff
[11199.536292] RIP [<ffffffffa0774081>] run_clustered_refs+0x71e/0x76b [btrfs]
[11199.536292] RSP <ffff8801c90abb58>
[11199.905250] ---[ end trace b0dead1e7c3dbf7b ]---
Jan 26 11:40:32 an1 [11199.532483] ------------[ cut here ]------------
Jan 26 11:40:33 an1 [11199.536292] invalid opcode: 0000 [#1] SMP
Jan 26 11:40:33 an1 [11199.536292] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Jan 26 11:40:38 an1 [11199.536292] Stack:
Jan 26 11:40:38 an1 [11199.536292] Call Trace:
Jan 26 11:40:40 an1 [11199.536292] Code: 24 08 48 8b 46 40 48 89 04 24 48 8b b5 58 ff ff ff 48 8b bd 60 ff ff ff e8 61 e7 ff ff eb 08 0f 0b eb fe 0f 0b eb fe 85 c0 74 04 <0f> 0b eb fe 4c 89 e7 e8 65 ae ff ff 4
[11212.699541] btrfs: sdm2 checksum verify failed on 31928320 wanted 237BEA0B found F7B13C5E level 0
[11212.709895] btrfs: sdm2 checksum verify failed on 31928320 wanted 237BEA0B found F7B13C5E level 0
[11212.719737] btrfs: sdm2 checksum verify failed on 31928320 wanted 237BEA0B found F7B13C5E level 0
[11212.729433] ------------[ cut here ]------------
[11212.730394] kernel BUG at fs/btrfs/extent-tree.c:5789!
[11212.734157] invalid opcode: 0000 [#2] SMP
[11212.734157] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[11212.734157] CPU 3
[11212.734157] Modules linked in: loop btrfs zlib_deflate ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge ]
[11212.734157]
[11212.734157] Pid: 27662, comm: btrfs-cleaner Tainted: G D 2.6.37-00017-ga3f5274 #4 0DT097/PowerEdge 1950
[11212.734157] RIP: 0010:[<ffffffffa0773452>] [<ffffffffa0773452>] reada_walk_down+0x18c/0x249 [btrfs]
[11212.734157] RSP: 0018:ffff880227539be0 EFLAGS: 00010282
[11212.734157] RAX: 00000000fffffffb RBX: ffff8801cd50d750 RCX: ffff88020b993000
[11212.734157] RDX: ffff88017921e3f0 RSI: ffffea000527f690 RDI: 0000000100000090
[11212.734157] RBP: ffff880227539c80 R08: ffffe8ffffccefe8 R09: 0000000000000000
[11212.734157] R10: 0000000100a68468 R11: ffff880227549e98 R12: ffff8801d83c3000
[11212.734157] R13: 0000000000000040 R14: ffff88020b993000 R15: 00000000000000e0
[11212.734157] FS: 0000000000000000(0000) GS:ffff8800cfcc0000(0000) knlGS:0000000000000000
[11212.734157] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[11212.734157] CR2: 0000000000b92de8 CR3: 000000020e5b3000 CR4: 00000000000006e0
[11212.734157] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11212.734157] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11212.734157] Process btrfs-cleaner (pid: 27662, threadinfo ffff880227538000, task ffff88020ebc0000)
[11212.734157] Stack:
[11212.734157] ffff880227539bf0 0000000400000000 ffff8801cd50d750 ffff8801e0a9ca00
[11212.734157] 00000000024cd000 000010000000006b ffff88021527f880 0000000100000001
[11212.734157] ffff880227539c50 ffffffffa079c6bc ffff880225c96198 ffff8801b0cf9aa8
[11212.734157] Call Trace:
[11212.734157] [<ffffffffa079c6bc>] ? extent_buffer_uptodate+0x6c/0x8a [btrfs]
[11212.734157] [<ffffffffa0775d62>] do_walk_down+0x25b/0x395 [btrfs]
[11212.734157] [<ffffffffa076db1f>] ? btrfs_header_generation+0x1f/0x25 [btrfs]
[11212.734157] [<ffffffffa0771268>] ? walk_down_proc+0x10a/0x1d0 [btrfs]
[11212.734157] [<ffffffffa0775f1d>] walk_down_tree+0x81/0xac [btrfs]
[11212.734157] [<ffffffffa077636f>] btrfs_drop_snapshot+0x2aa/0x467 [btrfs]
[11212.734157] [<ffffffff81031049>] ? need_resched+0x23/0x2d
[11212.734157] [<ffffffff81031061>] ? should_resched+0xe/0x2f
[11212.734157] [<ffffffffa077d080>] ? cleaner_kthread+0x0/0x16b [btrfs]
[11212.734157] [<ffffffffa077f24d>] btrfs_clean_old_snapshots+0xee/0x10c [btrfs]
[11212.734157] [<ffffffffa077d177>] cleaner_kthread+0xf7/0x16b [btrfs]
[11212.734157] [<ffffffff8105b11e>] kthread+0x72/0x7a
[11212.734157] [<ffffffff810039d4>] kernel_thread_helper+0x4/0x10
[11212.734157] [<ffffffff8105b0ac>] ? kthread+0x0/0x7a
[11212.734157] [<ffffffff810039d0>] ? kernel_thread_helper+0x0/0x10
[11212.734157] Code: 01 00 00 0f 86 bb 00 00 00 8b 4d 8c 48 8b 55 80 4c 8d 4d c0 48 8b bd 78 ff ff ff 4c 8d 45 c8 4c 89 f6 e8 ec da ff ff 85 c0 74 04 <0f> 0b eb fe 48 8b 45 c8 48 85 c0 75 04 0f 0b eb fe 41 83
[11212.734157] RIP [<ffffffffa0773452>] reada_walk_down+0x18c/0x249 [btrfs]
[11212.734157] RSP <ffff880227539be0>
[11213.101484] ---[ end trace b0dead1e7c3dbf7c ]---
Jan 26 11:40:45 an1 [11212.729433] ------------[ cut here ]------------
Jan 26 11:40:45 an1 [11212.734157] invalid opcode: 0000 [#2] SMP
Jan 26 11:40:45 an1 [11212.734157] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Jan 26 11:40:46 an1 [11212.734157] Stack:
Jan 26 11:40:46 an1 [11212.734157] Call Trace:
Jan 26 11:40:46 an1 [11212.734157] Code: 01 00 00 0f 86 bb 00 00 00 8b 4d 8c 48 8b 55 80 4c 8d 4d c0 48 8b bd 78 ff ff ff 4c 8d 45 c8 4c 89 f6 e8 ec da ff ff 85 c0 74 04 <0f> 0b eb fe 48 8b 45 c8 48 85 c0 75 0