Hello Tommi,

I followed your advice and tried ext4. Everything works fine with ext4. I'll try a newer version of btrfs when I have time.

Below is the btrfs-related trace that appears in /var/log/messages when the problem occurs:

Sep 26 23:04:51 node95 kernel: INFO: task cosd:2988 blocked for more than 120 seconds.
Sep 26 23:04:51 node95 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 26 23:04:51 node95 kernel: cosd D ffff880c3fc25700 0 2988 1 0x00000080
Sep 26 23:04:51 node95 kernel: ffff8817fb919bc8 0000000000000082 0000000000000000 ffffffffa0175fa1
Sep 26 23:04:51 node95 kernel: 0000000000000000 ffff8817f844f448 ffffffffa015df40 0000000100d0253c
Sep 26 23:04:51 node95 kernel: ffff881806d05a98 ffff8817fb919fd8 0000000000010518 ffff881806d05a98
Sep 26 23:04:51 node95 kernel: Call Trace:
Sep 26 23:04:51 node95 kernel: [<ffffffffa0175fa1>] ? extent_writepages+0x51/0x60 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa015df40>] ? btrfs_get_extent+0x0/0x8b0 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa016ef8d>] btrfs_start_ordered_extent+0x6d/0xc0 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Sep 26 23:04:51 node95 kernel: [<ffffffffa016f16b>] btrfs_wait_ordered_extents+0x12b/0x1e0 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa015336f>] btrfs_commit_transaction+0x20f/0x710 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Sep 26 23:04:51 node95 kernel: [<ffffffffa01801b6>] btrfs_mksubvol+0x2d6/0x350 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa0180343>] btrfs_ioctl_snap_create+0x113/0x160 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa0181d9a>] btrfs_ioctl+0x4ca/0x970 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffff8117f182>] vfs_ioctl+0x22/0xa0
Sep 26 23:04:51 node95 kernel: [<ffffffff81059d12>] ? finish_task_switch+0x42/0xd0
Sep 26 23:04:51 node95 kernel: [<ffffffff8117f324>] do_vfs_ioctl+0x84/0x580
Sep 26 23:04:51 node95 kernel: [<ffffffff8116c892>] ? vfs_write+0x132/0x1a0
Sep 26 23:04:51 node95 kernel: [<ffffffff8117f8a1>] sys_ioctl+0x81/0xa0
Sep 26 23:04:51 node95 kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
Sep 26 23:06:51 node95 kernel: INFO: task btrfs-transacti:1093 blocked for more than 120 seconds.
Sep 26 23:06:51 node95 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 26 23:06:51 node95 kernel: btrfs-transac D ffff880c3fc25700 0 1093 2 0x00000000
Sep 26 23:06:51 node95 kernel: ffff880c070e5d50 0000000000000046 0000000000000000 ffffffff81059d12
Sep 26 23:06:51 node95 kernel: 0000000000000000 0000000000016980 0000000000000000 0000000100d09a6f
Sep 26 23:06:51 node95 kernel: ffff880c058fd028 ffff880c070e5fd8 0000000000010518 ffff880c058fd028
Sep 26 23:06:51 node95 kernel: Call Trace:
Sep 26 23:06:51 node95 kernel: [<ffffffff81059d12>] ? finish_task_switch+0x42/0xd0
Sep 26 23:06:51 node95 kernel: [<ffffffffa0151ec9>] wait_for_commit+0x89/0xf0 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Sep 26 23:06:51 node95 kernel: [<ffffffffa015374e>] btrfs_commit_transaction+0x5ee/0x710 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffff814c963e>] ? mutex_lock+0x1e/0x50
Sep 26 23:06:51 node95 kernel: [<ffffffffa0153c8b>] ? start_transaction+0x1ab/0x230 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Sep 26 23:06:51 node95 kernel: [<ffffffffa014d9ab>] transaction_kthread+0x26b/0x280 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffffa014d740>] ? transaction_kthread+0x0/0x280 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffff81091936>] kthread+0x96/0xa0
Sep 26 23:06:51 node95 kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Sep 26 23:06:51 node95 kernel: [<ffffffff810918a0>] ? kthread+0x0/0xa0
Sep 26 23:06:51 node95 kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20

Regards,
Cédric

----- Original message -----
> From: "Tommi Virtanen" <tommi.virtanen@xxxxxxxxxxxxx>
> To: "Cédric Morandin" <cedric.morandin@xxxxxxxx>
> Cc: "Wido den Hollander" <wido@xxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
> Sent: Tuesday, 27 September 2011 18:32:24
> Subject: Re: Ceph hangs when accessed
>
> On Mon, Sep 26, 2011 at 14:23, Cédric Morandin
> <cedric.morandin@xxxxxxxx> wrote:
> > 2011-09-26 23:07:49.404867 osd e13: 4 osds: 2 up, 4 in
> ...
> > 2011-09-26 22:57:06.822182 7faf6a6f8700 -- 138.96.126.92:6802/3157
> > >> 138.96.126.93:6801/3162 pipe(0x7faf50001320 sd=20 pgs=0 cs=0
> > l=0).accept connect_seq 2 vs existing 1 state 3
> > 2011-09-26 23:07:09.084901 7faf8e1b5700 FileStore: sync_entry timed
> > out after 600 seconds.
> > ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
>
> And earlier you said the OSDs are using btrfs. That definitely sounds
> like a btrfs bug, then.
>
> Do the osd machines have anything interesting in dmesg or
> /var/log/kern.log ?
>
> You may want to try a newer kernel, or running on ext4 for now.