Hi,
I am using v3.1-rc9, so the fix is in there. Maybe I can nail it down a bit
more specifically.
Best Regards,
martin
Sage Weil wrote:
Hi Christian,
On Sat, 8 Oct 2011, Christian Brunner wrote:
Hi,
I upgraded ceph from 0.32 to 0.36 yesterday. Now I have a totally
screwed ceph cluster. :(
What bugs me most is the fact that OSDs become unresponsive
frequently. The process is eating a lot of cpu and I can see the
What version of btrfs are you running? This sounds a bit like the bug
fixed by this patch:
http://www.spinics.net/lists/linux-btrfs/msg12627.html
(That was just merged into mainline this week.)
following messages in the log:
Oct 8 22:30:05 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct 8 22:30:10 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct 8 22:30:15 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct 8 22:30:20 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct 8 22:30:25 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct 8 22:30:30 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Do you have any idea what to do about that?
Those messages just mean that a thread in the disk threadpool (which is
doing all the writes to btrfs) is blocked/stopped.
sage
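
To check whether it is always the same disk_tp thread that is blocked, and for how long, the log lines quoted above can be summarized with a small script. This is only a minimal sketch and not part of Ceph: the names PATTERN and summarize_timeouts are made up for illustration, the line format is inferred solely from the samples in this thread (with the wrapped lines joined back together), and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from collections import defaultdict
from datetime import datetime

# Format inferred from the heartbeat_map lines quoted in this thread only;
# other Ceph versions or syslog setups may log differently.
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>\S+?)\[\d+\]:.*heartbeat_map\s+is_healthy\s+"
    r"'(?P<thread>[^']+)'\s+had timed out after\s+(?P<timeout>\d+)"
)


def summarize_timeouts(lines, year=2011):
    """Group timeout reports by (daemon, thread).

    Returns, per group, how many reports were seen and how many seconds
    lie between the first and the last one. `year` is an assumption,
    since the syslog timestamp has no year field.
    """
    seen = defaultdict(list)
    for line in lines:
        # Collapse runs of whitespace so padded day numbers still match.
        match = PATTERN.search(" ".join(line.split()))
        if match is None:
            continue  # not a heartbeat timeout line
        stamp = datetime.strptime(
            f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
        )
        seen[(match.group("daemon"), match.group("thread"))].append(stamp)
    return {
        key: {
            "count": len(stamps),
            "span_seconds": int((max(stamps) - min(stamps)).total_seconds()),
        }
        for key, stamps in seen.items()
    }
```

Feeding it the syslog file (for example `summarize_timeouts(open(path))`, where the path depends on the local syslog configuration) gives one entry per daemon and thread. A single entry whose span keeps growing would suggest one thread that never comes back, rather than many short stalls.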