Re: OSD::disk_tp timeout

On Sun, 9 Oct 2011, Martin Mailand wrote:
> Hi,
> I am using v3.1-rc9, so the fix is in there. Maybe I can nail it down a bit
> more specifically.

You might try sysrq-t or -w to see what the spinning CPUs are doing.
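
For reference, a minimal sketch of requesting those dumps from a script. It assumes root privileges and a kernel that exposes /proc/sysrq-trigger; the helper name trigger_sysrq is hypothetical and only for illustration. The dump itself lands in the kernel log (e.g. dmesg), not in the return value.

```python
from pathlib import Path

# Standard procfs entry for magic SysRq requests (writing needs root).
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")

# Only the two keys mentioned above:
#   t = dump the state of all tasks
#   w = dump tasks that are blocked (uninterruptible)
ALLOWED_KEYS = {"t", "w"}


def trigger_sysrq(key: str, trigger: Path = SYSRQ_TRIGGER) -> None:
    """Write a single SysRq key to the trigger file (hypothetical helper)."""
    if key not in ALLOWED_KEYS:
        raise ValueError(f"unsupported sysrq key: {key!r}")
    trigger.write_text(key)
```

Calling trigger_sysrq("w") and then reading the kernel log should show the stack traces of the blocked tasks. The trigger path is a parameter so the helper can be exercised against an ordinary file.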

Thanks!
sage


> 
> Best Regards,
>  martin
> 
> Sage Weil schrieb:
> > Hi Christian,
> > 
> > On Sat, 8 Oct 2011, Christian Brunner wrote:
> > > Hi,
> > > 
> > > I've upgraded ceph from 0.32 to 0.36 yesterday. Now I have a totally
> > > screwed ceph cluster. :(
> > > 
> > > What bugs me most is the fact that OSDs become unresponsive
> > > frequently. The process is eating a lot of cpu and I can see the
> > 
> > What version of btrfs are you running?  This sounds a bit like the bug fixed
> > by this patch:
> > 
> > http://www.spinics.net/lists/linux-btrfs/msg12627.html
> > 
> > (That was just merged into mainline this week.)
> > 
> > > following messages in the log:
> > > 
> > > Oct  8 22:30:05 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
> > > is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
> > > Oct  8 22:30:10 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
> > > is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
> > > Oct  8 22:30:15 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
> > > is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
> > > Oct  8 22:30:20 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
> > > is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
> > > Oct  8 22:30:25 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
> > > is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
> > > Oct  8 22:30:30 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
> > > is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
> > > 
> > > Do you have any idea what to do about that?
> > 
> > Those messages just mean that a thread in the disk threadpool (which is
> > doing all the writes to btrfs) is blocked/stopped.
> > 
> > sage
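
To see whether it is always the same disk_tp thread that is stuck, the heartbeat_map messages quoted above can be tallied per thread. This is only a sketch: it assumes each syslog entry is on a single unwrapped line in the format shown in the log excerpt, and the names TIMEOUT_RE, SAMPLE and count_timeouts are made up for illustration.

```python
import re
from collections import Counter

# Matches the heartbeat_map message as it appears in the quoted log.
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had timed out after (?P<timeout>\d+)"
)

# Two entries from the log above, unwrapped, plus one unrelated line
# (the last line is invented filler to show that non-matches are skipped).
SAMPLE = [
    "Oct  8 22:30:05 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map "
    "is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60",
    "Oct  8 22:30:10 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map "
    "is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60",
    "Oct  8 22:30:11 os00 some unrelated message",
]


def count_timeouts(lines):
    """Count timeout reports per (thread pool, thread id) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match["pool"], match["thread"])] += 1
    return counts
```

If a single thread id accounts for every report, as in the excerpt, that thread has been blocked the whole time rather than several threads timing out in turn.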
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

