Re: RGW hung, 2 OSDs using 100% CPU

On 3/27/14 18:04, Craig Lewis wrote:

I'm trying to use strace on osd.4:
strace -tt -f -ff -o ./ceph-osd.4.strace -x /usr/bin/ceph-osd --cluster=ceph -i 4 -f

So far, strace is running, and the process isn't hung.  After I ran this, the cluster finally finished backfilling the last of the PGs (all on osd.4).
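
For anyone repeating this experiment, here is a minimal sketch of one way to summarize the trace afterwards. With -ff and -o, strace writes one file per process/thread, named ./ceph-osd.4.strace.<pid>. The helper name summarize_strace is hypothetical (not part of Ceph or strace), and it assumes the -tt line layout of "timestamp syscall(args) = result". It simply counts how often each syscall appears across all of the files, which may help show what a busy OSD is spending its time on.

```shell
# Hypothetical helper: tally syscalls across the per-thread files that
# "strace -ff -o <prefix>" writes (one file per PID/TID, named <prefix>.<pid>).
summarize_strace() {
    prefix="$1"
    # With -tt, field 1 is the timestamp and field 2 begins with "syscall(".
    # Lines such as "<... futex resumed>" or "+++ exited +++" are skipped.
    cat "$prefix".* 2>/dev/null \
        | awk '$2 ~ /^[a-z_0-9]+\(/ { sub(/\(.*/, "", $2); count[$2]++ }
               END { for (s in count) print count[s], s }' \
        | sort -rn
}
```

Usage would be something like "summarize_strace ./ceph-osd.4.strace | head", run from the directory where the trace files were written.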

Since the cluster is healthy again, I killed the strace and started the daemon normally (start ceph-osd id=4).  Things seem fine now.  I'm going to let it scrub and deep-scrub overnight.  I'll restart radosgw-agent tomorrow.


This seems to have resolved the issue.  The cluster completed recovery while I was strace'ing osd.4, and hasn't had any issues since then.  I restarted radosgw-agent, and it's running fine.

I don't think the snapshots are related, but I don't know.  The snapshots I deleted were taken over a 2 week period, and covered an increase of 40% of the cluster data size.

The snapshot cron is still active, so I guess I'll repeat the experiment.  If the issue comes back in a couple of weeks, I'll try the strace without removing the snapshots.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
