Re: RGW hung, 2 OSDs using 100% CPU


On 03/26/2014 05:50 PM, Craig Lewis wrote:
I made a typo in my timeline too.

It should read:
At 14:14:00, I started OSD 4, and waited for ceph -w to stabilize. CPU
usage was normal.
At 14:15:10, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.  It
returned successfully.
At 14:16:00, I started OSD 8, and waited for ceph -w to stabilize.  CPU
usage started out normal, but went to 100% before 14:16:40.

The osd.8 log shows it doing some deep scrubbing here. Perhaps that is
what caused your earlier issues with CPU usage?

At 14:17:25, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.
regions list hung, and I killed it.
At 14:18:15, I stopped ceph-osd id=8.
At 14:18:45, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.  It
returned successfully.
At 14:19:10, I stopped ceph-osd id=4.
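Rather than killing a hung radosgw-admin by hand, each check in the timeline can be bounded with a timeout. This is a minimal Python sketch, assuming Python 3.7+ and that radosgw-admin is on the PATH. The helper name finishes_within and the 30-second limit are illustrative choices, not part of the original report.

```python
import subprocess
from collections.abc import Sequence


def finishes_within(argv: Sequence[str], seconds: float) -> bool:
    """Run argv; True if it exits 0 within `seconds`, False if it hangs or fails."""
    try:
        result = subprocess.run(argv, capture_output=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising this.
        return False
    except OSError:
        # Command missing or not executable.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    name = "--name=client.radosgw.ceph1c"
    # The 30 s limit is an arbitrary, hypothetical choice.
    for cmd in (["radosgw-admin", name, "regions", "list"],
                ["radosgw-admin", name, "regionmap", "get"]):
        ok = finishes_within(cmd, seconds=30)
        print(" ".join(cmd), "->", "ok" if ok else "hung or failed")
        if not ok:
            break  # mirrors the && between the two commands
```

A False result does not distinguish a hang from a non-zero exit. For this kind of check that is usually enough, since either outcome means the read path is not healthy.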

Since you've got the noout flag set, when osd.8 goes down, any objects
for which osd.8 is the primary will not be readable. Ceph reads from
primaries, and the noout flag prevents another OSD from being selected
(which would happen if osd.8 were marked out), so these objects (which
apparently include some needed for regions list or regionmap get) are
inaccessible.

Josh

Some newlines were added.  The only material change is the last line,
changing to id=4.

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis@xxxxxxxxxxxxxxxxxx <mailto:clewis@xxxxxxxxxxxxxxxxxx>

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/>  | Twitter
<http://www.twitter.com/centraldesktop>  | Facebook
<http://www.facebook.com/CentralDesktop>  | LinkedIn
<http://www.linkedin.com/groups?gid=147417>  | Blog
<http://cdblog.centraldesktop.com/>

On 3/26/14 15:04 , Craig Lewis wrote:
At 14:14:00, I started OSD 4, and waited for ceph -w to stabilize.  CPU
usage was normal.
At 14:15:10, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.  It
returned successfully.
At 14:16:00, I started OSD 8, and waited for ceph -w to stabilize.
CPU usage started out normal, but went to 100% before 14:16:40.
At 14:17:25, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.
regions list hung, and I killed it.
At 14:18:15, I stopped ceph-osd id=8.
At 14:18:45, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.  It
returned successfully.
At 14:19:10, I stopped ceph-osd id=8.



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





