Re: RGW hung, 2 OSDs using 100% CPU

In the interest of removing variables, I removed all snapshots on all pools, then restarted all ceph daemons at the same time.  This brought up osd.8 as well.
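
(For reference, for pool-level snapshots the removal is roughly the following, run per pool; the pool and snapshot names below are placeholders:
rados -p <pool> lssnap
rados -p <pool> rmsnap <snapname>
)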

The cluster started recovering.  Now osd.4 and osd.13 are doing the same thing: stuck at 100% CPU with no disk I/O.


Any suggestions for how I can see what the hung OSDs are doing?  The logs don't look interesting.  Is there a higher log level I can use?
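
The only ideas I have so far are bumping the debug level on the live daemon and dumping its in-flight ops over the admin socket, roughly (osd.4 here is just an example):
ceph tell osd.4 injectargs '--debug-osd 20 --debug-ms 1'
ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok dump_ops_in_flight
I haven't confirmed yet whether the higher debug level actually shows anything useful for this.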


I'm trying to use strace on osd.4:
strace -tt -f -ff -o ./ceph-osd.4.strace -x /usr/bin/ceph-osd --cluster=ceph -i 4 -f

So far, strace is running, and the process isn't hung.  After I ran this, the cluster finally finished backfilling the last of the PGs (all on osd.4).
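
Next time one of them spins at 100% CPU, I'll probably attach to the already-running daemon instead of restarting it under strace, along the lines of:
strace -tt -f -ff -o ./ceph-osd.4.strace -p <pid of the osd.4 process>
or grab thread backtraces with gdb:
gdb -batch -ex 'thread apply all bt' -p <pid>
That's just a plan; I haven't needed it since the restart.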

Since the cluster is healthy again, I killed the strace and started the daemon normally (start ceph-osd id=4).  Things seem fine now.  I'm going to let it scrub and deep scrub overnight.  I'll restart radosgw-agent tomorrow.
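
If I don't want to wait for the scrub scheduler, I believe scrubs can also be kicked off by hand, e.g.:
ceph osd scrub 4
ceph osd deep-scrub 4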











Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email clewis@xxxxxxxxxxxxxxxxxx


On 3/27/14 10:44, Craig Lewis wrote:


The osd.8 log shows it doing some deep scrubbing here. Perhaps that is
what caused your earlier issues with CPU usage?
When I first noticed the CPU usage, I checked iotop and iostat.  Both said there was no disk activity on any OSD.


At 14:17:25, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.
regions list hung, and I killed it.  At 14:18:15, I stopped ceph-osd id=8.
At 14:18:45, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get.  It
returned successfully.
At 14:19:10, I stopped ceph-osd id=4.

Since you've got the noout flag set, when osd.8 goes down any objects
for which osd.8 is the primary will not be readable. Ceph reads from
primaries, and the noout flag prevents another OSD from being selected
(which would happen if osd.8 were marked out), so these objects (which
apparently include some needed for regions list or regionmap get) are
inaccessible.

Josh


Taking osd.8 down (regardless of the noout flag) was the only way to get things to respond.  I have not set nodown, just noout.
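
(For completeness, the flag state is easy to confirm with:
ceph osd dump | grep flags
and noout is toggled with ceph osd set noout / ceph osd unset noout.)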



When I got in this morning, I had 4 more flapping OSDs: osd.4, osd.12, osd.13, and osd.6.  All 4 daemons were using 100% CPU and doing no disk I/O.

osd.1 and osd.14 are the only ones currently using disk I/O.


There are 3 PGs being deep scrubbed:
root@ceph1c:/var/log/radosgw-agent# ceph pg dump | grep deep
dumped all in format plain
pg_stat    objects    mip    degr    unf    bytes    log    disklog    state    state_stamp    v    reported    up    acting    last_scrub    scrub_stamp    last_deep_scrub    deep_scrub_stamp
11.774    8682    0    0    0    7614655060    3001    3001    active+clean+scrubbing+deep    2014-03-27 10:20:30.598032    8381'5180514    8521:6520833    [13,4]    [13,4]    7894'5176984    2014-03-20 04:41:48.762996    7894'5176984    2014-03-20 04:41:48.762996
11.698    8587    0    0    0    7723737171    3001    3001    active+clean+scrubbing+deep    2014-03-27 10:16:31.292487    8383'483312    8521:618864    [14,1]    [14,1]    7894'479783    2014-03-20 03:53:18.024015    7894'479783    2014-03-20 03:53:18.024015
11.d8    8743    0    0    0    7570365909    3409    3409    active+clean+scrubbing+deep    2014-03-27 10:15:39.558121    8396'1753407    8521:2417672    [12,6]    [12,6]    7894'1459230    2014-03-20 02:40:22.123236    7894'1459230    2014-03-20 02:40:22.123236
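
To double-check which OSDs those PGs map to, e.g.:
ceph pg map 11.774
should report the same up/acting sets ([13,4]) as the dump above.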


These PGs are on the 6 OSDs mentioned.  osd.1 and osd.14 are not using 100% CPU and are doing disk I/O.  osd.12, osd.6, osd.4, and osd.13 are using 100% CPU and 0 kB/s of disk I/O.  Here's iostat on ceph0c, which contains osd.1 (/dev/sdd), osd.4 (/dev/sde), and osd.6 (/dev/sdh):
root@ceph0c:/var/log/ceph# iostat -p sdd,sde,sdh 1
Linux 3.5.0-46-generic (ceph0c)     03/27/2014     _x86_64_    (8 CPU)
<snip>
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          32.64    0.00    5.52    4.42    0.00   57.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             113.00       900.00         0.00        900          0
sdd1            113.00       900.00         0.00        900          0
sde               0.00         0.00         0.00          0          0
sde1              0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdh1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.90    0.00    4.41    2.82    0.00   62.87

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             181.00      1332.00         0.00       1332          0
sdd1            181.00      1332.00         0.00       1332          0
sde              22.00         8.00       328.00          8        328
sde1             18.00         8.00       328.00          8        328
sdh              18.00         4.00       228.00          4        228
sdh1             15.00         4.00       228.00          4        228

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          30.21    0.00    4.26    1.71    0.00   63.82

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             180.00      1044.00       200.00       1044        200
sdd1            177.00      1044.00       200.00       1044        200
sde               0.00         0.00         0.00          0          0
sde1              0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdh1              0.00         0.00         0.00          0          0


So it's not quite zero disk activity, but it's pretty close.  The disks continue to show 0 kB_read and 0 kB_wrtn for the next 60 seconds.  That's much lower than I would expect from OSDs executing a deep scrub.
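
To see where that CPU is actually going, my next step is to look at the busy threads in the spinning daemons, something like:
top -H -p <pid of a spinning ceph-osd>
or, if a profiler is handy, perf top -p <pid>.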


I restarted the 4 flapping OSDs.  They recovered, then started flapping again within 5 minutes.  I shut all of the ceph daemons down and rebooted all nodes at the same time.  The OSDs returned to 100% CPU usage very soon after boot.
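
If deep scrub really is the trigger, I may temporarily disable scrubbing to confirm it, i.e.:
ceph osd set noscrub
ceph osd set nodeep-scrub
and unset them once things settle down.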





I was going to ask if I should zap osd.8 and re-add it to the cluster.   I don't think that's possible now.



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

