In the interest of removing variables,
I removed all snapshots on all pools, then restarted all ceph
daemons at the same time. This brought up osd.8 as well.
The cluster started recovering. Now osd.4 and osd.13 are doing the
same thing (100% CPU, no disk I/O).
Any suggestions for how I can see what the hung OSDs are doing?
The logs don't look interesting. Is there a higher log level I
can use?
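(One option, assuming a reasonably recent version and the default admin
socket location, is to raise the debug levels on the running daemon and
look at what it's working on over the admin socket, something like:

    ceph tell osd.4 injectargs '--debug-osd 20 --debug-ms 1'   # very verbose logging
    ceph daemon osd.4 dump_ops_in_flight                       # run on the host where osd.4 lives
    ceph daemon osd.4 dump_historic_ops                        # recent slow/completed ops
    ceph tell osd.4 injectargs '--debug-osd 0/5 --debug-ms 0/5' # back to defaults afterwards

debug-osd 20 generates a lot of log volume, so it's best turned back
down once the interesting window has been captured.)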
I'm trying to use strace on osd.4:
strace -tt -f -ff -o ./ceph-osd.4.strace -x /usr/bin/ceph-osd --cluster=ceph -i 4 -f
So far, strace is running, and the process isn't hung. After I
ran this, the cluster finally finished backfilling the last of the
PGs (all on osd.4).
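(If a daemon starts spinning at 100% CPU again, a thread backtrace can
complement strace; a rough sketch, assuming gdb and the ceph debug
symbols are installed, with <pid> standing in for the busy ceph-osd
process:

    gdb --batch -p <pid> -ex 'thread apply all bt' > osd.4-backtrace.txt

That shows where every thread is stuck or looping at that instant,
which strace won't reveal if the process isn't making syscalls.)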
Since the cluster is healthy again, I killed the strace and started
the daemon normally (start ceph-osd id=4). Things seem fine now. I'm
going to let it scrub and deep-scrub overnight. I'll restart
radosgw-agent tomorrow.
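(Scrubs can also be kicked off by hand rather than waiting for the
scheduler, either for a whole OSD or for a single PG, e.g.:

    ceph osd deep-scrub 4        # queue a deep scrub of every PG primary on osd.4
    ceph pg deep-scrub 11.774    # deep scrub one specific PG

The PG id here is just one of the ones mentioned further down.)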
On 3/27/14 10:44, Craig Lewis wrote:
The osd.8 log shows it doing some deep scrubbing here. Perhaps that is
what caused your earlier issues with CPU usage?
When I first noticed the CPU usage, I checked iotop and iostat. Both
said there was no disk activity on any OSD.
At 14:17:25, I ran radosgw-admin --name=client.radosgw.ceph1c regions list
&& radosgw-admin --name=client.radosgw.ceph1c regionmap get.
regions list hung, and I killed it. At 14:18:15, I stopped
ceph-osd id=8.
At 14:18:45, I ran radosgw-admin --name=client.radosgw.ceph1c regions list
&& radosgw-admin --name=client.radosgw.ceph1c regionmap get.
It returned successfully.
At 14:19:10, I stopped ceph-osd id=4.
Since you've got the noout flag set, when osd.8 goes down any objects
for which osd.8 is the primary will not be readable. Ceph reads from
primaries, and the noout flag prevents another osd from being selected
as primary (which is what would happen if osd.8 were marked out), so
these objects (which apparently include some needed for regions list
or regionmap get) are inaccessible.
Josh
Taking osd.8 down (regardless of the noout flag) was the only way
to get things to respond. I have not set nodown, just noout.
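(Which flags are currently set is visible in the osdmap, and noout can
be cleared the same way it was set:

    ceph osd dump | grep flags    # shows e.g. "flags noout"
    ceph osd unset noout          # once it's safe to let CRUSH remap again
    ceph osd set noout            # to put it back before planned maintenance
)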
When I got in this morning, I had 4 more flapping OSDs: osd.4,
osd.12, osd.13, and osd.6. All 4 daemons were using 100% CPU
and doing no disk I/O.
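(To see which threads inside a spinning ceph-osd are burning the CPU,
a per-thread view can help; a sketch, with <pid> standing in for the
daemon's process id:

    top -H -p <pid>          # per-thread view of a single process
    pidstat -t -p <pid> 1    # per-thread CPU usage, one sample per second (from sysstat)
)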
osd.1 and osd.14 are the only ones currently using disk I/O.
There are 3 PGs being deepscrubbed:
root@ceph1c:/var/log/radosgw-agent# ceph pg dump | grep deep
dumped all in format plain
pg_stat  objects  mip  degr  unf  bytes       log   disklog  state
11.774   8682     0    0     0    7614655060  3001  3001     active+clean+scrubbing+deep
11.698   8587     0    0     0    7723737171  3001  3001     active+clean+scrubbing+deep
11.d8    8743     0    0     0    7570365909  3409  3409     active+clean+scrubbing+deep

pg_stat  state_stamp                 v             reported      up      acting  last_scrub    scrub_stamp                 last_deep_scrub  deep_scrub_stamp
11.774   2014-03-27 10:20:30.598032  8381'5180514  8521:6520833  [13,4]  [13,4]  7894'5176984  2014-03-20 04:41:48.762996  7894'5176984     2014-03-20 04:41:48.762996
11.698   2014-03-27 10:16:31.292487  8383'483312   8521:618864   [14,1]  [14,1]  7894'479783   2014-03-20 03:53:18.024015  7894'479783      2014-03-20 03:53:18.024015
11.d8    2014-03-27 10:15:39.558121  8396'1753407  8521:2417672  [12,6]  [12,6]  7894'1459230  2014-03-20 02:40:22.123236  7894'1459230     2014-03-20 02:40:22.123236
These PGs are on the 6 OSDs mentioned. osd.1 and osd.14 are not
using 100% CPU and are doing disk I/O. osd.12, osd.6, osd.4, and
osd.13 are using 100% CPU and 0 kB/s of disk I/O. Here's iostat
on ceph0c, which contains osd.1 (/dev/sdd), osd.4 (/dev/sde), and
osd.6 (/dev/sdh):
root@ceph0c:/var/log/ceph# iostat -p sdd,sde,sdh 1
Linux 3.5.0-46-generic (ceph0c)    03/27/2014    _x86_64_    (8 CPU)

<snip>

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          32.64    0.00    5.52    4.42    0.00   57.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             113.00       900.00         0.00        900          0
sdd1            113.00       900.00         0.00        900          0
sde               0.00         0.00         0.00          0          0
sde1              0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdh1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.90    0.00    4.41    2.82    0.00   62.87

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             181.00      1332.00         0.00       1332          0
sdd1            181.00      1332.00         0.00       1332          0
sde              22.00         8.00       328.00          8        328
sde1             18.00         8.00       328.00          8        328
sdh              18.00         4.00       228.00          4        228
sdh1             15.00         4.00       228.00          4        228

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          30.21    0.00    4.26    1.71    0.00   63.82

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             180.00      1044.00       200.00       1044        200
sdd1            177.00      1044.00       200.00       1044        200
sde               0.00         0.00         0.00          0          0
sde1              0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdh1              0.00         0.00         0.00          0          0
So it's not zero disk activity, but it's pretty close. The disks
continue to show 0 kB_read and 0 kB_wrtn for the next 60 seconds.
It's much lower than I would expect for OSDs executing a deep scrub.
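(The scrub state of a specific PG can also be queried directly, which
shows which OSDs it maps to and the details of its current state, e.g.
for one of the PGs above:

    ceph pg map 11.774       # which OSDs the PG maps to
    ceph pg 11.774 query     # detailed PG state, including scrub information
)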
I restarted the 4 flapping OSDs. They recovered, then started
flapping again within 5 minutes. I shut all of the ceph daemons down
and rebooted all nodes at the same time. The OSDs return to 100%
CPU usage very soon after boot.
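(Since the CPU burn comes straight back after boot with no disk I/O, a
profiler would at least show where the time is going; a sketch,
assuming perf (linux-tools) is installed and <pid> is the busy
ceph-osd:

    perf top -p <pid>                      # live view of the hottest functions
    perf record -g -p <pid> -- sleep 30    # capture 30s, then inspect with "perf report"
)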
I was going to ask if I should zap osd.8 and re-add it to the
cluster. I don't think that's possible now.
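(For the record, the usual sequence to remove and recreate an OSD would
be roughly the following, with /dev/sdX standing in for osd.8's data
disk; obviously not something to run while the cluster is in this
state:

    ceph osd out 8
    stop ceph-osd id=8
    ceph osd crush remove osd.8
    ceph auth del osd.8
    ceph osd rm 8
    ceph-disk zap /dev/sdX        # destroys the old partition table and data
    ceph-disk prepare /dev/sdX    # recreates the OSD; udev typically activates it
)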
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com