OSD flapping during recovery

I had some issues with OSD flapping after 2 days of recovery.  It appears to be related to swapping, even though I have plenty of RAM for the number of OSDs I have.  The cluster was completely unusable, and I ended up rebooting all the nodes.  It's been great ever since, but I'm assuming it will happen again.

Details are below, but I'm wondering if anybody has any idea what happened.




I noticed some lumpy data distribution on my OSDs.  Following the advice on the mailing list, I increased pg_num and pgp_num to the values from the formula.  .rgw.buckets is the only large pool, so I increased pg_num and pgp_num from 128 to 2048 on that one pool.  Cluster status changed to HEALTH_WARN, there were 1920 PGs in state active+remapped+wait_backfill, and 32% of the objects were degraded.
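
For the record, the resize was done with the standard pool set commands; if memory serves, something like this (pg_num first, then pgp_num):

    ceph osd pool set .rgw.buckets pg_num 2048
    ceph osd pool set .rgw.buckets pgp_num 2048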

Recovery was slow, and we were having some performance issues.  I lowered osd_max_backfills from 10 to 2, and osd_recovery_op_priority from 10 to 2.  This didn't slow the recovery down much, but it made my application much more responsive.  My journals are on the OSD disks (no SSDs).  I believe osd_max_backfills was the more important change, but it's much slower to test than the osd_recovery_op_priority change.  Aside from those two, my notes say I changed and reverted osd_disk_threads, osd_op_threads, and osd_recovery_threads.  All changes were pushed out via the admin socket, e.g. ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set osd_max_backfills 2.
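
To be precise, each setting was applied to every OSD daemon on every node; roughly this loop on each node (the exact form is from memory):

    for sock in /var/run/ceph/ceph-osd.*.asok; do
        ceph --admin-daemon "$sock" config set osd_max_backfills 2
        ceph --admin-daemon "$sock" config set osd_recovery_op_priority 2
    done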


I watched the cluster on and off over the weekend.  Ceph was steadily recovering.  It was down to ~900 PGs in active+remapped+wait_backfill, with 17% of objects degraded.  A few OSDs had been marked down and recovered, so a few tens of PGs were in state active+degraded+remapped+wait_backfill and active+degraded+remapped+backfilling.  While poking around, I noticed kswapd was using between 5% and 30% CPU on all nodes.  It was bursty, peaking at 30% CPU usage for about 5 seconds out of every 30.  Swap usage wasn't increasing, and kswapd appeared to be doing a lot of nothing.  My machines have 8 OSDs and 36GB of RAM.  top said that all machines were caching 30GB of data.  The 8 ceph-osd daemons were each using between 0.5GB and 1.2GB of RAM; I don't have the exact numbers, but I believe it was about 5GB total across all 8 daemons.
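
For reference, the numbers above came from the usual tools; roughly the following on each node, reconstructed from memory:

    top -b -n 1 | head -n 20    # kswapd CPU, cached memory, per-OSD resident size
    free -m                     # swap totals, confirming swap usage wasn't growing
    vmstat 5                    # si/so columns, to watch for actual swap-in/out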


A few hours later, the OSDs really started flapping.  They were being voted unresponsive and marked down faster than they could rejoin.  At one point, a third of the OSDs were marked down.  ceph -w was complaining about hundreds of slow requests older than 900 seconds.  Most RGW accesses were failing with HTTP timeouts.  kswapd was using a consistent 33% CPU on all nodes, with no variance that I could see.  To add insult to injury, the cluster was also running a scrub and a deep scrub.
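
One thing I'm considering for next time (I haven't actually tested this, so treat it as my assumption about the right knobs): disabling scrubbing while a big recovery is in flight, then re-enabling it afterwards:

    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # ... wait for recovery to finish ...
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub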


I eventually rebooted all nodes in the cluster, one at a time.  Once quorum was reestablished, recovery proceeded at the original speed.  The OSDs are responding, and all my RGW requests are returning in a reasonable amount of time.  There are no complaints of slow requests in ceph -w, and kswapd is using 0% CPU.
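
In case the procedure matters, it was a plain rolling reboot; a rough sketch (the noout flag is from memory and may be an assumption on my part):

    ceph osd set noout
    # reboot one node, wait for 'ceph -s' to show all OSDs back up, then repeat for the next node
    ceph osd unset noout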


I'm running Ceph 0.72.2 on Ubuntu 12.04.4, with kernel 3.5.0-37-generic #58~precise1-Ubuntu SMP.

I monitor the running version as well as the installed version, so I know that all daemons were restarted after the 0.72.1 -> 0.72.2 upgrade.  That happened on Jan 22nd.
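
The running vs. installed check is just the admin socket's version against the package version; something like this, with the exact package name being my assumption here:

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok version    # what the daemon is actually running
    dpkg-query -W -f='${Version}\n' ceph                         # what's installed on disk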



Any idea what happened?  I'm assuming it will happen again if recovery takes long enough.




--

Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email clewis@xxxxxxxxxxxxxxxxxx


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
