We are having a lot of trouble with the SSD OSDs in our cache tier when they reboot. Booting an OSD causes massive blocked I/O, and I/O across the entire cluster nearly stalls, even when the OSD was only down for 60 seconds.

I have noticed that the OSD uses massive amounts of RAM when it starts. In the one-minute test it used almost 8 GB, another one earlier this morning used 14 GB, and some last night were in the 10 GB range. During this time the process is not using much CPU, but the disks are very busy, writing a good 120-250 MB/s at hundreds to low thousands of IOPS. Once the memory usage gets down to about 1.5 GB, the blocked I/O starts clearing slowly. At first I thought this was due to preloading jemalloc, but it also happens without it.

Looking through [1], I thought that setting osd recovery delay start to 60 seconds or longer would allow the OSD to come up, join the cluster, and do any housekeeping before being in and trying to service I/O requests. However, setting the value to 60 does nothing: we see recovery operations start less than 30 seconds after the monitor shows the boot message. The OSD log does not show any kind of delay either. Is there a bug here, or am I understanding this option incorrectly?

What I'm looking for is something to delay any I/O until peering is completed, the PGs have been scanned, and all of the housekeeping is done, so that the only load on the OSD/disk is client/recovery I/O. I don't want it to try to do both at the same time.

Once the OSD finally comes in and the blocked I/O clears, we can manage backfilling and recovery without much impact to the cluster. It is just the initial minutes of terror (dozens of blocked I/O > 500 seconds) that we can't figure out how to get rid of. I understand that there will be some impact from recovery, but on our cluster, which on average does about 10K IOPS, we get less than 5K for 5 minutes (for a single OSD that was down for 60 seconds). A host with two SSDs brought our cluster to less than 2K IOPS for 15 minutes, and it took ten minutes to get back to normal performance.

[1] http://docs.ceph.com/docs/v0.94/rados/configuration/osd-config-ref/#recovery

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
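
P.S. For reference, a minimal sketch of how the option from [1] would normally be set in ceph.conf. The [osd] section placement is an assumption about a typical setup, not a copy of our actual configuration; the value 60 is the one mentioned above.

```ini
[osd]
# Assumed placement: applies to every OSD daemon that reads this file.
# Per [1], this delays the start of object recovery after peering completes.
osd recovery delay start = 60
```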