On Sun, Nov 19, 2017 at 02:41:56AM PST, Gregory Farnum spake thusly:
> Okay, so the hosts look okay (although very uneven numbers of OSDs).
>
> But the sizes are pretty wonky. Are the disks really that mismatched
> in size? I note that many of them in host10 are set to 1.0, but most
> of the others are some fraction less than that.

Yes, they are that mismatched. This is a very mix-and-match cluster we
built out of what we had lying around. I know that isn't ideal.

Possibly due to the large mismatch in disk sizes (although I had always
expected CRUSH to manage it better, given that the default weighting is
proportional to size), we used to run into situations where the small
disks would fill up even when the large disks were barely at 50%. So
back in June we ran bc-ceph-reweight-by-utilization.py fairly
frequently for a few days until things were happy and stable, and it
stayed that way until tonight's incident.

I'm pretty sure you are right: the weights got reset to defaults,
causing lots of movement. I had forgotten that ceph osd reweight is not
a persistent setting. So it looks like once things settle I need to
adjust the CRUSH weights appropriately and set the reweights back to 1
to make this permanent. That explains it. Thanks!

--
Tracy Reed
http://tracyreed.org
Digital signature attached for your safety.
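P.S. For anyone finding this later, the fix I describe above would look
roughly like the sketch below, using the standard ceph CLI. The OSD id
(osd.12) and the weight value (1.819, i.e. a 2 TB disk expressed in TiB)
are placeholders, not values from our cluster; substitute your own per
OSD, e.g. from ceph osd df tree.

    # Persistently bake the desired ratio into the CRUSH map
    # (CRUSH weight is conventionally the disk capacity in TiB):
    ceph osd crush reweight osd.12 1.819

    # Then clear the temporary override back to full:
    ceph osd reweight 12 1.0

    # Confirm weights and utilization afterwards:
    ceph osd df tree

The idea is that ceph osd crush reweight changes the CRUSH map itself,
so the adjustment survives, whereas ceph osd reweight is only the 0-1
override that got us into trouble when it was reset.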