Hi.

We hit an OSD_FULL last week on our cluster - with an average
utilization of less than 50%, i.e. hugely imbalanced. This has driven
us to bump pg_num upwards and reweight the OSDs more aggressively.

Question: what do people see as an "acceptable" variance across OSDs?

x <stdin>
    N      Min      Max   Median        Avg     Stddev
x  72    45.49    56.25    52.35  51.878889  2.1764343

That is 72 x 10TB drives. It seems hard to get the spread further down
-- so churn will most likely make it hard for us to stay even at this
level.

Currently we have ~158 PGs/OSD, which by my math gives 63GB/PG if the
disks were fully utilized (rough arithmetic sketched below) - which
leads me to think that somewhat smaller PGs would give the balancing
an easier job.

Would it be OK to go closer to 300 PGs/OSD - would it be sane? I can
see that the default max is 300, but I have a hard time finding out
whether this is "recommendable" or just a "tunable".

* We've now seen OSD_FULL trigger irrecoverable kernel bugs in the
CephFS kernel client on our 4.15 kernels - multiple times - a forced
reboot is the only way out. We're on the Ubuntu kernels; I haven't
done the diff against upstream (yet), and I don't intend to run our
production cluster disk-full anywhere in the near future to test it
out.

Jesper
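
For the record, the PG-size arithmetic above as a minimal sketch,
assuming one OSD's raw capacity spread evenly across its PGs
(replication and actual fill level ignored):

# pg_share.py - rough per-PG share of one OSD's capacity.
# Assumes even PG distribution; ignores replication and headroom.
osd_capacity_gb = 10_000  # 10TB drive, decimal GB

for pgs_per_osd in (158, 300):
    gb_per_pg = osd_capacity_gb / pgs_per_osd
    print(f"{pgs_per_osd:4d} PGs/OSD -> {gb_per_pg:5.1f} GB/PG")

# Output:
#  158 PGs/OSD ->  63.3 GB/PG
#  300 PGs/OSD ->  33.3 GB/PG

So going to ~300 PGs/OSD would roughly halve the unit size the
balancer has to work with.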
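
The summary line above is just ministat over the per-OSD utilization;
a minimal Python equivalent, assuming you feed it one utilization
percentage per line on stdin (e.g. the %USE column of `ceph osd df`):

# osd_spread.py - ministat-style summary of per-OSD utilization.
# Feed it one utilization percentage per line, e.g. the %USE
# column cut out of `ceph osd df`.
import statistics
import sys

vals = [float(line) for line in sys.stdin if line.strip()]
print(f"N={len(vals)}  Min={min(vals)}  Max={max(vals)}  "
      f"Median={statistics.median(vals)}  "
      f"Avg={statistics.mean(vals):.6f}  "
      f"Stddev={statistics.stdev(vals):.7f}")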