We have a report that, due to an OSD bug, it is possible to fill a
cluster (using "rados bench write") to the point that some OSDs reach
100% full.
The issue is http://tracker.ceph.com/issues/16878
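
For reference, this is roughly what "filling" means here. The original
report used "rados bench write"; the sketch below is only a rough librados
equivalent I am putting here for illustration (the pool name, object size
and ceph.conf path are placeholders, not values from the report) - it just
keeps writing objects until the cluster refuses them, printing raw usage
as it goes:

    #!/usr/bin/env python
    # Rough sketch, NOT the reporter's rados bench command: write 4 MB
    # objects via librados until the cluster errors out, printing raw
    # usage along the way.  Pool name, object size and conffile path are
    # assumptions for illustration only.
    import rados

    POOL = 'rbd'                        # assumed test pool
    OBJ_SIZE = 4 * 1024 * 1024          # assumed 4 MB payload per object

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(POOL)
    payload = b'\0' * OBJ_SIZE

    try:
        i = 0
        while True:
            ioctx.write_full('fill-%08d' % i, payload)
            i += 1
            if i % 100 == 0:
                stats = cluster.get_cluster_stats()
                print('%.1f%% raw used'
                      % (100.0 * stats['kb_used'] / stats['kb']))
    except rados.Error as exc:
        print('cluster stopped accepting writes: %s' % exc)
    finally:
        ioctx.close()
        cluster.shutdown()
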
The report came from a cluster that had 24 OSDs, each 1 TB in size and
with an 87 GB journal on an external partition.
I tried to reproduce this on a much smaller cluster (2 OSDs, each with a
10 GB data partition and a 1 GB journal partition) using
teuthology-openstack. I was able to reliably fill this cluster to 98%
(past the 95% FULL mark) when I constructed the journal partitions so
that they do not end on a 2048-sector boundary.
(When the journal partitions are "evenly sized", i.e. when they do end on
a 2048-sector boundary, the cluster fills up to 95% and usage does not
rise further.)
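
To be concrete about what I mean by "end on a 2048-sector boundary": a
partition ends on such a boundary when its start sector plus its length
in sectors is a multiple of 2048 (i.e. the end is 1 MiB-aligned). Here is
a minimal sketch of that check using the standard sysfs attributes (the
device/partition names passed on the command line are just examples, e.g.
whatever your journal partition happens to be):

    #!/usr/bin/env python
    # Minimal sketch of the alignment check: a partition "ends on a
    # 2048-sector boundary" when start + size (both in 512-byte sectors,
    # as exported by sysfs) is a multiple of 2048.
    # Usage (example device names): check_align.py vdb vdb2
    import sys

    def end_sector(disk, part):
        base = '/sys/block/%s/%s' % (disk, part)
        with open(base + '/start') as f:
            start = int(f.read())
        with open(base + '/size') as f:
            size = int(f.read())
        return start + size

    if __name__ == '__main__':
        disk, part = sys.argv[1], sys.argv[2]
        end = end_sector(disk, part)
        print('%s ends at sector %d (%s2048-aligned)'
              % (part, end, '' if end % 2048 == 0 else 'NOT '))
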
I would be grateful if someone who is much more familiar with the OSD
code than I am (and that is not a difficult criterion to meet!) could
look at the bug report - I have posted logs and a detailed analysis.
Thanks for your time!
--
Nathan Cutler
Software Engineer, Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037