Yes, that's expected behavior. Since the cluster can't move data around on
its own, and lots of things will behave *very badly* if some of their writes
go through but others don't, the cluster goes read-only once any OSD is
full. That's why nearfull is a warning condition; you really want to even
out the balance well before it gets to that point (some commands for
checking and correcting this are sketched at the bottom of this mail).

A cluster at 65% full overall with a single OSD at 95% is definitely *not*
normal, so you seem to be doing something wrong or out of the ordinary. (A
variance of 20% from fullest to emptiest isn't too unusual, but 30% from
fullest to *average* definitely is.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Fri, Jul 18, 2014 at 3:15 PM, James Eckersall
<james.eckersall at gmail.com> wrote:
> Hi,
>
> I have a ceph cluster running on 0.80.1 with 80 OSDs.
>
> I've had a fairly uneven distribution of the data and have been keeping
> it ticking along with "ceph osd reweight XX 0.x" commands on a few OSDs
> while I try to increase the pg count of the pools to hopefully balance
> the data better.
>
> Tonight, one of the OSDs filled up to 95% and so was marked as "full".
>
> This caused the cluster to be flagged as "full", and the server mapping
> the RBDs hit a load average of over 800. That server was rebooted and I
> was unable to map any RBDs.
> I've tweaked the reweight of the "full" OSD down and it is now "near
> full".
> As soon as that OSD changed state to "near full", the cluster changed
> status to HEALTH_WARN and I'm able to map RBDs again.
>
> I was of the opinion that a full OSD would just prevent data from being
> written to that OSD, not cause the near-catastrophic cluster
> unavailability that I've experienced.
>
> The cluster is around 65% full of data, so there is really plenty of
> space across the other OSDs.
>
> Can anyone please clarify whether this behaviour is normal?
>
> Regards
>
> J
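
To check how close each OSD is getting to those thresholds, something like
the following is a reasonable starting point on a Firefly (0.80.x) cluster;
this is just a rough sketch, and the exact output varies a bit by release:

    ceph health detail    # lists which OSDs are currently nearfull or full
    ceph df               # overall and per-pool usage
    ceph pg dump osds     # per-OSD used/available space
                          # (newer releases also have "ceph osd df")

The defaults are 0.85 for nearfull and 0.95 for full (the
mon_osd_nearfull_ratio and mon_osd_full_ratio settings), so the warning is
there to give you a window to react in before writes stop.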
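
As for evening out the balance: rather than hand-tuning "ceph osd reweight"
on individual OSDs you can let the cluster pick the outliers, and increasing
the pg count of the pools (as you're already planning) is the longer-term
fix. Again just a sketch; check the help output on your release first, and
note that the pool name and pg numbers below are only placeholder examples:

    # reduce the reweight of any OSD more than 20% above average utilization
    ceph osd reweight-by-utilization 120

    # split an under-sized pool: raise pg_num first, then pgp_num
    ceph osd pool set <poolname> pg_num 1024
    ceph osd pool set <poolname> pgp_num 1024

If you get completely wedged again, the full threshold can be raised a
little as a short-term escape hatch while data moves off (iirc on 0.80.x
that's "ceph pg set_full_ratio 0.97"), but don't leave it there, since a
disk that actually hits 100% is much harder to recover from.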