Re: ceph OSD with 95% full

Hello,

On Tue, 19 Jul 2016 14:23:32 +0530 M Ranga Swami Reddy wrote:

> >> Using a ceph cluster with 100+ OSDs; the cluster is filled to 60% with data.
> >> One of the OSDs is 95% full.
> >> If an OSD is 95% full, does it impact any storage operation? Does this
> >> impact VMs/instances?
> 
> > Yes, one OSD will impact the whole cluster. It will block write operations to the cluster.
> 
> Thanks for the clarification. Really?? Is this (OSD 95% full) designed to
> block write I/O of the ceph cluster?
>
Really.
To be more precise, any I/O that touches any PG on that OSD will block.
So with a sufficiently large cluster some (few) I/Os may still go
through, as they don't touch that OSD at all.
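
For reference, a quick way to see which OSD is tripping the ratio and how
the rest of the cluster looks (a sketch, assuming a Hammer/Jewel era
cluster; output details vary by release):

  # lists the full / near full OSD(s) by ID
  ceph health detail

  # per-OSD utilization, weight and variance in one table
  ceph osd df

The output of "ceph osd df" is also the obvious thing to feed into the
graphing mentioned below.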

That's why:

1. Ceph has the near-full warning (whose threshold may of course need to be
adjusted to reflect reality, especially with smaller clusters; see the
example settings after point 2). Once you get that warning, you NEED to
take action immediately.

2. You want to graph the space utilization of all your OSDs with something
like graphite. That allows you to spot trends of uneven data distribution
early and thus react to them early.
I re-weight OSDs (CRUSH re-weight, as this is permanent and my clusters
aren't growing frequently) so that they are at least within 10% of each
other.
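
Roughly what that looks like in practice, with a purely illustrative OSD
(osd.42) and weights, so adjust to your hardware and release:

  # permanent: lower the CRUSH weight of the over-full OSD
  ceph osd crush reweight osd.42 1.70

  # temporary alternative (0.0-1.0); beware, this can get reset when
  # the OSD is marked out and back in
  ceph osd reweight 42 0.90

  # the thresholds themselves (defaults 0.85 and 0.95), set in ceph.conf;
  # how to change them on a running cluster depends on your release
  mon osd nearfull ratio = 0.85
  mon osd full ratio = 0.95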

Christian
> Because I have around 251 OSDs, of which just one is 95% full; the
> other 250 OSDs are not even near full...
> 
> Thanks
> Swami
> 
> 
> On Tue, Jul 19, 2016 at 2:17 PM, Henrik Korkuc <lists@xxxxxxxxx> wrote:
> > On 16-07-19 11:44, M Ranga Swami Reddy wrote:
> >>
> >> Hi,
> >> Using a ceph cluster with 100+ OSDs; the cluster is filled to 60% with data.
> >> One of the OSDs is 95% full.
> >> If an OSD is 95% full, does it impact any storage operation? Does this
> >> impact VMs/instances?
> >
> > Yes, one OSD will impact the whole cluster. It will block write operations
> > to the cluster.
> >>
> >> I immediately reduced the weight of the OSD that was 95% full. After the
> >> re-weight, data rebalanced and the OSD came back to a normal state
> >> (i.e. < 80%) within about an hour.
> >>
> >>
> >> Thanks
> >> Swami
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


