Re: slow requests and short OSD failures in small cluster

I wouldn't set the default for osd_heartbeat_grace to 5 minutes, but inject it when you see this happening.  It's good to know what your cluster is up to.  The fact that you aren't seeing the blocked requests any more tells me that this was your issue.  The cluster will go through, split everything, run for a while, and then do it again months from now.
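
For anyone reading this later, a minimal sketch of what "inject it" could look like in practice. It only builds the `ceph tell ... injectargs` command line and prints it; nothing is executed. The helper name `injectargs_command` is hypothetical, 300 seconds is simply the 5 minutes mentioned above, and the 20-second value used for the restore step is an assumption about the shipped default, so check your own configuration before relying on it.

```python
import shlex


def injectargs_command(option, value, target="osd.*"):
    """Build the argv for a runtime (non-persistent) config change on OSDs.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return ["ceph", "tell", target, "injectargs", f"--{option} {value}"]


# 5 minutes, as discussed in this thread.
GRACE_DURING_SPLIT = 300
# Assumed shipped default for osd_heartbeat_grace; verify on your cluster.
ASSUMED_DEFAULT_GRACE = 20

raise_cmd = injectargs_command("osd_heartbeat_grace", GRACE_DURING_SPLIT)
restore_cmd = injectargs_command("osd_heartbeat_grace", ASSUMED_DEFAULT_GRACE)

# Print shell-quoted commands so they can be reviewed before being run by hand.
print(shlex.join(raise_cmd))
print(shlex.join(restore_cmd))
```

Injected values are not written to ceph.conf, which matches the advice above: raise the grace period while splitting is in progress and put it back afterwards.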

On Thu, Apr 13, 2017 at 4:43 AM Jogi Hofmüller <jogi@xxxxxx> wrote:
Dear David,

On Wednesday, 12.04.2017, at 13:46 +0000, David Turner wrote:
> I can almost guarantee what you're seeing is PG subfolder splitting. 

Every day there's something new to learn about ceph ;)

> When the subfolders in a PG get X number of objects, it splits into
> 16 subfolders.  Every cluster I manage has blocked requests and OSDs
> that get marked down while this is happening.  To stop the OSDs
> getting marked down, I increase the osd_heartbeat_grace until the
> OSDs no longer mark themselves down during this process.
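
A rough sketch of where the "X" in the quoted text comes from on FileStore OSDs, as far as I understand the documentation: a subfolder splits once it holds more than filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects. The function name below is hypothetical, and the default values of 2 and 10 are assumptions about the releases current at the time of this thread; substitute whatever your cluster actually uses.

```python
def filestore_split_threshold(split_multiple=2, merge_threshold=10):
    """Objects a PG subfolder can hold before it splits into 16 subfolders.

    The defaults are assumed values, not read from any cluster.
    """
    return split_multiple * abs(merge_threshold) * 16


# With the assumed defaults: 2 * 10 * 16
print(filestore_split_threshold())

# A larger split multiple postpones splitting at the cost of bigger directories.
print(filestore_split_threshold(split_multiple=8))
```

Because every PG on a pool fills at roughly the same rate, many subfolders tend to cross this threshold at about the same time, which fits the burst of blocked requests described above.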

Thanks for the hint. I adjusted the values accordingly and will monitor
our cluster. This morning there were no troubles at all btw. Still
wondering what caused yesterday's mayhem ...

Regards,
--
J.Hofmüller

           Nisiti
           - Abie Nathan, 1927-2008

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com