I think the first symptoms of our problems occurred when we posted this issue:
Regards
--
Jarek
--
Jarosław Owsiewski
2016-07-14 15:43 GMT+02:00 Jaroslaw Owsiewski <jaroslaw.owsiewski@xxxxxxxxxxxxxxxx>:
2016-07-14 15:26 GMT+02:00 Luis Periquito <periquito@xxxxxxxxx>:
> Hi Jaroslaw,
>
> several things spring to mind. I'm assuming the cluster is
> healthy (other than the slow requests), right?

Yes.

> From the (little) information you sent it seems the pools are
> replicated with size 3, is that correct?

True.
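(For reference, the replication settings can be confirmed directly from the CLI; these are standard ceph commands, and .rgw.buckets is just our data pool used as the example:

    ceph osd pool get .rgw.buckets size      # replica count, reports "size: 3" here
    ceph osd pool get .rgw.buckets min_size  # minimum replicas required to keep serving I/O

)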
> Are there any long-running delete processes? They usually have a
> negative impact on performance, especially as they don't really show up
> in the IOPS statistics.

During normal throughput we have a small amount of deletes.
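(Since the data lives in .rgw.buckets, I assume most deletes would come through RGW and end up in its garbage-collection queue; a minimal way to check for a backlog, assuming radosgw-admin is available on a node with a client keyring:

    radosgw-admin gc list --include-all | head -n 50   # objects still queued for background deletion
    radosgw-admin gc process                           # optionally force a GC pass; this itself adds load

)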
> I've also seen something like this happen when there's a slow disk/OSD. You
> can try to check with "ceph osd perf" and look for higher numbers.
> Usually restarting that OSD brings the cluster back to life, if that's
> the issue.

I will check this.

> If nothing shows, try a "ceph tell osd.* version"; if there's a
> misbehaving OSD they usually don't respond to the command (slow or
> even timing out).
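(Roughly what I plan to run for both checks; these are standard ceph CLI commands, the latency column names vary a bit between releases, and "NN" below is a placeholder OSD id:

    ceph osd perf              # per-OSD commit/apply latency, look for clear outliers
    ceph tell osd.* version    # a hung or misbehaving OSD tends to stall or time out here
    # if osd.NN stands out, restart it on its host; depending on the init system something like:
    #   systemctl restart ceph-osd@NN

)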
> Also, you don't say how many scrub/deep-scrub processes are
> running. If not properly handled they are also a performance killer.

Scrub/deep-scrub processes are disabled.
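(Scrubbing is typically disabled cluster-wide via the noscrub/nodeep-scrub flags; a quick way to verify or set them with the standard ceph CLI:

    ceph osd dump | grep flags   # should list noscrub,nodeep-scrub when scrubbing is off
    ceph osd set noscrub         # disable regular scrubs
    ceph osd set nodeep-scrub    # disable deep scrubs
    # re-enable later with "ceph osd unset noscrub" / "ceph osd unset nodeep-scrub"

)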
> Last, but by far not least, have you ever thought of creating an SSD
> pool (even small) and moving all pools except .rgw.buckets there? The other
> ones are small enough, but would enjoy having their own "reserved" OSDs...
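(If we go down that road, the rough shape would be a dedicated CRUSH rule for the SSD OSDs and repointing the small pools at it. A minimal sketch, assuming an SSD-only CRUSH root named "ssd" already exists in the map; the rule name and pool are illustrative, and moving a pool to a new rule triggers a rebalance of its data:

    ceph osd crush rule create-simple ssd-rule ssd host         # place replicas on hosts under the "ssd" root
    ceph osd crush rule dump ssd-rule                           # note the ruleset id it was assigned
    ceph osd pool set .rgw.buckets.index crush_ruleset <id>     # pre-Luminous option name; newer releases use crush_rule

)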
This is one idea we had some time ago; we will try that.

One important thing:

sysop@s41617:~/bin$ ceph osd pool get .rgw.buckets pg_num
pg_num: 4470
sysop@s41617:~/bin$ ceph osd pool get .rgw.buckets.index pg_num
pg_num: 2048

Could this be the main problem?

Regards
--
Jarek