Another strange thing I'm seeing is that two of the nodes in the cluster have some OSDs with almost no activity. If I watch top long enough I'll eventually see CPU utilization on these OSDs, but for the most part they sit at 0% CPU utilization. I'm not sure if this is expected behavior or not. I have another cluster running the same version of Ceph that shows the same symptom, but the OSDs in our Jewel cluster always show activity.
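One way to tell a truly idle OSD from a merely bursty one is to sample per-process CPU over a window instead of eyeballing top. A rough sketch using pidstat (from the sysstat package; the one-minute window is arbitrary):

```shell
# Sample CPU for every ceph-osd process once a second for 60 seconds,
# then keep only pidstat's trailing "Average:" lines so a genuinely
# idle OSD (avg ~0%) stands out from one that is just bursty.
pidstat -C ceph-osd 1 60 | awk '/^Average:/ && /ceph-osd/ { print }'
```

If the averages really are near zero, the next question is whether those OSDs are receiving any ops at all, which points back at the CRUSH/PG distribution rather than the daemons themselves.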
On Mon, Dec 18, 2017 at 11:51 AM, John Petrini <jpetrini@xxxxxxxxxxxx> wrote:
Hi David,

Thanks for the info. The controller in the server (a PERC H730) was just replaced and the battery is at full health. Prior to replacing the controller I was seeing very high iowait when running iostat, but I no longer see that behavior - just apply latency when running ceph osd perf. Since there's no iowait, it makes me believe the latency is not being introduced by the hardware, though I'm not ruling that out completely. I'd like to know what I can do to get a better understanding of what the OSD processes are so busy doing, because they are working much harder on this server than on the others.
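A quick way to narrow down which OSDs on that host are contributing the latency. This is a sketch: the 50 ms threshold is arbitrary, and it assumes the Jewel-era three-column `ceph osd perf` layout (osd, fs_commit_latency(ms), fs_apply_latency(ms)):

```shell
# Flag any OSD whose apply latency exceeds 50 ms in `ceph osd perf`.
# NR > 1 skips the header row; $3 is assumed to be fs_apply_latency(ms).
ceph osd perf | awk 'NR > 1 && $3 > 50 { print "osd." $1, "apply:", $3 " ms" }'
```

From there, `ceph daemon osd.N perf dump` run on the busy host exposes much finer-grained counters (journal, filestore, op queue) for the suspect OSDs, which can help show where the time is actually going.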
On Thu, Dec 14, 2017 at 11:33 AM, David Turner <drakonstein@xxxxxxxxx> wrote:

We show high disk latencies on a node when the controller's cache battery dies. This is assuming that you're using a controller with cache enabled for your disks. In any case, I would look at the hardware on the server.

On Thu, Dec 14, 2017 at 10:15 AM John Petrini <jpetrini@xxxxxxxxxxxx> wrote:

Anyone have any ideas on this?
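Even with a replaced controller, it can be worth confirming the cache battery/CacheVault state directly rather than trusting the health LED. A sketch for a PERC H730 - the perccli binary, the /c0 controller index, and the fields grepped for are all assumptions; adjust for your tooling:

```shell
# Query the battery backup unit on controller 0 and pull out the
# state-related lines; perccli mirrors storcli syntax on Dell PERCs.
perccli /c0/bbu show all | grep -Ei 'state|voltage|temperature'
```

If the BBU is degraded the controller typically drops from write-back to write-through caching, which shows up exactly as elevated apply latency on otherwise healthy disks.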
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com