Hi,

I have a cluster with 16 OSDs spread over 4 physical machines, each machine running 4 OSD processes. One of these OSDs is periodically using 100% of the CPU. If you aggregate the total CPU time of the processes over a long period, it clearly uses roughly 6x more CPU than any other OSD. The numbers for the other 15 OSDs (both on the same machine and on the other machines) are quite consistent with one another.

The PG distribution isn't ideal (some OSDs have more PGs than others), but it isn't bad either; there is no OSD with twice as many PGs as another, for example. I also ran a full SMART self-test on all the drives hosting OSD data, but that didn't uncover anything. The logs (at the default logging level) don't show anything abnormal either.

The problem also seems to have been exacerbated by my recent update from Dumpling to Emperor this weekend. For reference, here are the CPU usage graphs for the 16 OSDs over the last 6 months: http://i.imgur.com/cno73Ea.png

The red line is osd.14, the problematic one. As you can see, it recently "flared up" a lot, but even before the update it was much higher than the others and rising, which is a troubling trend.

Any idea what this could be? How can I isolate it and solve it?

Cheers,
Sylvain
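
P.S. In case anyone wants to reproduce the per-OSD PG count comparison mentioned above, here is a rough sketch of one way to tally PGs per OSD from "ceph pg dump --format json". The JSON key names used below ("pg_stats", "pg_map", "acting") are assumptions and may differ between Ceph releases, so adjust the lookups to match your output.

#!/usr/bin/env python
# Rough sketch: tally how many PGs each OSD serves by parsing the
# output of "ceph pg dump --format json". The key names ("pg_stats",
# "pg_map", "acting") are assumptions and may vary between Ceph
# releases, so adjust them if your JSON layout is different.
import json
import subprocess
from collections import Counter

raw = subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])
data = json.loads(raw)

# Some releases put the stats at the top level, others nest them
# under "pg_map"; try both.
pg_stats = data.get("pg_stats") or data.get("pg_map", {}).get("pg_stats", [])

counts = Counter()
for pg in pg_stats:
    for osd in pg.get("acting", []):
        counts[osd] += 1

for osd in sorted(counts):
    print("osd.%s: %s PGs" % (osd, counts[osd]))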