Hi,
On our three-node, 24-OSD Ceph 10.2.10 cluster, we have started seeing
slow requests on a specific OSD during the two-hour nightly xfs_fsr
run from 05:00 to 07:00. This started after we applied the Meltdown patches.
The OSD in question, osd.10, also has the highest space utilization of
all OSDs cluster-wide, at 45%, while the others are mostly around 40%.
All OSDs are the same 4 TB spinning disks with journals on SSD, all with weight 1.
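(In case it matters for the numbers above: I believe "ceph osd df" is the
easiest way to compare per-OSD utilization and weights, i.e.

  ceph osd df

should list SIZE, %USE, WEIGHT and REWEIGHT for each OSD.)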
SMART info for osd.10 shows nothing interesting, I think:
Current Drive Temperature: 27 C
Drive Trip Temperature: 60 C
Manufactured in week 04 of year 2016
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 53
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 697
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 1933129649
Blocks received from initiator = 869206640
Blocks read from cache and sent to initiator = 2149311508
Number of read and write commands whose size <= segment size = 676356809
Number of read and write commands whose size > segment size = 12734900
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 13625.88
number of minutes until next internal SMART test = 8
Now my question:
Could it be that osd.10 just happens to hold some data chunks that the
VMs access heavily around that time, and that the added load of the
xfs_fsr run is simply too much for it to handle?
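(What I thought I could check during the next fsr window, assuming I have
the commands right: "ceph osd perf" for per-OSD commit/apply latency, and
on the OSD's host

  ceph daemon osd.10 dump_historic_ops

to see which requests are actually slow on osd.10. Corrections welcome if
that is not the right admin socket command on jewel.)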
If it is indeed just load, how about reweighting osd.10 to "0", waiting
until all data has moved off it, and then setting it back to "1"? Would
this result in *exactly* the same situation as before, or would it at
least cause the data to be spread more evenly across the other OSDs?
(The idea being that a better spread of data across the OSDs also brings
a better distribution of load between them.)
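Roughly what I had in mind, assuming the plain reweight (and not
"ceph osd crush reweight") is the right knob for this:

  ceph osd reweight 10 0    # drain osd.10
  # ... wait until all PGs are active+clean again ...
  ceph osd reweight 10 1    # let data flow back

As far as I understand, "ceph osd reweight" only sets the temporary
override weight (0..1) and leaves the CRUSH weight alone.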
Or other ideas to check out?
MJ