Hi,
On our three-node, 24-OSD Ceph 10.2.10 cluster, we have started seeing
slow requests on a specific OSD during the two-hour nightly xfs_fsr
run from 05:00 to 07:00. This started after we applied the Meltdown patches.
The OSD in question, osd.10, also has the highest space utilization of
all OSDs cluster-wide, at 45%, while the others are mostly around 40%.
All OSDs are the same 4 TB spinning disks with journals on SSD, all with weight 1.
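(In case it matters for the numbers above: I believe "ceph osd df" is the
easiest way to compare per-OSD utilization and weights, i.e.

  ceph osd df

should list SIZE, %USE, WEIGHT and REWEIGHT for each OSD.)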
SMART info for osd.10 shows nothing interesting, I think:
Current Drive Temperature: 27 C
Drive Trip Temperature: 60 C
Manufactured in week 04 of year 2016
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 53
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 697
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 1933129649
Blocks received from initiator = 869206640
Blocks read from cache and sent to initiator = 2149311508
Number of read and write commands whose size <= segment size = 676356809
Number of read and write commands whose size > segment size = 12734900
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 13625.88
number of minutes until next internal SMART test = 8
Now my question:
Could it be that osd.10 just happens to hold some data chunks that the
VMs access heavily around that time, and that the added load of the
xfs_fsr run is simply too much for it to handle?
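(What I thought I could check during the next fsr window, assuming I have
the commands right: "ceph osd perf" for per-OSD commit/apply latency, and
on the OSD's host

  ceph daemon osd.10 dump_historic_ops

to see which requests are actually slow on osd.10. Corrections welcome if
that is not the right admin socket command on jewel.)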
If it is indeed just load, how about reweighting osd.10 to "0", waiting
until all data has moved off it, and then setting it back to "1"? Would
this result in *exactly* the same situation as before, or would it at
least cause the data to be spread more evenly across the other OSDs?
(The idea being that a better spread of data across the OSDs also brings
a better distribution of load between them.)
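Roughly what I had in mind, assuming the plain reweight (and not
"ceph osd crush reweight") is the right knob for this:

  ceph osd reweight 10 0    # drain osd.10
  # ... wait until all PGs are active+clean again ...
  ceph osd reweight 10 1    # let data flow back

As far as I understand, "ceph osd reweight" only sets the temporary
override weight (0..1) and leaves the CRUSH weight alone.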
Or other ideas to check out?
MJ