I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over the mailing list history it seems to have been brought up very infrequently, and never as a suggestion for regular maintenance. Perhaps it's not needed.
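If nothing else, it may be worth checking how fragmented that filesystem actually is before letting xfs_fsr loose on it every night. A minimal sketch, assuming the OSD's data partition is /dev/sdX1 and it is mounted at /var/lib/ceph/osd/ceph-10 (both placeholders):

  # report overall file fragmentation on the OSD's XFS (read-only check)
  xfs_db -r -c frag /dev/sdX1

  # or run a verbose, time-limited (600 s) xfs_fsr pass against just that mountpoint
  xfs_fsr -v -t 600 /var/lib/ceph/osd/ceph-10

If the fragmentation factor is already low, the nightly run is probably not buying you anything.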
One thing to consider trying, to rule out something funky with the XFS filesystem on that particular OSD/drive, would be to remove the OSD entirely from the cluster, reformat the disk, and then rebuild the OSD with a brand new XFS filesystem.
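Roughly, and only as a sketch (the OSD id 10, the host name, and the device paths below are placeholders, and the exact redeploy step depends on your tooling):

  # let the cluster drain the OSD first
  ceph osd out 10
  # once recovery has finished, stop the daemon and remove the OSD
  systemctl stop ceph-osd@10
  ceph osd crush remove osd.10
  ceph auth del osd.10
  ceph osd rm 10
  # then recreate it with a fresh XFS, e.g. with ceph-deploy on jewel:
  ceph-deploy osd prepare <host>:/dev/sdX:/dev/ssd-journal

That way you also get a clean baseline to compare the xfs_fsr behaviour against.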
On Mon, Jan 15, 2018 at 7:36 AM, lists <lists@xxxxxxxxxxxxx> wrote:
Hi,
On our three-node, 24-OSD Ceph 10.2.10 cluster, we have started seeing slow requests on a specific OSD during the two-hour nightly xfs_fsr run from 05:00 to 07:00. This started after we applied the Meltdown patches.
The specific osd.10 also has the highest space utilization of all OSDs cluster-wide, at 45%, while the others are mostly around 40%. All OSDs are the same 4TB platters with journals on SSD, all with weight 1.
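(For reference, we pulled those utilization and weight figures from something like:

  # per-OSD size, raw use %, (re)weight and PG count
  ceph osd df tree
)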
SMART info for osd.10 shows nothing interesting, I think:
Current Drive Temperature: 27 C
Drive Trip Temperature: 60 C
Manufactured in week 04 of year 2016
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 53
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 697
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 1933129649
Blocks received from initiator = 869206640
Blocks read from cache and sent to initiator = 2149311508
Number of read and write commands whose size <= segment size = 676356809
Number of read and write commands whose size > segment size = 12734900
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 13625.88
number of minutes until next internal SMART test = 8
Now my question:
Could it be that osd.10 just happens to contain some data chunks that are heavily needed by the VMs around that time, and that the added load of an xfs_fsr is simply too much for it to handle?
In that case, how about reweighting osd.10 to "0", waiting until all data has moved off it, and then setting it back to "1"? Would this result in *exactly* the same situation as before, or would it at least cause the data to spread out better across the other OSDs?
(with the idea that a better data spread across the OSDs also brings a better distribution of load between them)
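Concretely, I was thinking of something along these lines (assuming the override reweight, not the CRUSH weight, is the right knob here):

  # push the data off osd.10
  ceph osd reweight 10 0
  # watch recovery until osd.10 is empty
  ceph -w
  # then restore the original weight
  ceph osd reweight 10 1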
Or other ideas to check out?
MJ
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Respectfully,
Wes Dillingham
Research Computing | Senior CyberInfrastructure Storage Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 204