I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over the mailing list history it seems to have been brought up very infrequently, and never as a suggestion for regular maintenance. Perhaps it's not needed.
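If nothing else, it may be worth checking how fragmented that filesystem actually is before letting xfs_fsr loose on it every night. A minimal sketch, assuming the OSD's data partition is /dev/sdX1 and it is mounted at /var/lib/ceph/osd/ceph-10 (both placeholders):

  # report overall file fragmentation on the OSD's XFS (read-only check)
  xfs_db -r -c frag /dev/sdX1

  # or run a verbose, time-limited (600 s) xfs_fsr pass against just that mountpoint
  xfs_fsr -v -t 600 /var/lib/ceph/osd/ceph-10

If the fragmentation factor is already low, the nightly run is probably not buying you anything.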
One thing to consider trying, to rule out something funky with the XFS filesystem on that particular OSD/drive, would be to remove the OSD entirely from the cluster, reformat the disk, and then rebuild the OSD with a brand new XFS filesystem.
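Roughly, and only as a sketch (the OSD id 10, the host name, and the device paths below are placeholders, and the exact redeploy step depends on your tooling):

  # let the cluster drain the OSD first
  ceph osd out 10
  # once recovery has finished, stop the daemon and remove the OSD
  systemctl stop ceph-osd@10
  ceph osd crush remove osd.10
  ceph auth del osd.10
  ceph osd rm 10
  # then recreate it with a fresh XFS, e.g. with ceph-deploy on jewel:
  ceph-deploy osd prepare <host>:/dev/sdX:/dev/ssd-journal

That way you also get a clean baseline to compare the xfs_fsr behaviour against.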
On Mon, Jan 15, 2018 at 7:36 AM, lists <lists@xxxxxxxxxxxxx> wrote:
Hi,
On our three-node, 24-OSD Ceph 10.2.10 cluster, we have started seeing slow requests on a specific OSD during the two-hour nightly xfs_fsr run from 05:00 to 07:00. This started after we applied the Meltdown patches.
The specific osd.10 also has the highest space utilization of all OSDs cluster-wide, at 45%, while the others are mostly around 40%. All OSDs are the same 4TB platters with journals on SSD, all with weight 1.
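(For reference, we pulled those utilization and weight figures from something like:

  # per-OSD size, raw use %, (re)weight and PG count
  ceph osd df tree
)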
SMART info for osd.10 shows nothing interesting, I think:
Current Drive Temperature: 27 C
Drive Trip Temperature: 60 C
Manufactured in week 04 of year 2016
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 53
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 697
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 1933129649
Blocks received from initiator = 869206640
Blocks read from cache and sent to initiator = 2149311508
Number of read and write commands whose size <= segment size = 676356809
Number of read and write commands whose size > segment size = 12734900
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 13625.88
number of minutes until next internal SMART test = 8
Now my question:
Could it be that osd.10 just happens to contain some data chunks that are heavily needed by the VMs around that time, and that the added load of an xfs_fsr is simply too much for it to handle?
In that case, how about reweighting osd.10 to "0", waiting until all data has moved off it, and then setting it back to "1"? Would this result in *exactly* the same situation as before, or would it at least cause the data to spread out better across the other OSDs?
(with the idea that a better data spread across the OSDs also brings a better distribution of load between them)
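Concretely, I was thinking of something along these lines (assuming the override reweight, not the CRUSH weight, is the right knob here):

  # push the data off osd.10
  ceph osd reweight 10 0
  # watch recovery until osd.10 is empty
  ceph -w
  # then restore the original weight
  ceph osd reweight 10 1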
Or other ideas to check out?
MJ
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Respectfully,
Wes Dillingham
Research Computing | Senior CyberInfrastructure Storage Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 204