OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"


Hi All,

 

I’m trying to troubleshoot a strange issue with my Ceph cluster.

 

We’re running Ceph version 0.72.2.

All nodes are Dell R515s, each with a 6-core AMD CPU, 32 GB RAM, 12 x 3 TB nearline SAS drives and 2 x 100 GB Intel DC S3700 SSDs for journals.

All pools have a replica count of 2 or higher, e.g. the metadata pool has a replica count of 3.

 

I have 55 OSDs in the cluster across 5 nodes. When I restart the OSDs on a single node (any node), that node’s load average shoots up to 230+ and the whole cluster starts blocking IO requests until things settle down, after which it’s fine again.
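For scale, here is a quick back-of-the-envelope check of the numbers above. It is only a sketch that restates the post’s figures and assumes the 55 OSDs are spread evenly across the 5 nodes (11 per node); it does not explain the load spike by itself.

```python
# Figures quoted in the post; an even spread of OSDs across nodes is an assumption.
TOTAL_OSDS = 55
NODES = 5
CORES_PER_NODE = 6        # "6C AMD CPU"
PEAK_LOAD_AVERAGE = 230   # "230+" after restarting one node's OSDs

osds_restarted = TOTAL_OSDS // NODES                # OSDs bounced together on one node
share_of_cluster = osds_restarted / TOTAL_OSDS      # fraction of all OSDs restarting at once
load_per_core = PEAK_LOAD_AVERAGE / CORES_PER_NODE  # load-average units per CPU core

print(f"{osds_restarted} OSDs restarted ({share_of_cluster:.0%} of the cluster)")
print(f"load per core: {load_per_core:.1f}")
```

Under those assumptions, one node restart takes down about a fifth of all OSDs at once, and the peak load works out to roughly 38 per core.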

 

Any ideas why the load average climbs so high and IO starts getting blocked?

 

 

<snips from my ceph.conf>

[osd]
        osd data = "">
        osd journal size = 15000
        osd mkfs type = xfs
        osd mkfs options xfs = "-i size=2048 -f"
        osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,barrier=0,inode64,logbufs=8,logbsize=256k"
        osd max backfills = 5
        osd recovery max active = 3

[osd.0]
        host = pbnerbd01
        public addr = 10.100.96.10
        cluster addr = 10.100.128.10
        osd journal = /dev/disk/by-id/scsi-36b8ca3a0eaa2660019deaf8d3a40bec4-part1
        devs = /dev/sda4

</end>
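To put the posted settings next to the hardware, the rough sketch below works out two ceilings. Both rest on assumptions: that each journal partition was created at the `osd journal size` value of 15000 MB, that the 11 journals per node are split as evenly as possible over the 2 SSDs, and that 1 GB = 1000 MB. `osd max backfills` and `osd recovery max active` are per-OSD limits, so multiplying them by an OSD count gives an upper bound on concurrent slots, not a prediction of actual activity. The helper names are hypothetical.

```python
import math

# Values from the posted ceph.conf [osd] section.
JOURNAL_SIZE_MB = 15000        # osd journal size (MB)
OSD_MAX_BACKFILLS = 5          # osd max backfills, per OSD
OSD_RECOVERY_MAX_ACTIVE = 3    # osd recovery max active, per OSD

# Hardware from the post; even OSD/journal placement and 1 GB = 1000 MB are assumptions.
TOTAL_OSDS = 55
OSDS_PER_NODE = TOTAL_OSDS // 5   # 11
SSDS_PER_NODE = 2
SSD_CAPACITY_MB = 100 * 1000      # 100 GB Intel DC S3700

def journals_on_busiest_ssd(osds: int, ssds: int) -> int:
    """Hypothetical helper: journals on the fullest SSD if spread as evenly as possible."""
    return math.ceil(osds / ssds)

def slot_ceiling(osd_count: int, per_osd_limit: int) -> int:
    """Hypothetical helper: upper bound if every OSD reaches its per-OSD limit."""
    return osd_count * per_osd_limit

journals = journals_on_busiest_ssd(OSDS_PER_NODE, SSDS_PER_NODE)
journal_mb = journals * JOURNAL_SIZE_MB
print(f"busiest SSD: {journals} journals, {journal_mb} MB "
      f"({journal_mb / SSD_CAPACITY_MB:.0%} of capacity)")

for scope, osds in (("one node", OSDS_PER_NODE), ("whole cluster", TOTAL_OSDS)):
    print(f"{scope}: <= {slot_ceiling(osds, OSD_RECOVERY_MAX_ACTIVE)} recovery slots, "
          f"<= {slot_ceiling(osds, OSD_MAX_BACKFILLS)} backfill slots")
```

With these assumptions, one SSD would carry 6 journals (about 90% of its capacity), and the per-OSD throttles would still allow up to 33 recovery slots on the restarted node alone.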

 

Thanks,

Quenten

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
