Blocked requests problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I have a Ceph Cluster with specifications below:
3 x Monitor node
6 x Storage Node (6 disk per Storage Node, 6TB SATA Disks, all disks have SSD journals)
Distributed public and private networks. All NICs are 10Gbit/s
osd pool default size = 3
osd pool default min size = 2

Ceph version is Jewel 10.2.6.

My cluster is active and a lot of virtual machines running on it (Linux and Windows VM's, database clusters, web servers etc).

During normal use, cluster slowly went into a state of blocked requests. Blocked requests periodically incrementing. All OSD's seems healthy. Benchmark, iowait, network tests, all of them succeed.

Yerterday, 08:00:
$ ceph health detail
HEALTH_WARN 3 requests are blocked > 32 sec; 3 osds have slow requests
1 ops are blocked > 134218 sec on osd.31
1 ops are blocked > 134218 sec on osd.3
1 ops are blocked > 8388.61 sec on osd.29
3 osds have slow requests

Todat, 16:05:
$ ceph health detail
HEALTH_WARN 32 requests are blocked > 32 sec; 3 osds have slow requests
1 ops are blocked > 134218 sec on osd.31
1 ops are blocked > 134218 sec on osd.3
16 ops are blocked > 134218 sec on osd.29
11 ops are blocked > 67108.9 sec on osd.29
2 ops are blocked > 16777.2 sec on osd.29
1 ops are blocked > 8388.61 sec on osd.29
3 osds have slow requests

$ ceph pg dump | grep scrub
dumped all in format plain
pg_stat	objects	mip	degr	misp	unf	bytes	log	disklog	state	state_stamp	v	reported	up	up_primary	acting	acting_primary	last_scrub	scrub_stamp	last_deep_scrub	deep_scrub_stamp
20.1e	25183	0	0	0	0	98332537930	3066	3066	active+clean+scrubbing	2017-08-21 04:55:13.354379	6930'23908781	6930:20905696	[29,31,3]	29	[29,31,3]	29	6712'22950171	2017-08-20 04:46:59.208792	6712'22950171	2017-08-20 04:46:59.208792

Active scrub does not finish (about 24 hours). I did not restart any OSD meanwhile.
I'm thinking set noscrub, noscrub-deep, norebalance, nobackfill, and norecover flags and restart 3,29,31th OSDs. Is this solve my problem? Or anyone has suggestion about this problem?

Thanks,
Ramazan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux