librbd hangs during large backfill


 



Starting on Friday, as part of adding a new pod of 12 servers, we initiated a reweight on roughly 384 drives, from 0.1 to 0.25. Something about the resulting large backfill is causing librbd to hang, requiring server restarts. The volumes show buffer I/O errors when this happens. We are currently using hybrid OSDs with both SSDs and traditional spinning disks. The current status of the cluster is:
ceph --version
ceph version 14.2.22 
Cluster Kernel 5.4.49-200
{
	"mon": {
    	"ceph version 14.2.22 nautilus (stable)": 3
	},
	"mgr": {
    	"ceph version 14.2.22 nautilus (stable)": 3
	},
	"osd": {
    	"ceph version 14.2.21 nautilus (stable)": 368,
    	"ceph version 14.2.22 (stable)": 2055
	},
	"mds": {},
	"rgw": {
    	"ceph version 14.2.22 (stable)": 7
	},
	"overall": {
    	"ceph version 14.2.21 (stable)": 368,
    	"ceph version 14.2.22 (stable)": 2068
	}
}
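
To make the OSD version split easier to see, here is a minimal sketch that tallies one daemon type from `ceph versions` JSON. The helper name `summarize_daemon_versions` is hypothetical, and the embedded JSON is just the "osd" section copied from the output above; on a live cluster the same JSON would presumably come from the command itself.

```python
import json

# "osd" section copied from the `ceph versions` output above.
versions_json = """
{
    "osd": {
        "ceph version 14.2.21 nautilus (stable)": 368,
        "ceph version 14.2.22 (stable)": 2055
    }
}
"""


def summarize_daemon_versions(raw, daemon):
    """Return (total daemons, {release: percent share}) for one daemon type.

    Hypothetical helper for illustration; not part of any Ceph tooling.
    """
    counts = json.loads(raw)[daemon]
    total = sum(counts.values())
    shares = {}
    for label, count in counts.items():
        # Labels look like "ceph version 14.2.21 nautilus (stable)";
        # the third whitespace-separated token is the release number.
        release = label.split()[2]
        shares[release] = round(100.0 * count / total, 1)
    return total, shares


total, shares = summarize_daemon_versions(versions_json, "osd")
print(f"{total} OSDs total")
for release, percent in sorted(shares.items()):
    print(f"  {release}: {percent}%")
```

This only restates the numbers already shown: the OSDs are split across 14.2.21 and 14.2.22, while the mons and mgrs are all on 14.2.22.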

HEALTH_WARN, noscrub,nodeep-scrub flag(s) set. 
pgs: 6815703/11016906121 objects degraded (0.062%)
     2814059622/11016906121 objects misplaced (25.543%)

The client servers are on kernel 3.10.0-1062.1.2.el7.x86_6

We have found a couple of issues that look relevant: 
https://tracker.ceph.com/issues/19385 
https://tracker.ceph.com/issues/18807 
Has anyone experienced anything like this before? Does anyone have any recommendations as to settings that can help alleviate this while the backfill completes? 
An example of the buffer I/O errors:

Jul 17 06:36:08 host8098 kernel: buffer_io_error: 22 callbacks suppressed
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 3, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-5, logical block 511984, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical block 3487657728, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical block 3487657729, async page read
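
To see which device-mapper devices are affected on a client, a rough sketch like the following can count the errors per device. The regular expression is an assumption based only on the line format in the excerpt above, the function name `count_errors_by_device` is hypothetical, and the sample is three lines copied from that excerpt rather than a full log.

```python
import re
from collections import Counter

# Pattern inferred from the kernel messages quoted above (an assumption,
# not an official format specification).
BUFFER_IO_ERROR = re.compile(
    r"Buffer I/O error on dev (\S+), logical block (\d+)"
)


def count_errors_by_device(lines):
    """Count buffer I/O error lines per device name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = BUFFER_IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the excerpt above; a real run would read the
# client's kernel log instead.
sample = [
    "Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read",
    "Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 3, async page read",
    "Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-5, logical block 511984, async page read",
]

errors = count_errors_by_device(sample)
for device, count in errors.most_common():
    print(f"{device}: {count} error(s)")
```

Lines such as the `buffer_io_error: 22 callbacks suppressed` message do not match the pattern, so the counts are a lower bound on the number of failed reads.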


