Starting on Friday, as part of adding a new pod of 12 servers, we initiated a reweight on roughly 384 drives, from 0.1 to 0.25. Something about the resulting large backfill is causing librbd to hang, requiring server restarts. The volumes show buffer I/O errors when this happens.

We are currently using hybrid OSDs with both SSDs and traditional spinning disks. The current status of the cluster is:

ceph --version
ceph version 14.2.22

Cluster kernel: 5.4.49-200

{
    "mon": {
        "ceph version 14.2.22 nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.22 nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.21 nautilus (stable)": 368,
        "ceph version 14.2.22 (stable)": 2055
    },
    "mds": {},
    "rgw": {
        "ceph version 14.2.22 (stable)": 7
    },
    "overall": {
        "ceph version 14.2.21 (stable)": 368,
        "ceph version 14.2.22 (stable)": 2068
    }
}

HEALTH_WARN, noscrub,nodeep-scrub flag(s) set.
pgs: 6815703/11016906121 objects degraded (0.062%)
     2814059622/11016906121 objects misplaced (25.543%)

The client servers run kernel 3.10.0-1062.1.2.el7.x86_6.

We have found a couple of issues that look relevant:
https://tracker.ceph.com/issues/19385
https://tracker.ceph.com/issues/18807

Has anyone experienced anything like this before? Does anyone have recommendations for settings that could help alleviate this while the backfill completes?
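As a quick sanity check on the numbers above, here is a small Python sketch (not part of the original output) that tallies the pasted version JSON by release and re-derives the degraded/misplaced percentages from the raw object counts. The JSON is copied verbatim, including the version strings that appear to have had their build hash or codename trimmed. The helper names (release_of, by_release, tally, pct) are hypothetical, and the regex assumes every version string starts with "ceph version x.y.z".

```python
import json
import re
from collections import Counter

# Version output as pasted above. Strings are kept verbatim (some appear to
# have had their build hash and/or codename trimmed).
CEPH_VERSIONS = json.loads("""
{
  "mon": {"ceph version 14.2.22 nautilus (stable)": 3},
  "mgr": {"ceph version 14.2.22 nautilus (stable)": 3},
  "osd": {"ceph version 14.2.21 nautilus (stable)": 368,
          "ceph version 14.2.22 (stable)": 2055},
  "mds": {},
  "rgw": {"ceph version 14.2.22 (stable)": 7},
  "overall": {"ceph version 14.2.21 (stable)": 368,
              "ceph version 14.2.22 (stable)": 2068}
}
""")

# Object counts from the "pgs:" lines of the health output.
TOTAL_OBJECTS = 11016906121
DEGRADED = 6815703
MISPLACED = 2814059622


def release_of(version_string):
    """Extract the x.y.z release from a 'ceph version ...' string (assumed format)."""
    match = re.search(r"ceph version (\d+\.\d+\.\d+)", version_string)
    return match.group(1) if match else "unknown"


def by_release(counts):
    """Collapse {version string: daemon count} into {release: daemon count}."""
    result = Counter()
    for version_string, n in counts.items():
        result[release_of(version_string)] += n
    return result


def tally(versions):
    """Per-daemon-type release counts plus a grand total, skipping 'overall'."""
    per_daemon = {d: by_release(c) for d, c in versions.items() if d != "overall"}
    total = Counter()
    for counts in per_daemon.values():
        total.update(counts)  # Counter.update adds counts rather than replacing
    return per_daemon, total


def pct(part, whole):
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    per_daemon, total = tally(CEPH_VERSIONS)
    # Report any daemon type running more than one release.
    for daemon, counts in per_daemon.items():
        if len(counts) > 1:
            print(f"{daemon}: mixed releases {dict(counts)}")
    # The summed per-daemon counts should agree with the "overall" section.
    print("matches overall:", total == by_release(CEPH_VERSIONS["overall"]))
    print(f"degraded:  {pct(DEGRADED, TOTAL_OBJECTS):.3f}%")
    print(f"misplaced: {pct(MISPLACED, TOTAL_OBJECTS):.3f}%")
```

Run against the pasted output, it reports the OSDs as the only daemon type on mixed releases (368 on 14.2.21, 2055 on 14.2.22), confirms the per-daemon sums agree with the "overall" section, and reproduces the 0.062% and 25.543% figures, i.e. roughly 2.8 billion objects still waiting to move.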
An example of the buffer I/O errors:

Jul 17 06:36:08 host8098 kernel: buffer_io_error: 22 callbacks suppressed
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 3, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-5, logical block 511984, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical block 3487657728, async page read
Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical block 3487657729, async page read

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx