I've seen this dynamic contribute to a hypervisor with many attachments running out of system-wide file descriptors.

> On Jul 18, 2023, at 16:21, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
>
> Hi,
>
> Check your libvirt limits for qemu open files/sockets. It seems that when
> you added the new OSDs, your librbd client reached its limit.
>
> k
> Sent from my iPhone
>
>> On 18 Jul 2023, at 19:32, Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:
>>
>> Did your automation / process allow for stalls between changes so that
>> peering could complete? My hunch is that you caused a very large peering
>> storm (during peering a PG is inactive), which in turn caused your VMs to
>> panic. If the RBDs are unmapped and re-mapped, do they still struggle?
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> wes@xxxxxxxxxxxxxxxxx
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>> On Tue, Jul 18, 2023 at 11:52 AM <
>>> fb2cd0fc-933c-4cfe-b534-93d67045a088@xxxxxxxxxxxxxxx> wrote:
>>>
>>> Starting on Friday, as part of adding a new pod of 12 servers, we
>>> initiated a reweight on roughly 384 drives, from 0.1 to 0.25. Something
>>> about the resulting large backfill is causing librbd to hang, requiring
>>> server restarts. The volumes show buffer I/O errors when this happens.
>>> We are currently using hybrid OSDs with both SSDs and traditional
>>> spinning disks. The current status of the cluster is:
>>>
>>> ceph --version
>>> ceph version 14.2.22
>>> Cluster kernel: 5.4.49-200
>>>
>>> {
>>>     "mon": {
>>>         "ceph version 14.2.22 nautilus (stable)": 3
>>>     },
>>>     "mgr": {
>>>         "ceph version 14.2.22 nautilus (stable)": 3
>>>     },
>>>     "osd": {
>>>         "ceph version 14.2.21 nautilus (stable)": 368,
>>>         "ceph version 14.2.22 (stable)": 2055
>>>     },
>>>     "mds": {},
>>>     "rgw": {
>>>         "ceph version 14.2.22 (stable)": 7
>>>     },
>>>     "overall": {
>>>         "ceph version 14.2.21 (stable)": 368,
>>>         "ceph version 14.2.22 (stable)": 2068
>>>     }
>>> }
>>>
>>> HEALTH_WARN noscrub,nodeep-scrub flag(s) set
>>> pgs: 6815703/11016906121 objects degraded (0.062%)
>>>      2814059622/11016906121 objects misplaced (25.543%)
>>>
>>> The client servers are on kernel 3.10.0-1062.1.2.el7.x86_64.
>>>
>>> We have found a couple of issues that look relevant:
>>> https://tracker.ceph.com/issues/19385
>>> https://tracker.ceph.com/issues/18807
>>>
>>> Has anyone experienced anything like this before? Does anyone have
>>> recommendations for settings that can help alleviate this while the
>>> backfill completes?
>>>
>>> An example of the buffer I/O errors:
>>>
>>> Jul 17 06:36:08 host8098 kernel: buffer_io_error: 22 callbacks suppressed
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 0, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical block 3, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-5, logical block 511984, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical block 3487657728, async page read
>>> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical block 3487657729, async page read
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
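To follow up on the descriptor angle: this is roughly how I check whether a qemu process is close to its open-file limit. The `check_fds` helper name is mine, not part of any tool; on a hypervisor you would point it at a qemu PID found with e.g. `pgrep -f qemu`.

```shell
# Sketch: report the "Max open files" soft limit and current fd count
# for a given PID, read from /proc. check_fds is a hypothetical helper.
check_fds() {
    local pid=$1
    awk '/Max open files/ {print "limit:", $4}' "/proc/${pid}/limits"
    echo "in use: $(ls "/proc/${pid}/fd" | wc -l)"
}

check_fds $$    # demo against the current shell; use a qemu PID in practice
```

If the in-use count is anywhere near the limit, raising `max_files` in /etc/libvirt/qemu.conf (and restarting libvirtd) is the usual remedy. Each RBD attachment keeps sockets open to many OSDs, so adding 2000+ OSDs multiplies per-guest fd usage.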
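On Wes's peering-storm point: a sketch of how the reweight could have been staged, pausing until no PGs are stuck inactive before touching the next drives. The OSD ids, target weight, and the `staged_reweight` name are placeholders; whether your process used `ceph osd reweight` or `ceph osd crush reweight` may differ.

```shell
# Sketch: reweight OSDs one at a time, waiting for peering to settle
# between changes. CEPH is parameterised so the loop can be dry-run
# with CEPH="echo" instead of issuing real cluster commands.
staged_reweight() {
    local CEPH=${CEPH:-ceph}
    local osd
    for osd in "$@"; do
        $CEPH osd crush reweight "osd.${osd}" 0.25
        # Block until `ceph pg dump_stuck inactive` reports no PGs,
        # i.e. peering from the previous change has completed.
        while $CEPH pg dump_stuck inactive 2>/dev/null | grep -q '^[0-9]'; do
            sleep 30
        done
    done
}

CEPH=echo staged_reweight 100 101 102    # dry run: just prints the commands
```

Doing all ~384 drives in one shot makes a large fraction of PGs repeer at once, and during peering those PGs are inactive, which is consistent with the VM-side hangs described above.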
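As for settings while the backfill completes: the usual Nautilus-era throttles are `osd_max_backfills`, `osd_recovery_max_active`, and `osd_recovery_sleep_hdd`. The values below are illustrative starting points, not something validated against this cluster, and the `throttle_backfill` wrapper is just for dry-running.

```shell
# Sketch: conservative backfill/recovery throttles for a Nautilus cluster.
# CEPH is parameterised so the function can be dry-run with CEPH="echo".
throttle_backfill() {
    local CEPH=${CEPH:-ceph}
    # Persist in the cluster configuration database (Mimic+).
    $CEPH config set osd osd_max_backfills 1
    $CEPH config set osd osd_recovery_max_active 1
    $CEPH config set osd osd_recovery_sleep_hdd 0.1
    # Also apply immediately to all running OSDs.
    $CEPH tell 'osd.*' injectargs '--osd_max_backfills=1 --osd_recovery_max_active=1'
}

CEPH=echo throttle_backfill    # dry run: prints the commands it would issue
```

This slows the backfill down, so the misplaced-object percentage drains more slowly, but it leaves more OSD capacity for client I/O and shortens peering spikes.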