Hi all,

Occasionally we see a bus glitch which causes a device to disappear and then reappear with a new /dev/sd name. This crashes the OSD (giving I/O errors), but after a reboot the OSD is perfectly fine.

We're looking for a way to reactivate an OSD like this without rebooting.

For example, logs from this morning showing sdd disappear and then reappear as sdq are in the P.S.

We tried pvscan, vgscan, and lvscan, but in all cases when trying to activate the OSD we get an I/O error, as if the dm entry for the LV/VG is still referring to /dev/sdd.

Is there some obvious way to properly tear down what refers to sdd / 0:0:3:0 so that we can activate sdq?

(In this case, we have already rebooted the box, so I won't be able to test immediately.)

Best Regards,
Dan

Mar 15 04:57:36 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:3:0: device_block, handle(0x001c)
Mar 15 04:57:38 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:3:0: device_unblock and setting to running, handle(0x001c)
Mar 15 04:57:38 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:3:0: [sdd] Synchronizing SCSI cache
Mar 15 04:57:38 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:3:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: Stopping LVM event activation on device 8:48...
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch kernel: mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x300062b2038af0c3)
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch kernel: mpt3sas_cm0: removing handle(0x001c), sas_addr(0x300062b2038af0c3)
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch kernel: mpt3sas_cm0: enclosure logical id(0x500062b2038af0c0), slot(1)
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch kernel: mpt3sas_cm0: enclosure level(0x0000), connector name( )
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch lvm[1157119]: pvscan[1157119] device 8:48 not found.
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: lvm2-pvscan@8:48.service: Succeeded.
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: Stopped LVM event activation on device 8:48.
...
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0: Direct-Access ATA Micron_5200_MTFD U020 PQ: 0 ANSI: 6
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0: SATA: handle(0x001c), sas_addr(0x300062b2038af0c3), phy(3), device_name(0x0000000000000000)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0: enclosure logical id (0x500062b2038af0c0), slot(1)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0: enclosure level(0x0000), connector name( )
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0: Power-on or device reset occurred
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0: Attached scsi generic sg3 type 0
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: end_device-0:16: add: handle(0x001c), sas_addr(0x300062b2038af0c3)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0: [sdq] 1875385008 512-byte logical blocks: (960 GB/894 GiB)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0: [sdq] 4096-byte physical blocks
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0: [sdq] Write Protect is off
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0: [sdq] Write cache: enabled, read cache: enabled, supports DPO and FUA
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0: [sdq] Attached SCSI disk
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: Starting LVM event activation on device 65:0...
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]: pvscan[1157327] PV /dev/sdq online, VG ceph-2b92ed55-2e7a-4aba-aab5-899b071eceb5 is complete.
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]: pvscan[1157327] VG ceph-2b92ed55-2e7a-4aba-aab5-899b071eceb5 run autoactivation.
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]: PVID qlETw1-KssL-Vc9P-MpAE-ED7I-gnWS-kO4DWF read from /dev/sdq last written to /dev/sdd.
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]: pvscan[1157327] VG ceph-2b92ed55-2e7a-4aba-aab5-899b071eceb5 not using quick activation.
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]: 1 logical volume(s) in volume group "ceph-2b92ed55-2e7a-4aba-aab5-899b071eceb5" now active
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: Started LVM event activation on device 65:0.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
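
For anyone reading the log excerpt: the rename can be picked out mechanically from the "PVID ... read from ... last written to ..." line that LVM emits, combined with the "sd H:C:T:L: [sdX]" kernel lines. The following is only a rough Python sketch of that correlation, not something from our tooling. The function name find_renamed_pvs, the dictionary keys, and the regular expressions are assumptions based solely on the message formats visible in this excerpt, and other kernel or LVM versions may word these lines differently. It only identifies the stale and new names; it does not tear anything down.

```python
import re

# Lines copied from the log excerpt, with the timestamp/host prefix trimmed.
SAMPLE_LOG = """\
kernel: sd 0:0:3:0: [sdd] Synchronizing SCSI cache
kernel: sd 0:0:16:0: [sdq] Attached SCSI disk
lvm[1157327]: PVID qlETw1-KssL-Vc9P-MpAE-ED7I-gnWS-kO4DWF read from /dev/sdq last written to /dev/sdd.
"""

# "sd H:C:T:L: [sdX] ..." ties a SCSI address to a block device name.
# (Assumed format, taken from the excerpt.)
SD_RE = re.compile(r"\bsd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")

# LVM reports when a PV appears under a different device name than before.
# (Assumed format, taken from the excerpt.)
PVID_RE = re.compile(
    r"PVID (\S+) read from (/dev/\S+) last written to (/dev/[^\s.]+)"
)


def find_renamed_pvs(log_text):
    """Return one dict per PV that LVM reports under a new device name."""
    hctl_by_dev = {}  # e.g. "/dev/sdd" -> "0:0:3:0"
    renamed = []
    for line in log_text.splitlines():
        sd_match = SD_RE.search(line)
        if sd_match:
            hctl_by_dev["/dev/" + sd_match.group(2)] = sd_match.group(1)
        pv_match = PVID_RE.search(line)
        if pv_match:
            pvid, new_dev, old_dev = pv_match.groups()
            renamed.append({"pvid": pvid, "old_dev": old_dev, "new_dev": new_dev})
    # Fill in SCSI addresses once the whole log has been read.
    for entry in renamed:
        entry["old_hctl"] = hctl_by_dev.get(entry["old_dev"])
        entry["new_hctl"] = hctl_by_dev.get(entry["new_dev"])
    return renamed


if __name__ == "__main__":
    for entry in find_renamed_pvs(SAMPLE_LOG):
        print(entry)
```

Run against the full journal output instead of SAMPLE_LOG, this would list each PV whose device name changed, along with the SCSI address of the stale name (0:0:3:0 for sdd here), which is the thing we would want to tear down.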