lvm fix for reseated device

Hi all,

Occasionally we see a bus glitch which causes a device to disappear
then reappear with a new /dev/sd name. This crashes the OSD (giving I/O
errors), but after a reboot the OSD is perfectly fine.

We're looking for a way to reactivate an OSD like this without rebooting.

For example, the P.S. has logs from this morning showing sdd
disappearing and then reappearing as sdq.

We tried pvscan, vgscan, and lvscan, but in all cases when trying to
activate the OSD we get an I/O error, as if the dm entry for the LV/VG
still refers to /dev/sdd.
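
One way to check that suspicion might be to look at which block devices
each device-mapper node sits on, via the slaves/ directories in sysfs.
This is only a sketch: the helper name dm_slaves is made up for
illustration, and whether a stale mapping still lists the vanished disk
there is an assumption we have not verified.

```python
from pathlib import Path


def dm_slaves(sys_block="/sys/block"):
    """Map each device-mapper name to the block devices it is built on.

    Reads <sys_block>/dm-N/dm/name and <sys_block>/dm-N/slaves/.
    dm_slaves is a hypothetical helper, not part of LVM or Ceph.
    """
    result = {}
    for dm in sorted(Path(sys_block).glob("dm-*")):
        name_file = dm / "dm" / "name"
        # Fall back to the kernel name (dm-N) if the dm name is unreadable.
        name = name_file.read_text().strip() if name_file.exists() else dm.name
        slaves_dir = dm / "slaves"
        slaves = sorted(p.name for p in slaves_dir.iterdir()) if slaves_dir.is_dir() else []
        result[name] = slaves
    return result


if __name__ == "__main__":
    for name, slaves in dm_slaves().items():
        print(name, "->", ", ".join(slaves) or "(no slaves)")
```

If the LV for the broken OSD still listed sdd rather than sdq, that
would at least confirm where the stale reference lives.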

Is there some obvious way to properly tear down whatever still refers
to sdd / 0:0:3:0 so that we can activate sdq?

(In this case, we have already rebooted the box so I won't be able to
test immediately.)

Best Regards, Dan

Mar 15 04:57:36 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:3:0:
device_block, handle(0x001c)
Mar 15 04:57:38 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:3:0:
device_unblock and setting to running, handle(0x001c)
Mar 15 04:57:38 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:3:0:
[sdd] Synchronizing SCSI cache
Mar 15 04:57:38 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:3:0:
[sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: Stopping
LVM event activation on device 8:48...
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch kernel: mpt3sas_cm0:
mpt3sas_transport_port_remove: removed: sas_addr(0x300062b2038af0c3)
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch kernel: mpt3sas_cm0:
removing handle(0x001c), sas_addr(0x300062b2038af0c3)
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch kernel: mpt3sas_cm0:
enclosure logical id(0x500062b2038af0c0), slot(1)
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch kernel: mpt3sas_cm0:
enclosure level(0x0000), connector name(     )
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch lvm[1157119]:
pvscan[1157119] device 8:48 not found.
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch systemd[1]:
lvm2-pvscan@8:48.service: Succeeded.
Mar 15 04:57:39 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: Stopped
LVM event activation on device 8:48.
...
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0:
Direct-Access     ATA      Micron_5200_MTFD U020 PQ: 0 ANSI: 6
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0:
SATA: handle(0x001c), sas_addr(0x300062b2038af0c3), phy(3),
device_name(0x0000000000000000)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0:
enclosure logical id (0x500062b2038af0c0), slot(1)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0:
enclosure level(0x0000), connector name(     )
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0:
atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: scsi 0:0:16:0:
qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0:
Power-on or device reset occurred
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0:
Attached scsi generic sg3 type 0
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel:
end_device-0:16: add: handle(0x001c), sas_addr(0x300062b2038af0c3)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0:
[sdq] 1875385008 512-byte logical blocks: (960 GB/894 GiB)
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0:
[sdq] 4096-byte physical blocks
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0:
[sdq] Write Protect is off
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0:
[sdq] Write cache: enabled, read cache: enabled, supports DPO and FUA
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch kernel: sd 0:0:16:0:
[sdq] Attached SCSI disk
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: Starting
LVM event activation on device 65:0...
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]:
pvscan[1157327] PV /dev/sdq online, VG
ceph-2b92ed55-2e7a-4aba-aab5-899b071eceb5 is complete.
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]:
pvscan[1157327] VG ceph-2b92ed55-2e7a-4aba-aab5-899b071eceb5 run
autoactivation.
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]:  PVID
qlETw1-KssL-Vc9P-MpAE-ED7I-gnWS-kO4DWF read from /dev/sdq last written
to /dev/sdd.
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]:
pvscan[1157327] VG ceph-2b92ed55-2e7a-4aba-aab5-899b071eceb5 not using
quick activation.
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch lvm[1157327]:  1
logical volume(s) in volume group
"ceph-2b92ed55-2e7a-4aba-aab5-899b071eceb5" now active
Mar 15 04:58:16 cephflash21b-b3b91f0bb3.cern.ch systemd[1]: Started
LVM event activation on device 65:0.
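
For anyone matching the "device 8:48" and "device 65:0" lines above to
disk names: these are major:minor numbers, and under the conventional
sd numbering (16 minors per disk, majors 8, 65-71 and 128-135) they
decode to sdd and sdq. A small sketch of that arithmetic follows; the
function name sd_name is made up here, and it assumes fewer than 256
sd disks so that no extended minors are involved.

```python
# Block majors conventionally allocated to SCSI disks, in order.
SD_MAJORS = [8, *range(65, 72), *range(128, 136)]


def sd_name(major, minor):
    """Decode an sd major:minor pair to a name such as 'sdd' or 'sda1'.

    Hypothetical helper; assumes the classic layout of 16 disks per
    major and 16 minors (whole disk plus 15 partitions) per disk.
    """
    index = SD_MAJORS.index(major) * 16 + minor // 16
    part = minor % 16
    letters = ""
    n = index + 1
    # Bijective base-26: 0 -> a, 25 -> z, 26 -> aa, ...
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return "sd" + letters + (str(part) if part else "")


if __name__ == "__main__":
    print(sd_name(8, 48), sd_name(65, 0))
```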