On Tue, Sep 25, 2018 at 06:18:53PM +0800, Damon Wang wrote:
> Hi,
>
> AFAIK, once sanlock cannot access the lease storage, it sends
> "kill_vg" to lvmlockd, and the standard procedure is then to
> deactivate the logical volumes and drop the VG locks.
>
> But sometimes the storage recovers after kill_vg (and before we
> deactivate the LVs or drop the locks), and lvm commands then print
> "storage failed for sanlock leases", like this:
>
> [root@dev1-2 ~]# vgck 71b1110c97bd48aaa25366e2dc11f65f
>   WARNING: Not using lvmetad because config setting use_lvmetad=0.
>   WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
>   VG 71b1110c97bd48aaa25366e2dc11f65f lock skipped: storage failed for sanlock leases
>   Reading VG 71b1110c97bd48aaa25366e2dc11f65f without a lock.
>
> So what should I do to recover from this, preferably without
> affecting volumes that are in use?
>
> I found a way, but it seems very tricky: save the "lvmlockctl -i"
> output, run lvmlockctl -r <vg>, and then activate the volumes again
> according to the saved output.
>
> Do we have an "official" way to handle this? It is pretty common
> that by the time I notice lvmlockd has failed, the storage has
> already recovered.

Hi,

To figure out that workaround, you've probably already read the
"sanlock lease storage failure" section of the lvmlockd man page,
which gives some background about what's happening and why. What the
man page is missing is some help with false failure detections like
the one you're seeing.

It sounds like the io delays from your storage are a little longer
than sanlock allows for. With the default 10 second io timeout,
sanlock initiates recovery (kill_vg in lvmlockd) after 80 seconds
without a successful io to the storage. At that point it decides the
storage has failed. If the storage has not failed, but is just slow,
then the proper way to handle that is to increase the timeouts (or
perhaps to configure the storage to avoid such lengthy delays). Once
a failure has been detected and recovery has begun, there is no
official way to back out of it.

You can increase the sanlock io timeout with lvmlockd -o <seconds>;
sanlock multiplies that value by 8 to get the total length of time
before it starts recovery. I'd look at how long your temporary
storage outages last and set io_timeout so that 8 * io_timeout covers
it.

Dave
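
As a worked example of the arithmetic above (the 20-second figure is
only an illustration, pick a value from your own measured outages):

  # default:   io_timeout = 10s  ->  8 * 10 =  80s of failed io before kill_vg
  # with -o:   io_timeout = 20s  ->  8 * 20 = 160s of failed io before kill_vg
  lvmlockd -o 20

If lvmlockd is started from a systemd unit on your hosts, the -o
option has to be added to whatever command line that unit uses to
start the daemon.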
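
For reference, the ad-hoc workaround described at the top of the
thread would look roughly like this; it is not an official procedure,
the VG name is the one from the example above, and which LVs to
reactivate (and with what activation mode) has to come from the saved
"lvmlockctl -i" output:

  # save the current lock state so you know which LVs were active
  lvmlockctl -i > /tmp/lvmlockd-state.txt

  # tell lvmlockd to drop the locks it still holds for the VG
  lvmlockctl -r 71b1110c97bd48aaa25366e2dc11f65f

  # restart the VG lockspace now that the storage is back
  vgchange --lock-start 71b1110c97bd48aaa25366e2dc11f65f

  # reacquire LV locks by re-activating the LVs recorded earlier
  lvchange -ay 71b1110c97bd48aaa25366e2dc11f65f/<lvname>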