Re: [lvmlockd] recovery lvmlockd after kill_vg

Thank you for your reply. I have another question about this kind of situation.

I usually run "vgck" to check whether the VG is healthy, but sometimes it
gets stuck and leaves the VGLK held in sanlock. (I'm sure an I/O error can
cause this, but sometimes it happens without any I/O error.)
Then I try "sanlock client release -r xxx" to release the lock, but that
also sometimes gets stuck.
Then I may run "lvmlockctl -r" to drop the VG lockspace, but that can get
stuck as well, even though I/O is fine while it is stuck. Roughly, the
sequence I go through is shown below.
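
For reference, it looks roughly like this (VGNAME is just a placeholder for
the real VG name, and "xxx" stands for the resource name as above; this is a
sketch of what I run, not an exact transcript):

  vgck VGNAME                      # sometimes hangs, leaving VGLK held
  sanlock client status            # to see which lockspaces/resources are held
  sanlock client release -r xxx    # also sometimes hangs
  lvmlockctl -r VGNAME             # drop the vg lockspace, can hang as well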

This usually happens on multipath storage. I suspect that multipath queueing
I/O is to blame, but I'm not sure; a guess at the kind of setting I mean is
sketched below.
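
If multipath queueing is the cause, I guess the relevant setting is
something like the following in /etc/multipath.conf (only a guess at the
kind of config involved, not my actual file):

  defaults {
      # "queue" (same as features "1 queue_if_no_path") makes I/O block
      # indefinitely while all paths are down, so commands hang instead
      # of returning an I/O error; a number or "fail" here would make
      # the I/O fail once the retries are exhausted
      no_path_retry queue
  }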

Any ideas?

Thanks again for your reply.

Damon
On Wed, Sep 26, 2018 at 12:44 AM David Teigland <teigland@xxxxxxxxxx> wrote:
>
> On Tue, Sep 25, 2018 at 06:18:53PM +0800, Damon Wang wrote:
> > Hi,
> >
> >   AFAIK once sanlock cannot access the lease storage, it sends
> > "kill_vg" to lvmlockd, and the standard process is then to deactivate
> > the logical volumes and drop the vg locks.
> >
> >   But sometimes the storage recovers after kill_vg (and before we
> > deactivate or drop the locks), and then lvm commands print "storage
> > failed for sanlock leases", like this:
> >
> > [root@dev1-2 ~]# vgck 71b1110c97bd48aaa25366e2dc11f65f
> >   WARNING: Not using lvmetad because config setting use_lvmetad=0.
> >   WARNING: To avoid corruption, rescan devices to make changes visible
> > (pvscan --cache).
> >   VG 71b1110c97bd48aaa25366e2dc11f65f lock skipped: storage failed for
> > sanlock leases
> >   Reading VG 71b1110c97bd48aaa25366e2dc11f65f without a lock.
> >
> >   so what should I do to recover from this, preferably without
> > affecting volumes that are in use?
> >
> >   I found a way, but it seems very tricky: save the "lvmlockctl -i"
> > output, run "lvmlockctl -r vg", and then re-activate the volumes listed
> > in the saved output.
> >
> >   Do we have an "official" way to handle this? It is pretty common
> > that by the time I find lvmlockd has failed, the storage has already
> > recovered.
>
> Hi, to figure out that workaround, you've probably already read the
> section of the lvmlockd man page: "sanlock lease storage failure", which
> gives some background about what's happening and why.  What the man page
> is missing is some help about false failure detections like you're seeing.
>
> It sounds like io delays from your storage are a little longer than
> sanlock is allowing for.  With the default 10 sec io timeout, sanlock will
> initiate recovery (kill_vg in lvmlockd) after 80 seconds of no successful
> io from the storage.  After this, it decides the storage has failed.  If
> it's not failed, just slow, then the proper way to handle that is to
> increase the timeouts.  (Or perhaps try to configure the storage to avoid
> such lengthy delays.)  Once a failure is detected and recovery is begun,
> there's not an official way to back out of it.
>
> You can increase the sanlock io timeout with lvmlockd -o <seconds>.
> sanlock multiplies that by 8 to get the total length of time before
> starting recovery.  I'd look at how long your temporary storage outages
> last and set io_timeout so that 8*io_timeout will cover it.
>
> Dave

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


