Re: [lvmlockd] recovery lvmlockd after kill_vg

On Fri, Sep 28, 2018 at 1:35 AM David Teigland <teigland@xxxxxxxxxx> wrote:
>
> On Thu, Sep 27, 2018 at 10:12:44PM +0800, Damon Wang wrote:
> > Thank you for your reply. I have another question about this kind of situation.
> >
> > I usually run "vgck" to check whether the vg is healthy, but sometimes it
> > seems to get stuck and leaves a VGLK held in sanlock. (I'm sure an io error
> > will cause this, but sometimes it happens without an io error.)
> > Then I'll try "sanlock client release -r xxx" to release it, but that also
> > sometimes doesn't work (gets stuck).
> > Then I may run "lvmlockctl -r" to drop the vg lockspace, but that may still
> > get stuck, and the io is ok while it is stuck.
> >
> > This usually happens on multipath storage. I suspect that multipath
> > queueing io is to blame, but I'm not sure.
> >
> > Any idea?
>
> First, you might be able to avoid this issue by doing the check using
> something other than an lvm command, or perhaps an lvm command configured
> to avoid taking locks (the --nolocking option in vgs/pvs/lvs).  What's
> appropriate depends on specifically what you want to know from the check.
>

This is how I use sanlock and lvmlockd:


 +------------------+            +---------------------+         +----------------+
 |                  |            |                     |         |                |
 |     sanlock      <------------>     lvmlockd        <---------+  lvm commands  |
 |                  |            |                     |         |                |
 +------------------+            +---------------------+         +----------------+
       |
       |
       |
       |      +------------------+                               +-----------------+        +------------+
       |      |                  |                               |                 |        |            |
       +------>     multipath    <- - -  -  -  -   -  -  -  -  - |  lvm volumes    <--------+    qemu    |
              |                  |                               |                 |        |            |
              +------------------+                               +-----------------+        +------------+
                      |
                      |
                      |
                      |
                      |
              +------------------+
              |                  |
              |   san storage    |
              |                  |
              |                  |
              +------------------+

As I mentioned in my first mail, I sometimes see lvm commands fail with "sanlock lease storage failure". I guess this is because lvmlockd's kill_vg has been triggered;
as the manual says, the volumes should then be deactivated and the lockspace dropped as quickly as possible, but I can't get a proper alert programmatically.

A message does reach the TTY, but that is not a good way to listen or monitor, so I run vgck periodically and parse its stdout and stderr; once "sanlock lease storage failure" or
something else unusual appears, an alert is triggered and I do some checks (I hope the whole process can eventually be automated).

If the command does not take a lock (pvs/lvs/vgs --nolocking), these errors won't be noticed. Since a lot of san storage setups configure multipath to queue io for as long as possible (multipath -t | grep queue_if_no_path),
catching lvm errors early is pretty difficult. After trying various approaches, running vgck and parsing its output is the method with the least load (it only takes a shared VGLK) and the best efficiency (it usually finishes in less than 0.1s).
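
Roughly, the periodic check looks like the sketch below. It is only a simplified illustration: the VG name, the 10-second timeout and the "send-alert" command are placeholders for whatever fits the environment, not the exact script I run:

    #!/bin/bash
    # Periodic health check: vgck takes a shared VGLK, so a failed lease or a
    # stuck lock shows up here even while multipath is still queueing io.
    VG=vg1                                     # placeholder VG name

    out=$(timeout 10 vgck "$VG" 2>&1)          # bound the runtime; normally <0.1s
    rc=$?

    if [ "$rc" -ne 0 ] || echo "$out" | grep -q "sanlock lease storage failure"; then
        # send-alert is a stand-in for the real notification mechanism
        send-alert "vgck failed on $VG (rc=$rc): $out"
    fi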

As you mentioned, I'll extend the io timeout to ride out storage jitter, and I believe that will also resolve some of the problems caused by multipath queueing io.
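
If I read the lvmlockd man page correctly, the sanlock io timeout can be raised when lvmlockd is started, for example as below; 20 seconds is only an example value, and I assume it only applies to lockspaces started after that:

    # start lvmlockd with a larger sanlock io timeout (sanlock's default is 10s)
    lvmlockd --sanlock-timeout 20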


> I still haven't fixed the issue you found earlier, which sounds like it
> could be the same or related to what you're describing now.
> https://www.redhat.com/archives/linux-lvm/2018-July/msg00011.html
>
> As for manually cleaning up a stray lock using sanlock client, there may
> be some limits on the situations that works in, I don't recall off hand.
> You should try using the -p <pid> option with client release to match the
> pid of lvmlockd.


Yes, I added -p when releasing the lock. I also want to write up an "Emergency Procedures" summary for dealing with different kinds of storage failure; right now it is still unclear to me.
I'll do more experiments after fixing these annoying storage failures, then put that summary together.
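
For reference, the rough sequence I try today when a VGLK is stuck looks something like the notes below; the resource string and vg name are placeholders, and this is only my current working draft, not a verified procedure:

    # 1. deactivate the LVs of the affected VG, if that still works
    vgchange -an <vgname>

    # 2. release the stray VGLK in sanlock on behalf of lvmlockd
    #    (the resource string comes from "sanlock client status")
    sanlock client release -r <lockspace:resource:path:offset> -p $(pidof lvmlockd)

    # 3. if that is also stuck, drop the whole VG lockspace from lvmlockd
    lvmlockctl --drop <vgname>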


> Configuring multipath to fail more quickly instead of queueing might give
> you a better chance of cleaning things up.
>
> Dave
>

Yeah, I believe multipath queueing io is to blame. I'm negotiating with the storage vendor, since they think the multipath config is correct :-(
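
What I plan to propose to them is roughly the following in multipath.conf; the numbers are only an example for the discussion, not something the vendor has agreed to:

    defaults {
        # fail io after a bounded time instead of queueing forever:
        # 12 retries * 5s polling_interval is about one minute; "fail" would
        # fail immediately once all paths are down
        no_path_retry 12
    }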


Thank you for your patience!

Damon
_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
