On 23/01/10 17:35, Evan Broder wrote:
On Fri, Jan 22, 2010 at 6:41 AM, Christine Caulfield
<ccaulfie@xxxxxxxxxx> wrote:
On 21/01/10 15:17, Evan Broder wrote:
On Wed, Jan 13, 2010 at 4:59 AM, Christine Caulfield
<ccaulfie@xxxxxxxxxx> wrote:
On 12/01/10 16:21, Evan Broder wrote:
On Tue, Jan 12, 2010 at 3:54 AM, Christine Caulfield
<ccaulfie@xxxxxxxxxx> wrote:
On 11/01/10 09:38, Christine Caulfield wrote:
On 11/01/10 09:32, Evan Broder wrote:
On Mon, Jan 11, 2010 at 4:03 AM, Christine Caulfield
<ccaulfie@xxxxxxxxxx> wrote:
On 08/01/10 22:58, Evan Broder wrote:
[please preserve the CC when replying, thanks]
Hi -
We're attempting to set up a clvm (2.02.56) cluster using OpenAIS
(1.1.1) and Corosync (1.1.2). We've been bitten hard in the past by
crashes leaving DLM state around and forcing us to reboot our nodes,
so we're specifically looking for a solution that doesn't involve
in-kernel locking.

We're also running the Pacemaker OpenAIS service, as we're hoping to
use it for management of some other resources going forward.

We've managed to form the OpenAIS cluster and get clvmd running on
both of our nodes. Operations using LVM succeed, so long as only one
operation runs at a time. However, if we attempt to run two operations
(say, one lvcreate on each host) simultaneously, they both hang, and
both clvmd processes appear to deadlock.

When they deadlock, it doesn't appear to affect the other clustering
processes - both corosync and pacemaker still report a fully formed
cluster, so it seems the issue is localized to clvmd.

I've looked at logs from corosync and pacemaker, and I've straced
various processes, but I don't want to blast a bunch of useless
information at the list. What information can I provide to make it
easier to debug and fix this deadlock?
To start with, the best logging to produce is the clvmd logs, which
can be obtained with clvmd -d (see the man page for details). Ideally
these should come from all nodes in the cluster so they can be
correlated. If you're still using DLM, then a dlm lock dump from all
nodes is often helpful in conjunction with the clvmd logs.
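(For illustration, a hedged sketch of gathering that data; the exact -d
semantics, the lockspace name, and the debugfs layout vary between
clvmd/dlm versions, so treat this as a starting point rather than a
recipe:)

# on each node, stop the running clvmd and re-run it with debugging;
# with no value, -d typically sends debug output to stderr and keeps
# clvmd in the foreground
clvmd -d > /tmp/clvmd-$(hostname).log 2>&1

# if the DLM is in use and debugfs is mounted, a lock dump for the
# clvmd lockspace can usually be read out of debugfs on every node
cat /sys/kernel/debug/dlm/clvmd > /tmp/dlm-clvmd-$(hostname).txt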
Sure, no problem. I've posted the logs from the clvmd processes on both
nodes at <http://web.mit.edu/broder/Public/clvmd/>. I've annotated them
at a few points with what I was doing - the annotations all start with
" ", so they should be easy to spot.
Ironically, it looks like a bug in the clvmd-openais code. I can
reproduce it on my systems here. I don't see the problem when using
the dlm!

Can you try -Icorosync and see if that helps? In the meantime I'll
have a look at the openais bits to try and find out what is wrong.
Chrissie
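(For reference, a hedged sketch of what switching the cluster interface
might look like; the set of interface names accepted by -I depends on
how clvmd was built, and the corosync interface relies on the in-kernel
DLM for locking:)

# stop the running clvmd on each node, then restart it using the
# corosync cluster interface instead of the openais one
clvmd -I corosync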
I'll see what we can pull together, but the nodes running the clvm
cluster are also Xen dom0s. They're currently running on (Ubuntu
Hardy's) 2.6.24, so upgrading them to something new enough to support
DLM 3 would be... challenging.

It would be much, much better for us if we could get clvmd-openais
working. Is there any chance this would work better if we dropped back
to openais whitetank instead of corosync + openais wilson?
OK, I've found the bug, and it IS in openais. The attached patch will
fix it.
Chrissie
Awesome. That patch fixed our problem.

We are running into one other issue - performing LVM operations on one
node is substantially slower than performing them on the other node:
root@black-mesa:~# time lvcreate -n test -L 1G xenvg
Logical volume "test" created
real 0m0.309s
user 0m0.000s
sys 0m0.008s
root@black-mesa:~# time lvremove -f /dev/xenvg/test
Logical volume "test" successfully removed
real 0m0.254s
user 0m0.004s
sys 0m0.008s
root@torchwood-institute:~# time lvcreate -n test -L 1G xenvg
Logical volume "test" created
real 0m7.282s
user 0m6.396s
sys 0m0.312s
root@torchwood-institute:~# time lvremove -f /dev/xenvg/test
Logical volume "test" successfully removed
real 0m7.277s
user 0m6.420s
sys 0m0.292s
Any idea why this is happening and if there's anything we can do about it?
I'm not at all sure why that should be happening. I suppose the best
thing to do would be to enable clvmd logging (clvmd -d) and see what
is taking the time.
Chrissie
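(As an aside, one hedged way to see where a slow node is spending its
time is to strace the clvmd daemon and the LVM command with timestamps,
as Evan mentioned doing earlier; these are standard strace flags, shown
only as an illustration:)

# -tt prints wall-clock timestamps, -T prints time spent in each
# syscall, -f follows forked children
strace -f -tt -T -p $(pidof clvmd) -o /tmp/clvmd-strace.log &
strace -f -tt -T -o /tmp/lvcreate-strace.log lvcreate -n test -L 1G xenvg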
No problem. I've collected another set of logs - they're in
<http://web.mit.edu/broder/Public/clvmd-slow/>.
After spinning up corosync and clvmd, the commands I ran were, in order:
root@black-mesa:~# vgchange -a y xenvg
0 logical volume(s) in volume group "xenvg" now active
root@black-mesa:~# time lvcreate -n test1 -L 1G xenvg
Logical volume "test1" created
real 0m0.685s
user 0m0.004s
sys 0m0.000s
root@black-mesa:~# time lvremove -f /dev/xenvg/test1
Logical volume "test1" successfully removed
real 0m0.235s
user 0m0.004s
sys 0m0.004s
root@torchwood-institute:~# time lvcreate -n test2 -L 1G xenvg
Logical volume "test2" created
real 0m8.007s
user 0m6.396s
sys 0m0.312s
root@torchwood-institute:~# time lvremove -f /dev/xenvg/test2
Logical volume "test2" successfully removed
real 0m7.364s
user 0m6.436s
sys 0m0.300s
root@black-mesa:~# vgchange -a n xenvg
0 logical volume(s) in volume group "xenvg" now active
(black-mesa is node 1, and torchwood-institute is node 2)
Thanks again for your help,
Hiya,
Oddly, I can't find any delays in the clvmd logs at all. There are some
7-second gaps in the log files, but those are between commands coming
from the lvm command line and not internal to clvmd itself.

What sort of storage are you using for these LVs? The only thing I can
think of now is that there is some sort of delay in lvm opening the
device for writing as it updates the metadata. An LVM debug log might
be helpful here, though I'm not sure, off-hand, how to put time-stamps
on that - I'm not really an LVM developer any more.
Chrissie
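(For what it's worth, a hedged sketch of capturing an LVM debug log;
the lvm.conf keys below are from memory and may differ between LVM2
versions, and ts(1) is an optional timestamping tool from moreutils:)

# in /etc/lvm/lvm.conf, something along these lines enables a
# persistent debug log:
#   log {
#       file = "/var/log/lvm2-debug.log"
#       level = 7
#   }

# alternatively, run a single command at maximum verbosity and
# timestamp each line externally to see where the time goes:
time lvcreate -vvvv -n test -L 1G xenvg 2>&1 | ts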
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster