On Mon, Jan 25, 2010 at 8:13 AM, Evan Broder <broder@xxxxxxx> wrote:
> On Mon, Jan 25, 2010 at 5:32 AM, Christine Caulfield
> <ccaulfie@xxxxxxxxxx> wrote:
>> On 23/01/10 17:35, Evan Broder wrote:
>>> On Fri, Jan 22, 2010 at 6:41 AM, Christine Caulfield
>>> <ccaulfie@xxxxxxxxxx> wrote:
>>>> On 21/01/10 15:17, Evan Broder wrote:
>>>>> On Wed, Jan 13, 2010 at 4:59 AM, Christine Caulfield
>>>>> <ccaulfie@xxxxxxxxxx> wrote:
>>>>>> On 12/01/10 16:21, Evan Broder wrote:
>>>>>>> On Tue, Jan 12, 2010 at 3:54 AM, Christine Caulfield
>>>>>>> <ccaulfie@xxxxxxxxxx> wrote:
>>>>>>>> On 11/01/10 09:38, Christine Caulfield wrote:
>>>>>>>>> On 11/01/10 09:32, Evan Broder wrote:
>>>>>>>>>> On Mon, Jan 11, 2010 at 4:03 AM, Christine Caulfield
>>>>>>>>>> <ccaulfie@xxxxxxxxxx> wrote:
>>>>>>>>>>> On 08/01/10 22:58, Evan Broder wrote:
>>>>>>>>>>>> [please preserve the CC when replying, thanks]
>>>>>>>>>>>>
>>>>>>>>>>>> Hi -
>>>>>>>>>>>> We're attempting to set up a clvm (2.02.56) cluster using
>>>>>>>>>>>> OpenAIS (1.1.1) and Corosync (1.1.2). We've been bitten hard
>>>>>>>>>>>> in the past by crashes leaving DLM state around and forcing
>>>>>>>>>>>> us to reboot our nodes, so we're specifically looking for a
>>>>>>>>>>>> solution that doesn't involve in-kernel locking.
>>>>>>>>>>>>
>>>>>>>>>>>> We're also running the Pacemaker OpenAIS service, as we're
>>>>>>>>>>>> hoping to use it for management of some other resources
>>>>>>>>>>>> going forward.
>>>>>>>>>>>>
>>>>>>>>>>>> We've managed to form the OpenAIS cluster and get clvmd
>>>>>>>>>>>> running on both of our nodes. Operations using LVM succeed
>>>>>>>>>>>> as long as only one operation runs at a time. However, if we
>>>>>>>>>>>> attempt to run two operations at once (say, one lvcreate on
>>>>>>>>>>>> each host), they both hang, and both clvmd processes appear
>>>>>>>>>>>> to deadlock.
>>>>>>>>>>>>
>>>>>>>>>>>> When they deadlock, it doesn't appear to affect the other
>>>>>>>>>>>> clustering processes - both corosync and pacemaker still
>>>>>>>>>>>> report a fully formed cluster, so the issue seems to be
>>>>>>>>>>>> localized to clvmd.
>>>>>>>>>>>>
>>>>>>>>>>>> I've looked at logs from corosync and pacemaker, and I've
>>>>>>>>>>>> straced various processes, but I don't want to blast a bunch
>>>>>>>>>>>> of useless information at the list. What information can I
>>>>>>>>>>>> provide to make it easier to debug and fix this deadlock?
>>>>>>>>>>>
>>>>>>>>>>> To start with, the best logging to produce is the clvmd logs,
>>>>>>>>>>> which can be obtained with clvmd -d (see the man page for
>>>>>>>>>>> details). Ideally these should come from all nodes in the
>>>>>>>>>>> cluster so they can be correlated. If you're still using the
>>>>>>>>>>> DLM, then a dlm lock dump from all nodes is often helpful in
>>>>>>>>>>> conjunction with the clvmd logs.
>>>>>>>>>>
>>>>>>>>>> Sure, no problem. I've posted the logs from clvmd on both
>>>>>>>>>> processes in <http://web.mit.edu/broder/Public/clvmd/>. I've
>>>>>>>>>> annotated them at a few points with what I was doing - the
>>>>>>>>>> annotations all start with ">> ", so they should be easy to
>>>>>>>>>> spot.
>>>>>>>>
>>>>>>>> Ironically, it looks like a bug in the clvmd-openais code. I can
>>>>>>>> reproduce it on my systems here.
>>>>>>>> I don't see the problem when using the dlm!
>>>>>>>>
>>>>>>>> Can you try -Icorosync and see if that helps? In the meantime
>>>>>>>> I'll have a look at the openais bits to try to find out what is
>>>>>>>> wrong.
>>>>>>>>
>>>>>>>> Chrissie
>>>>>>>
>>>>>>> I'll see what we can pull together, but the nodes running the
>>>>>>> clvm cluster are also Xen dom0s. They're currently running on
>>>>>>> (Ubuntu Hardy's) 2.6.24, so upgrading them to something new
>>>>>>> enough to support DLM 3 would be...challenging.
>>>>>>>
>>>>>>> It would be much, much better for us if we could get
>>>>>>> clvmd-openais working.
>>>>>>>
>>>>>>> Is there any chance this would work better if we dropped back
>>>>>>> to openais whitetank instead of corosync + openais wilson?
>>>>>>
>>>>>> OK, I've found the bug and it IS in openais. The attached patch
>>>>>> will fix it.
>>>>>>
>>>>>> Chrissie
>>>>>
>>>>> Awesome. That patch fixed our problem.
>>>>>
>>>>> We are running into one other problem - performing LVM operations
>>>>> on one node is substantially slower than performing them on the
>>>>> other node:
>>>>>
>>>>> root@black-mesa:~# time lvcreate -n test -L 1G xenvg
>>>>>   Logical volume "test" created
>>>>>
>>>>> real    0m0.309s
>>>>> user    0m0.000s
>>>>> sys     0m0.008s
>>>>> root@black-mesa:~# time lvremove -f /dev/xenvg/test
>>>>>   Logical volume "test" successfully removed
>>>>>
>>>>> real    0m0.254s
>>>>> user    0m0.004s
>>>>> sys     0m0.008s
>>>>>
>>>>> root@torchwood-institute:~# time lvcreate -n test -L 1G xenvg
>>>>>   Logical volume "test" created
>>>>>
>>>>> real    0m7.282s
>>>>> user    0m6.396s
>>>>> sys     0m0.312s
>>>>> root@torchwood-institute:~# time lvremove -f /dev/xenvg/test
>>>>>   Logical volume "test" successfully removed
>>>>>
>>>>> real    0m7.277s
>>>>> user    0m6.420s
>>>>> sys     0m0.292s
>>>>>
>>>>> Any idea why this is happening and whether there's anything we
>>>>> can do about it?
>>>>
>>>> I'm not at all sure why that should be happening. I suppose the
>>>> best thing to do would be to enable clvmd logging (clvmd -d) and
>>>> see what is taking the time.
>>>>
>>>> Chrissie
>>>
>>> No problem. I've collected another set of logs - they're in
>>> <http://web.mit.edu/broder/Public/clvmd-slow/>.
>>>
>>> After spinning up corosync and clvmd, the commands I ran were, in
>>> order:
>>>
>>> root@black-mesa:~# vgchange -a y xenvg
>>>   0 logical volume(s) in volume group "xenvg" now active
>>> root@black-mesa:~# time lvcreate -n test1 -L 1G xenvg
>>>   Logical volume "test1" created
>>>
>>> real    0m0.685s
>>> user    0m0.004s
>>> sys     0m0.000s
>>> root@black-mesa:~# time lvremove -f /dev/xenvg/test1
>>>   Logical volume "test1" successfully removed
>>>
>>> real    0m0.235s
>>> user    0m0.004s
>>> sys     0m0.004s
>>> root@torchwood-institute:~# time lvcreate -n test2 -L 1G xenvg
>>>   Logical volume "test2" created
>>>
>>> real    0m8.007s
>>> user    0m6.396s
>>> sys     0m0.312s
>>> root@torchwood-institute:~# time lvremove -f /dev/xenvg/test2
>>>   Logical volume "test2" successfully removed
>>>
>>> real    0m7.364s
>>> user    0m6.436s
>>> sys     0m0.300s
>>> root@black-mesa:~# vgchange -a n xenvg
>>>   0 logical volume(s) in volume group "xenvg" now active
>>>
>>> (black-mesa is node 1, and torchwood-institute is node 2)
>>>
>>> Thanks again for your help,
>>
>> Hiya,
>>
>> Oddly, I can't find any delays in the clvmd logs at all. There are
>> some 7-second gaps in the log files, but those are between commands
>> coming from the lvm command line and not internal to clvmd itself.
>>
>> What sort of storage are you using for these LVs? The only thing I
>> can think of now is that there is some sort of delay in lvm opening
>> the device for writing as it updates the metadata. An LVM debug log
>> might be helpful here, though I'm not sure, off-hand, how to put
>> time-stamps on that - I'm not really an LVM developer any more.
>>
>> Chrissie
>
> Well, uh, you see...at the moment I'm using a VM on a personal server
> running iscsitarget from the other side of MIT's campus :)
>
> But that's only because our normal RAID is in the process of being
> RMA'd. Normally it's a Fibre Channel RAID sitting right next to the
> two servers, and we still had the same timing problems (with the same
> server being slower) then. The two servers are identical hardware,
> sitting right next to each other.
>
> In any case, thanks for the suggestion to look at LVM itself instead
> of clvmd - I sort of forgot my basic debugging techniques before
> asking for help on this one. LVM logs to syslog, and its logging mode
> includes timestamps.
>
> When I cranked up the debug level, I noticed that the lvcreates and
> lvremoves were hanging on 'Archiving volume group "xenvg" metadata
> (seqno 48).'. If I turn off the metadata archives in lvm.conf on
> torchwood-institute, its performance goes back to what I'd expect, so
> I guess it's having some sort of disk problems.
>
> Thanks again for all of your help,
> - Evan

Actually, the real problem turned out to be much more straightforward:

root@torchwood-institute:~# ls -a /etc/lvm/archive | wc -l
301369

For some reason, torchwood-institute had built up a massive archive of
files named like
/etc/lvm/archive/.lvm_torchwood-institute.mit.edu_17470_993238909.
Listing them all took...about 7 seconds. Deleting them all put
performance back where it should be.

- Evan
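For anyone who hits the same slowdown: the logging Evan describes is
controlled by the "log" section of lvm.conf. A minimal sketch, using
the option names from the stock lvm.conf shipped with lvm2 - the file
path below is only an example, and defaults differ between versions:

    # /etc/lvm/lvm.conf -- "log" section (sketch only)
    log {
        syslog = 1                        # send messages to syslog, which adds timestamps
        file = "/var/log/lvm2-debug.log"  # example path; also keep a local debug log
        level = 7                         # 2-7; 7 (debug) is the most verbose
        command_names = 1                 # prefix each line with the command name
    }

A single command can also be made more verbose without editing
lvm.conf, e.g. lvcreate -vvvv -n test -L 1G xenvg.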
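As for the archive build-up itself: "turning off the metadata archives"
corresponds to archive = 0 in the "backup" section of lvm.conf; a less
drastic option is to leave archiving on but cap retention. A sketch
with the standard option names - the numbers are illustrative, and it
is not certain this pruning covers the dot-prefixed temporary files
Evan found, only the regular archive files:

    # /etc/lvm/lvm.conf -- "backup" section (sketch only)
    backup {
        backup = 1                   # keep a current metadata backup in /etc/lvm/backup
        archive = 1                  # 0 disables metadata archives entirely
        archive_dir = "/etc/lvm/archive"
        retain_min = 10              # always keep at least this many archive files
        retain_days = 30             # expire archives older than this many days
    }

With limits like these, regular archive files are pruned as new ones
are written, instead of piling up until a plain directory listing takes
several seconds.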