On Mon, Jan 25, 2010 at 8:30 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
> On 25/01/10 13:28, Evan Broder wrote:
>>
>> On Mon, Jan 25, 2010 at 8:13 AM, Evan Broder <broder@xxxxxxx> wrote:
>>>
>>> On Mon, Jan 25, 2010 at 5:32 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>>>>
>>>> On 23/01/10 17:35, Evan Broder wrote:
>>>>>
>>>>> On Fri, Jan 22, 2010 at 6:41 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 21/01/10 15:17, Evan Broder wrote:
>>>>>>>
>>>>>>> On Wed, Jan 13, 2010 at 4:59 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On 12/01/10 16:21, Evan Broder wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Jan 12, 2010 at 3:54 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> On 11/01/10 09:38, Christine Caulfield wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 11/01/10 09:32, Evan Broder wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 11, 2010 at 4:03 AM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 08/01/10 22:58, Evan Broder wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [please preserve the CC when replying, thanks]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi -
>>>>>>>>>>>>>> We're attempting to set up a clvm (2.02.56) cluster using OpenAIS (1.1.1) and Corosync (1.1.2). We've been bitten hard in the past by crashes leaving DLM state around and forcing us to reboot our nodes, so we're specifically looking for a solution that doesn't involve in-kernel locking.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We're also running the Pacemaker OpenAIS service, as we're hoping to use it for management of some other resources going forward.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We've managed to form the OpenAIS cluster and get clvmd running on both of our nodes. Operations using LVM succeed as long as only one operation runs at a time. However, if we attempt to run two operations at a time (say, one lvcreate on each host), they both hang, and both clvmd processes appear to deadlock.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When they deadlock, it doesn't appear to affect the other clustering processes - both corosync and pacemaker still report a fully formed cluster, so it seems the issue is localized to clvmd.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've looked at logs from corosync and pacemaker, and I've straced various processes, but I don't want to blast a bunch of useless information at the list. What information can I provide to make it easier to debug and fix this deadlock?
>>>>>>>>>>>>>
>>>>>>>>>>>>> To start with, the best logging to produce is the clvmd logs, which can be obtained with clvmd -d (see the man page for details). Ideally these should be from all nodes in the cluster so they can be correlated. If you're still using DLM, then a dlm lock dump from all nodes is often helpful in conjunction with the clvmd logs.
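[A note for readers of the archive, not part of the original exchange: a minimal sketch of the log collection Christine describes above. The exact debug flag values, log path, and lockspace name are assumptions worth checking against the clvmd(8) and dlm_tool(8) man pages for your versions; this is an illustration, not the authoritative procedure.

    # On each node, stop the regular clvmd service and run clvmd with
    # debugging enabled, capturing the output to a file. Depending on the
    # build, a bare -d or -d1 sends debug output to stderr and -d2 to syslog.
    clvmd -d 2> /tmp/clvmd-debug-$(hostname).log

    # Reproduce the hang (e.g. one lvcreate on each node at the same time),
    # then gather the log files from every node so they can be correlated.

For clusters that do use the kernel DLM, the "lock dump" mentioned above is something like "dlm_tool lockdump clvmd" (dlm3 userspace tools) run on each node.]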
>>>>>>>>>>>>
>>>>>>>>>>>> Sure, no problem. I've posted the logs from both clvmd processes in <http://web.mit.edu/broder/Public/clvmd/>. I've annotated them at a few points with what I was doing - the annotations all start with ">> ", so they should be easy to spot.
>>>>>>>>>>
>>>>>>>>>> Ironically, it looks like a bug in the clvmd-openais code. I can reproduce it on my systems here. I don't see the problem when using the dlm!
>>>>>>>>>>
>>>>>>>>>> Can you try -Icorosync and see if that helps? In the meantime I'll have a look at the openais bits to try and find out what is wrong.
>>>>>>>>>>
>>>>>>>>>> Chrissie
>>>>>>>>>
>>>>>>>>> I'll see what we can pull together, but the nodes running the clvm cluster are also Xen dom0s. They're currently running on (Ubuntu Hardy's) 2.6.24, so upgrading them to something new enough to support DLM 3 would be...challenging.
>>>>>>>>>
>>>>>>>>> It would be much, much better for us if we could get clvmd-openais working.
>>>>>>>>>
>>>>>>>>> Is there any chance this would work better if we dropped back to openais whitetank instead of corosync + openais wilson?
>>>>>>>>
>>>>>>>> OK, I've found the bug and it IS in openais. The attached patch will fix it.
>>>>>>>>
>>>>>>>> Chrissie
>>>>>>>
>>>>>>> Awesome. That patch fixed our problem.
>>>>>>>
>>>>>>> We are running into one other problem - performing LVM operations on one node is substantially slower than performing them on the other node:
>>>>>>>
>>>>>>> root@black-mesa:~# time lvcreate -n test -L 1G xenvg
>>>>>>> Logical volume "test" created
>>>>>>>
>>>>>>> real    0m0.309s
>>>>>>> user    0m0.000s
>>>>>>> sys     0m0.008s
>>>>>>> root@black-mesa:~# time lvremove -f /dev/xenvg/test
>>>>>>> Logical volume "test" successfully removed
>>>>>>>
>>>>>>> real    0m0.254s
>>>>>>> user    0m0.004s
>>>>>>> sys     0m0.008s
>>>>>>>
>>>>>>> root@torchwood-institute:~# time lvcreate -n test -L 1G xenvg
>>>>>>> Logical volume "test" created
>>>>>>>
>>>>>>> real    0m7.282s
>>>>>>> user    0m6.396s
>>>>>>> sys     0m0.312s
>>>>>>> root@torchwood-institute:~# time lvremove -f /dev/xenvg/test
>>>>>>> Logical volume "test" successfully removed
>>>>>>>
>>>>>>> real    0m7.277s
>>>>>>> user    0m6.420s
>>>>>>> sys     0m0.292s
>>>>>>>
>>>>>>> Any idea why this is happening and if there's anything we can do about it?
>>>>>>
>>>>>> I'm not at all sure why that should be happening. I suppose the best thing to do would be to enable clvmd logging (clvmd -d) and see what is taking the time.
>>>>>>
>>>>>> Chrissie
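[Again an aside from outside the original exchange: the -Icorosync suggestion above refers to clvmd's -I option, which selects the cluster/locking interface it talks to. A rough sketch, with a service name that is only a guess and varies by distribution:

    # On every node, stop the running clvmd, then start it against an
    # explicit interface instead of letting it autodetect one.
    /etc/init.d/clvm stop            # hypothetical service name
    clvmd -I corosync                # corosync membership, kernel DLM locking
    # clvmd -I openais               # openais CPG + userspace LCK locking

As the exchange above suggests, the catch is that the corosync interface relies on the in-kernel DLM (hence the DLM 3 / kernel version concern), while the openais interface keeps the locking in user space.]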
>>>>>
>>>>> No problem. I've collected another set of logs - they're in <http://web.mit.edu/broder/Public/clvmd-slow/>.
>>>>>
>>>>> After spinning up corosync and clvmd, the commands I ran were, in order:
>>>>>
>>>>> root@black-mesa:~# vgchange -a y xenvg
>>>>> 0 logical volume(s) in volume group "xenvg" now active
>>>>> root@black-mesa:~# time lvcreate -n test1 -L 1G xenvg
>>>>> Logical volume "test1" created
>>>>>
>>>>> real    0m0.685s
>>>>> user    0m0.004s
>>>>> sys     0m0.000s
>>>>> root@black-mesa:~# time lvremove -f /dev/xenvg/test1
>>>>> Logical volume "test1" successfully removed
>>>>>
>>>>> real    0m0.235s
>>>>> user    0m0.004s
>>>>> sys     0m0.004s
>>>>> root@torchwood-institute:~# time lvcreate -n test2 -L 1G xenvg
>>>>> Logical volume "test2" created
>>>>>
>>>>> real    0m8.007s
>>>>> user    0m6.396s
>>>>> sys     0m0.312s
>>>>> root@torchwood-institute:~# time lvremove -f /dev/xenvg/test2
>>>>> Logical volume "test2" successfully removed
>>>>>
>>>>> real    0m7.364s
>>>>> user    0m6.436s
>>>>> sys     0m0.300s
>>>>> root@black-mesa:~# vgchange -a n xenvg
>>>>> 0 logical volume(s) in volume group "xenvg" now active
>>>>>
>>>>> (black-mesa is node 1, and torchwood-institute is node 2)
>>>>>
>>>>> Thanks again for your help,
>>>>
>>>> Hiya,
>>>>
>>>> Oddly, I can't find any delays in the clvmd logs at all. There are some 7-second gaps in the log files, but those are between commands coming from the lvm command line and not internal to clvmd itself.
>>>>
>>>> What sort of storage are you using for these LVs? The only thing I can think of now is that there is some sort of delay in lvm opening the device for writing as it updates the metadata. An LVM debug log might be helpful here, though I'm not sure, off-hand, how to put time-stamps on that - I'm not really an LVM developer any more.
>>>>
>>>> Chrissie
>>>
>>> Well, uh, you see...at the moment I'm using a VM on a personal server running iscsitarget from the other side of MIT's campus :)
>>>
>>> But that's only because our normal RAID is in the process of being RMA'd. Normally it's a Fibre Channel RAID sitting right next to the two servers, and we still had the same timing problems (with the same server being slower) then. The two servers are identical hardware and sitting right next to each other.
>>>
>>> In any case, thanks for the suggestion to look at LVM itself instead of clvmd - I sort of forgot my basic debugging techniques before asking for help on this one. LVM logs to syslog, and its logging mode includes timestamps.
>>>
>>> When I cranked up the debug level, I noticed that the lvcreates and lvremoves were hanging on 'Archiving volume group "xenvg" metadata (seqno 48).'. If I turn off the metadata archives in lvm.conf on torchwood-institute, its performance goes back to what I'd expect, so I guess it's having some sort of disk problem.
>>>
>>> Thanks again for all of your help,
>>> - Evan
>>
>> Actually, the real problem turned out to be much more straightforward:
>>
>> root@torchwood-institute:~# ls -a /etc/lvm/archive | wc -l
>> 301369
>>
>> For some reason, torchwood-institute had built up a massive archive of files named like /etc/lvm/archive/.lvm_torchwood-institute.mit.edu_17470_993238909. Listing them all took...about 7 seconds.
>>
>> Deleting them all put performance back where it should be.
>
> :-)
>
> Sometimes the simpler thing actually is the problem!
>
> clvmd is usually the first place I look for a slowdown, as it's complicated and does a lot of locking. It's good that you found the problem elsewhere.
>
> Thanks for letting me know.
>
> Chrissie
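[One more aside that is not part of the original exchange: for anyone retracing this, the lvm.conf settings involved are roughly the following. Treat it as an illustrative sketch and check the option names and values against lvm.conf(5) for your version:

    # /etc/lvm/lvm.conf (fragments)
    log {
        syslog = 1      # LVM messages go to syslog, which adds timestamps
        level = 7       # most verbose debug level; drop back to 0 afterwards
    }
    backup {
        archive = 1                       # pre-change metadata archives
                                          # (setting 0 is the "turn off" test above)
        archive_dir = "/etc/lvm/archive"
        retain_min = 10                   # keep at least this many archives
        retain_days = 30                  # and anything newer than this
    }

Each archiving operation has to look through archive_dir to pick the next file name and expire old copies, which is presumably why ~300,000 stale files there added several seconds to every lvcreate/lvremove on torchwood-institute.]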
Hi again!

Sorry to bother everyone again, but deadlocks on simultaneous LVM operations seem to have resurfaced somehow. As far as I know, we haven't made any changes, other than restarting the cluster a few times, but if I run

root@black-mesa:~# lvcreate -n test1 -L 1G xenvg & lvcreate -n test2 -L 1G xenvg

it seems to consistently deadlock.

I should have logs of everything in <http://web.mit.edu/broder/Public/clvmd/node-1-syslog.20100125> and <http://web.mit.edu/broder/Public/clvmd/node-2-syslog.20100125>

Thanks again!
- Evan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster