Re: F_SETLK fails after recovery

On Tue, Sep 02, 2014 at 04:24:07PM +0000, Neale Ferguson wrote:
> In retrieve_plocks_stored() there is the code:
> 
>         retrieve_plocks(ls, &sig);
> 
>         if ((hd->flags & DLM_MFLG_PLOCK_SIG) && (sig != hd->msgdata2)) {
>                 log_error("lockspace %s plock disabled our sig %x "
>                           "nodeid %d sig %x", ls->name, sig, hd->nodeid,
>                           hd->msgdata2);
>                 ls->disable_plock = 1;
>                 ls->need_plocks = 1; /* don't set HAVEPLOCK */
>                 ls->save_plocks = 0;
>                 return;
>         }

We need to sort out which nodes are sending/receiving plock data to/from
each other.  The way it's supposed to work is that an existing node writes
its plock data into a checkpoint, then calls send_plocks_stored() to notify
the new node that the data is ready.  The new node then runs
receive_plocks_stored() and reads the plock data from the checkpoint.
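
To make the ordering concrete, here is a minimal in-memory sketch of that
handoff.  It is not the dlm_controld code: struct toy_ckpt,
toy_store_plocks() and toy_retrieve_plocks() are hypothetical names, and a
plain struct stands in for the openais/corosync checkpoint.  The only point
is that a read attempted before the writer has stored anything has to fail
visibly.

```c
#include <stddef.h>
#include <string.h>

#define CKPT_MAX 256

/* Hypothetical stand-in for a shared checkpoint section. */
struct toy_ckpt {
	int exists;            /* set once the existing node has written data */
	size_t len;            /* length of the stored data, excluding the NUL */
	char data[CKPT_MAX];
};

/*
 * Existing node: write the plock state into the checkpoint.  In the real
 * flow the stored notification is only sent after this has succeeded.
 * Returns 0 on success, -1 if the state does not fit.
 */
int toy_store_plocks(struct toy_ckpt *c, const char *state)
{
	size_t len = strlen(state);

	if (len >= CKPT_MAX)
		return -1;
	memcpy(c->data, state, len + 1);   /* copy including the NUL */
	c->len = len;
	c->exists = 1;
	return 0;
}

/*
 * New node: called after the stored notification arrives.  Returns 0 and
 * fills out on success, -1 if the checkpoint is missing or out is too
 * small.  On failure out is left untouched, so the caller must not use it.
 */
int toy_retrieve_plocks(const struct toy_ckpt *c, char *out, size_t out_size)
{
	if (!c->exists)
		return -1;
	if (c->len >= out_size)
		return -1;
	memcpy(out, c->data, c->len + 1);
	return 0;
}
```

If the new node reads before the store has happened (or cannot see the
checkpoint at all), toy_retrieve_plocks() returns -1 instead of handing
back stale or empty data.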

I could get a better picture if you save and send the output of

  dlm_tool dump > dlm_dump.txt
  dlm_tool log_plock > dlm_plock.txt

after the problem occurs.

> Node 1 is getting rc=12 from saCkptCheckpointOpen
> (SA_AIS_ERR_NOT_EXIST). However, this error is ignored and we process
> the sig value as if it is valid rather than an uninitialized value that was
> never set by the retrieve_plocks() function. So I guess the question is
> why can't it find the checkpoint file and/or what is the correct action
> when the sig value cannot be retrieved?
> 
> Neale
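
One possible shape for handling this, as a sketch only: treat a failed
checkpoint read as its own outcome and compare signatures only when the
read succeeded.  The enum, the function name and its parameters below are
hypothetical and are not taken from dlm_controld; what to do in the
"unavailable" case (retry, disable plocks, or something else) is left
open here.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical outcome codes; none of these names exist in dlm_controld. */
enum plock_sync_result {
	PLOCK_SYNC_OK,          /* data was read and the signatures agree */
	PLOCK_SYNC_MISMATCH,    /* data was read but the signatures differ */
	PLOCK_SYNC_UNAVAILABLE  /* the checkpoint could not be read at all */
};

/*
 * read_rc:    0 if the checkpoint was opened and read, nonzero otherwise
 *             (for example 12 when the checkpoint open fails).
 * local_sig:  signature computed from the data that was read; only
 *             meaningful when read_rc is 0.
 * sig_sent:   nonzero if the sender's message carried a signature.
 * remote_sig: the signature carried in the sender's message.
 */
enum plock_sync_result check_plock_sig(int read_rc, uint32_t local_sig,
				       int sig_sent, uint32_t remote_sig)
{
	if (read_rc != 0) {
		/* local_sig was never set, so it must not be compared */
		fprintf(stderr, "plock data not read, rc %d\n", read_rc);
		return PLOCK_SYNC_UNAVAILABLE;
	}

	if (sig_sent && local_sig != remote_sig)
		return PLOCK_SYNC_MISMATCH;

	return PLOCK_SYNC_OK;
}
```

With a check like this, the rc=12 case would be reported as "unavailable"
rather than as a signature mismatch against a value that was never set.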
> 
> On Sep 2, 2014, at 12:02 PM, Neale Ferguson <neale@xxxxxxxxxxxxxx> wrote:
> 
> > Thanks David,
> > That makes sense as there's this message that precedes the disable message in the log: 
> > 
> > retrieve_plocks ckpt open error 12 lvclusdidiz0360
> > 
> > Neale
> > 
> > On Sep 2, 2014, at 11:37 AM, David Teigland <teigland@xxxxxxxxxx> wrote:
> > 
> >> On Tue, Sep 02, 2014 at 02:56:52PM +0000, Neale Ferguson wrote:
> >> 
> >>> 1409631951 lockspace lvclusdidiz0360
> >>> plock disabled our sig 816fba01 nodeid 2 sig 2f6b
> >> 
> >> There is a difference in plock data signatures between the node that wrote
> >> the data and the node that read it (this one).  This indicates that the
> >> plock data was not synced correctly by the openais/corosync checkpoints,
> >> or that the signatures were not synced correctly (e.g. bug 623816).

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster