On Tue, Sep 02, 2014 at 04:24:07PM +0000, Neale Ferguson wrote:
> In retrieve_plocks_stored() there is the code:
>
>         retrieve_plocks(ls, &sig);
>
>         if ((hd->flags & DLM_MFLG_PLOCK_SIG) && (sig != hd->msgdata2)) {
>                 log_error("lockspace %s plock disabled our sig %x "
>                           "nodeid %d sig %x", ls->name, sig, hd->nodeid,
>                           hd->msgdata2);
>                 ls->disable_plock = 1;
>                 ls->need_plocks = 1; /* don't set HAVEPLOCK */
>                 ls->save_plocks = 0;
>                 return;
>         }

We need to sort out which nodes are sending/receiving plock data to/from
each other. The way it's supposed to work is that an existing node writes
its plock data into a checkpoint, then calls send_plocks_stored() to notify
the new node that the data is ready. The new node is then supposed to run
receive_plocks_stored() and read the plock data from the checkpoint.

I could get a better picture if you save and send the output of

  dlm_tool dump > dlm_dump.txt
  dlm_tool log_plock > dlm_plock.txt

after the problem occurs.

> Node 1 is getting rc=12 from saCkptCheckpointOpen
> (SA_AIS_ERR_NOT_EXIST). However, this error is ignored and we process
> the sig value as if it were valid, rather than as an uninitialized value
> that was never set by the retrieve_plocks() function. So I guess the
> question is why it can't find the checkpoint file and/or what the correct
> action is when the sig value cannot be retrieved.
>
> Neale
>
> On Sep 2, 2014, at 12:02 PM, Neale Ferguson <neale@xxxxxxxxxxxxxx> wrote:
>
> > Thanks David,
> > That makes sense, as there's this message that precedes the disable message in the log:
> >
> > retrieve_plocks ckpt open error 12 lvclusdidiz0360
> >
> > Neale
> >
> > On Sep 2, 2014, at 11:37 AM, David Teigland <teigland@xxxxxxxxxx> wrote:
> >
> >> On Tue, Sep 02, 2014 at 02:56:52PM +0000, Neale Ferguson wrote:
> >>
> >>> 1409631951 lockspace lvclusdidiz0360
> >>> plock disabled our sig 816fba01 nodeid 2 sig 2f6b
> >>
> >> There is a difference in plock data signatures between the node that wrote
> >> the data and the node that read it (this one). This indicates that the
> >> plock data was not synced correctly by the openais/corosync checkpoints,
> >> or that the signatures were not synced correctly (e.g. bug 623816).

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster