J. Bruce Fields <bfields <at> fieldses.org> writes:

> On Tue, May 12, 2015 at 12:37:10AM +0200, gianpietro.sella <at> unipd.it wrote:
> > > On Sun, May 10, 2015 at 11:28:25AM +0200, gianpietro.sella <at> unipd.it wrote:
> > >> Hi, sorry for my bad English.
> > >> I am testing an active/passive NFS cluster (2 nodes).
> > >> I followed these instructions for NFS:
> > >>
> > >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.html
> > >>
> > >> I use CentOS 7.1 on the nodes.
> > >> The 2 nodes of the cluster share the same iSCSI volume.
> > >> The NFS cluster works very well.
> > >> I have only one problem.
> > >> I mount the folder exported by the NFS cluster on my client node
> > >> (NFSv3 protocol).
> > >> I write a big data file (70GB) into the NFS folder:
> > >> dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat
> > >> Before the write is finished I put the active node into standby status.
> > >> Then the resources migrate to the other node.
> > >> When the dd write finishes the file is ok.
> > >> I delete the file output.dat.
> > >
> > > So, the dd and the later rm are both run on the client, and the rm after
> > > the dd has completed and exited?  And the rm doesn't happen till after
> > > the first migration is completely finished?  What version of NFS are you
> > > using?
> > >
> > > It sounds like a sillyrename problem, but I don't see the explanation.
> > >
> > > --b.
> > >
> >
> > Hi Bruce, thanks for your answer.
> > Yes, the dd command and the rm command (both run on the client node)
> > finish without error.
> > I use NFSv3, but it is the same with the NFSv4 protocol.
> > The OS is CentOS 7.1, the NFS package is nfs-utils-1.3.0-0.8.el7.x86_64.
> > The pacemaker configuration is:
> >
> > pcs resource create nfsclusterlv LVM volgrpname=nfsclustervg exclusive=true --group nfsclusterha
> >
> > pcs resource create nfsclusterdata Filesystem device="/dev/nfsclustervg/nfsclusterlv" directory="/nfscluster" fstype="ext4" --group nfsclusterha
> >
> > pcs resource create nfsclusterserver nfsserver nfs_shared_infodir=/nfscluster/nfsinfo nfs_no_notify=true --group nfsclusterha
> >
> > pcs resource create nfsclusterroot exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports fsid=0 --group nfsclusterha
> >
> > pcs resource create nfsclusternova exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports/nova fsid=1 --group nfsclusterha
> >
> > pcs resource create nfsclusterglance exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports/glance fsid=2 --group nfsclusterha
> >
> > pcs resource create nfsclustervip IPaddr2 ip=192.168.61.180 cidr_netmask=24 --group nfsclusterha
> >
> > pcs resource create nfsclusternotify nfsnotify source_host=192.168.61.180 --group nfsclusterha
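For context, a minimal sketch of the client-side reproduction implied by this configuration. The VIP (192.168.61.180) and the nova export path come from the pcs commands above, and the mount point /Instances is taken from the dd command quoted earlier; the exact mount options and the of= form of dd are only assumptions:

    # mount the clustered export through the floating IP, NFSv3 with synchronous writes
    mount -t nfs -o vers=3,sync 192.168.61.180:/nfscluster/exports/nova /Instances

    # write a large file, then remove it once the write has finished
    dd if=/dev/zero of=/Instances/output.dat bs=1M count=70000
    rm /Instances/output.dat
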
> >
> > Now I have done the following test.
> > NFS cluster with 2 nodes.
> > The first node is in standby state.
> > The second node is in active state.
> > I mount the empty (no used space) exported volume on the client with the
> > NFSv3 protocol (with the NFSv4 protocol it is the same).
> > On the client I write a big file (70GB) into the mounted directory with dd
> > (but it is the same with the cp command).
> > While the command is writing the file I disable the nfsnotify, IPaddr2,
> > exportfs and nfsserver resources in this order (pcs resource disable ...)
> > and then I enable the resources (pcs resource enable ...) in the reverse
> > order.
> > When I disable the resources the write freezes, when I enable the resources
> > the write restarts without error.
> > When the write command is finished I delete the file.
> > The mounted directory is empty and the used space of the exported volume is
> > 0, this is ok.
> > Now I repeat the test, but this time I also disable/enable the Filesystem
> > resource: I disable the nfsnotify, IPaddr2, exportfs, nfsserver and
> > Filesystem resources (the write freezes), then enable them in the reverse
> > order (the write restarts without error).
> > When the write command is finished I delete the file.
> > Now the mounted directory is empty (no file) but the used space is not 0,
> > it is 70GB.
> > This is not ok.
> > Now I execute the following command on the active node of the cluster where
> > the volume is exported with NFS:
> > mount -o remount /dev/nfsclustervg/nfsclusterlv
> > where /dev/nfsclustervg/nfsclusterlv is the exported volume (iSCSI volume
> > configured with LVM).
> > After this command the used space seen in the mounted directory on the
> > client is 0, this is ok.
> > I think that the problem is the Filesystem resource on the active node of
> > the cluster.
> > But it is very strange.
>
> So, the only difference between the "good" and "bad" cases was the
> addition of the stop/start of the filesystem resource?  I assume that's
> equivalent to an umount/mount.

Yes, that is correct.

> I guess the server's dentry for that file is hanging around for a little
> while for some reason.  We've run across at least one problem of that
> sort before (see d891eedbc3b1 "fs/dcache: allow d_obtain_alias() to
> return unhashed dentries").
>
> In both cases after the restart the first operation the server will get
> for that file is a write with a filehandle, and it will have to look up
> that filehandle to find the file.  (Whereas without the restart the
> initial discovery of the file will be a lookup by name.)
>
> In the "good" case the server already has a dentry cached for that file,
> in the "bad" case the umount/mount means that we'll be doing a
> cold-cache lookup of that filehandle.
>
> I wonder if the test case can be narrowed down any further....  Is the
> large file necessary?  If it's needed only to ensure the writes are
> actually sent to the server promptly then it might be enough to do the
> nfs mount with -osync.

I use the sync option, the problem is the same.

> Instead of the cluster migration or restart, it might be possible to
> reproduce the bug just with a
>
>     echo 2 >/proc/sys/vm/drop_caches
>
> run on the server side while the dd is in progress--I don't know if that
> will reliably drop the one dentry, though.  Maybe do a few of those in a
> row.

No, with the echo command it is not possible to reproduce the problem.

> --b.
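
If the 70GB reappears after a delete like this, a few server-side checks on the active node may show whether the space is held by an NFS silly-rename file or by an unlinked but still-referenced inode. This is only a suggested sketch, assuming the /nfscluster mount point used by the Filesystem resource above:

    # compare what the filesystem reports with what the directory tree accounts for
    df -h /nfscluster
    du -sh /nfscluster/exports

    # look for silly-rename leftovers (.nfsXXXX files) under the export
    find /nfscluster/exports -name '.nfs*' -ls

    # list files on this filesystem that are unlinked but still held open
    # (/nfscluster must be the mount point for lsof to scope the check to it)
    lsof +L1 /nfscluster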