This is the inode count in the exported folder of the volume on the server, before writing the file from the client:

[root@cld-blu-13 nova]# du --inodes
2       .

And this is the block usage:

[root@cld-blu-13 nova]# df -T
Filesystem                            Type  1K-blocks     Used  Available Use% Mounted on
/dev/mapper/nfsclustervg-nfsclusterlv xfs  1152878588    33000 1152845588   1% /nfscluster

After writing the file from the client, with an umount/mount during the write:

[root@cld-blu-13 nova]# du --inodes
3       .
[root@cld-blu-13 nova]# df -T
Filesystem                            Type  1K-blocks     Used  Available Use% Mounted on
/dev/mapper/nfsclustervg-nfsclusterlv xfs  1152878588 21004520 1131874068   2% /nfscluster

This is correct. Now delete the file:

[root@cld-blu-13 nova]# du --inodes
2       .

The number of inodes is correct (from 3 back to 2).

[root@cld-blu-13 nova]# df -T
Filesystem                            Type  1K-blocks     Used  Available Use% Mounted on
/dev/mapper/nfsclustervg-nfsclusterlv xfs  1152878588 21004520 1131874068   2% /nfscluster

The number of used blocks is not correct: it does not return to the initial value of 33000.

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of J. Bruce Fields
Sent: Tuesday, 12 May 2015 17:25
To: linux clustering
Subject: Re: nfs cluster, problem with delete file in the failover case

On Tue, May 12, 2015 at 12:37:10AM +0200, gianpietro.sella@xxxxxxxx wrote:
> > On Sun, May 10, 2015 at 11:28:25AM +0200, gianpietro.sella@xxxxxxxx wrote:
> >> Hi, sorry for my bad English.
> >> I am testing an NFS cluster, active/passive (2 nodes).
> >> I followed these instructions for NFS:
> >>
> >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.html
> >>
> >> I use CentOS 7.1 on the nodes.
> >> The 2 nodes of the cluster share the same iSCSI volume.
> >> The NFS cluster works very well.
> >> I have only one problem.
> >> I mount the exported folder of the NFS cluster on my client node (NFSv3
> >> protocol).
> >> I write a big data file (70GB) to the NFS folder:
> >> dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat
> >> Before the write finishes I put the active node into standby status;
> >> then the resources migrate to the other node.
> >> When the dd write finishes, the file is ok.
> >> I delete the file output.dat.
> >
> > So, the dd and the later rm are both run on the client, and the rm after
> > the dd has completed and exited?  And the rm doesn't happen till after
> > the first migration is completely finished?  What version of NFS are you
> > using?
> >
> > It sounds like a sillyrename problem, but I don't see the explanation.
> >
> > --b.
>
> Hi Bruce, thanks for your answer.
> Yes, the dd command and the rm command (both on the client node) finish
> without error.
> I use NFSv3, but it is the same with the NFSv4 protocol.
> The OS is CentOS 7.1; the NFS package is nfs-utils-1.3.0-0.8.el7.x86_64.
> The pacemaker configuration is:
>
> pcs resource create nfsclusterlv LVM volgrpname=nfsclustervg exclusive=true --group nfsclusterha
>
> pcs resource create nfsclusterdata Filesystem device="/dev/nfsclustervg/nfsclusterlv" directory="/nfscluster" fstype="ext4" --group nfsclusterha
>
> pcs resource create nfsclusterserver nfsserver nfs_shared_infodir=/nfscluster/nfsinfo nfs_no_notify=true --group nfsclusterha
>
> pcs resource create nfsclusterroot exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports fsid=0 --group nfsclusterha
>
> pcs resource create nfsclusternova exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports/nova fsid=1 --group nfsclusterha
>
> pcs resource create nfsclusterglance exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports/glance fsid=2 --group nfsclusterha
>
> pcs resource create nfsclustervip IPaddr2 ip=192.168.61.180 cidr_netmask=24 --group nfsclusterha
>
> pcs resource create nfsclusternotify nfsnotify source_host=192.168.61.180 --group nfsclusterha
>
> Now I have done the following test.
> NFS cluster with 2 nodes:
> the first node in standby state,
> the second node in active state.
> I mount the empty (no used space) exported volume on the client with the
> NFSv3 protocol (with the NFSv4 protocol it is the same).
> On the client I write a big file (70GB) into the mount directory with dd
> (but it is the same with the cp command).
> While the command is writing the file, I disable the nfsnotify, IPaddr2,
> exportfs and nfsserver resources in this order (pcs resource disable ...)
> and then I enable the resources (pcs resource enable ...) in the reverse
> order.
> When I disable the resources the write freezes; when I enable them the
> write restarts without error.
> When the write command has finished I delete the file.
> The mount directory is empty and the used space of the exported volume is
> 0; this is ok.
> Now I repeat the test,
> but this time I disable/enable the Filesystem resource as well:
> disable the nfsnotify, IPaddr2, exportfs, nfsserver and Filesystem
> resources (the write freezes), then enable them in the reverse order (the
> write restarts without error).
> When the write command has finished I delete the file.
> Now the mounted directory is empty (no file) but the used space is not 0:
> it is 70GB.
> This is not ok.
> Now I execute the following command on the active node of the cluster,
> where the volume is exported with NFS:
> mount -o remount /dev/nfsclustervg/nfsclusterlv
> where /dev/nfsclustervg/nfsclusterlv is the exported volume (an iSCSI
> volume configured with LVM).
> After this command the used space in the mounted directory on the client
> is 0; this is ok.
> I think that the problem is the Filesystem resource on the active node of
> the cluster,
> but it is very strange.

So, the only difference between the "good" and "bad" cases was the addition of the stop/start of the filesystem resource?  I assume that's equivalent to an umount/mount.
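For reference, the disable/enable sequence in the test above can be sketched as a small helper. This is a hypothetical sketch, not a tested script: the resource names come from the pcs configuration quoted earlier, and the resource-manager command is a parameter only so the ordering logic can be exercised without a live cluster.

```shell
#!/bin/bash
# Hypothetical sketch of the failover exercise described above:
# disable the given resources in order, then re-enable them in
# reverse order, as in the "bad" test case.
bounce_resources() {
    local cmd=$1; shift          # resource manager command, normally "pcs"
    local -a res=("$@")          # resource names, in disable order
    local r i
    for r in "${res[@]}"; do
        "$cmd" resource disable "$r"
    done
    for ((i = ${#res[@]} - 1; i >= 0; i--)); do
        "$cmd" resource enable "${res[i]}"
    done
}

# Intended use on a cluster node while the client-side dd is running;
# including nfsclusterdata (the Filesystem resource) at the end is what
# triggers the leftover used space in the test above:
# bounce_resources pcs nfsclusternotify nfsclustervip nfsclusterroot \
#     nfsclusternova nfsclusterglance nfsclusterserver nfsclusterdata
```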
I guess the server's dentry for that file is hanging around for a little while for some reason.  We've run across at least one problem of that sort before (see d891eedbc3b1 "fs/dcache: allow d_obtain_alias() to return unhashed dentries").

In both cases, after the restart the first operation the server will get for that file is a write with a filehandle, and it will have to look up that filehandle to find the file.  (Whereas without the restart, the initial discovery of the file will be a lookup by name.)  In the "good" case the server already has a dentry cached for that file; in the "bad" case the umount/mount means that we'll be doing a cold-cache lookup of that filehandle.

I wonder if the test case can be narrowed down any further....  Is the large file necessary?  If it's needed only to ensure the writes are actually sent to the server promptly, then it might be enough to do the NFS mount with -o sync.

Instead of the cluster migration or restart, it might be possible to reproduce the bug just with an

	echo 2 > /proc/sys/vm/drop_caches

run on the server side while the dd is in progress--I don't know if that will reliably drop the one dentry, though.  Maybe do a few of those in a row.

--b.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
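The drop_caches experiment suggested above could look something like the following. This is a sketch under the assumption that it runs as root on the NFS server while the client-side dd is in flight; the control-file path is a parameter only so the loop can be tried without root, and the mount example (VIP address and /mnt mount point) is illustrative, taken from the configuration in this thread.

```shell
#!/bin/bash
# Sketch of the server-side reproduction attempt suggested above.
# Writing 2 to /proc/sys/vm/drop_caches asks the kernel to reclaim
# reclaimable dentries and inodes; doing it a few times in a row
# improves the odds of evicting the one dentry for the in-flight file.
drop_dentries() {
    local ctl=${1:-/proc/sys/vm/drop_caches}   # parameterized for testing
    local i
    for i in 1 2 3; do
        echo 2 > "$ctl"
        sleep 1
    done
}

# Intended use, as root on the server while the client-side dd runs:
# drop_dentries
#
# On the client, mounting with -o sync should push the writes to the
# server promptly, so a smaller file may suffice (hypothetical example):
# mount -t nfs -o vers=3,sync 192.168.61.180:/nfscluster/exports/nova /mnt
```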