Re: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"


On 25 Jan 2018 8:43 am, "Lian, George (NSB - CN/Hangzhou)" <george.lian@xxxxxxxxxxxxxxx> wrote:
Hi,

I suppose the zero-filled attr is a performance consideration for NFS, but for FUSE it leads to issues such as the hard LINK FOP failing.
So I suggest we add 2 attr fields at the end of "struct iatt {", e.g. ia_fuse_nlink and ia_fuse_ctime,
save ia_nlink and ia_ctime into ia_fuse_nlink/ia_fuse_ctime in function gf_zero_fill_stat before setting them to zero,
and restore the valid nlink and ctime in function gf_fuse_stat2attr,
so that the kernel can get the correct nlink and ctime.
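A minimal sketch of the idea above, using a simplified stand-in for struct iatt. The ia_fuse_nlink/ia_fuse_ctime names are the proposal from this mail, not existing GlusterFS fields, and the real gf_zero_fill_stat/gf_fuse_stat2attr signatures differ:

```c
#include <stdint.h>

/* Simplified stand-in for GlusterFS's struct iatt; only the fields
 * relevant to this sketch are shown. The two ia_fuse_* fields are the
 * new fields proposed in this mail, not existing GlusterFS members. */
struct iatt_sketch {
    uint32_t ia_nlink;
    int64_t  ia_ctime;
    uint32_t ia_fuse_nlink;   /* proposed: stash for ia_nlink */
    int64_t  ia_fuse_ctime;   /* proposed: stash for ia_ctime */
};

/* Zero-fill for the caching layers, but remember the real values first. */
static void zero_fill_stat_keeping_copy(struct iatt_sketch *buf)
{
    buf->ia_fuse_nlink = buf->ia_nlink;
    buf->ia_fuse_ctime = buf->ia_ctime;
    buf->ia_nlink = 0;
    buf->ia_ctime = 0;
}

/* Restore the saved values when converting to a FUSE attr, so the
 * kernel never sees nlink == 0 on a live inode. */
static void restore_stat_for_fuse(struct iatt_sketch *buf)
{
    if (buf->ia_nlink == 0 && buf->ia_ctime == 0) {
        buf->ia_nlink = buf->ia_fuse_nlink;
        buf->ia_ctime = buf->ia_fuse_ctime;
    }
}
```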

Is this a workable solution? Are there any risks?

Please share your comments, thanks in advance!

Adding Csaba to help with this.


Best Regards,
George

-----Original Message-----
From: gluster-devel-bounces@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Niels de Vos
Sent: Wednesday, January 24, 2018 7:43 PM
To: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
Cc: Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>; Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Sun, Ping (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
Subject: Re: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

On Wed, Jan 24, 2018 at 02:24:06PM +0530, Pranith Kumar Karampuri wrote:
> hi,
>        In the same commit you mentioned earlier, there was this code
> earlier:
> -/* Returns 1 if the stat seems to be filled with zeroes. */
> -int
> -nfs_zero_filled_stat (struct iatt *buf)
> -{
> -        if (!buf)
> -                return 1;
> -
> -        /* Do not use st_dev because it is transformed to store the xlator id
> -         * in place of the device number. Do not use st_ino because by this time
> -         * we've already mapped the root ino to 1 so it is not guaranteed to be
> -         * 0.
> -         */
> -        if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
> -                return 1;
> -
> -        return 0;
> -}
> -
>
> I moved this to a common library function that can be used in afr as well.
> Why was it there in NFS? +Niels for answering that question.

Sorry, I don't know why that was done. It was introduced with the initial gNFS implementation, long before I started to work with Gluster. The only reference I have is this from
xlators/nfs/server/src/nfs3-helpers.c:nfs3_stat_to_post_op_attr()

 371         /* Some performance translators return zero-filled stats when they
 372          * do not have up-to-date attributes. Need to handle this by not
 373          * returning these zeroed out attrs.
 374          */

This may not be true for the current situation anymore.

HTH,
Niels


>
> If I give you a patch which will assert the error condition, would it
> be possible for you to figure out the first xlator which is unwinding
> the iatt with nlink count as zero but ctime as non-zero?
>
> On Wed, Jan 24, 2018 at 1:03 PM, Lian, George (NSB - CN/Hangzhou) <
> george.lian@xxxxxxxxxxxxxxx> wrote:
>
> > Hi,  Pranith Kumar,
> >
> >
> >
> > Can you tell me why buf->ia_nlink needs to be set to “0” in function
> > gf_zero_fill_stat(), and which API or application cares about it?
> >
> > If I remove this line and also update function gf_is_zero_filled_stat
> > correspondingly, the issue seems to be gone, but I can’t confirm
> > whether it will lead to other issues.
> >
> >
> >
> > So could you please double check it and give your comments?
> >
> >
> >
> > My change is as the below:
> >
> >
> >
> > gf_boolean_t
> > gf_is_zero_filled_stat (struct iatt *buf)
> > {
> >         if (!buf)
> >                 return 1;
> >
> >         /* Do not use st_dev because it is transformed to store the
> >          * xlator id in place of the device number. Do not use st_ino
> >          * because by this time we've already mapped the root ino to 1
> >          * so it is not guaranteed to be 0.
> >          */
> > //        if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
> >         if (buf->ia_ctime == 0)
> >                 return 1;
> >
> >         return 0;
> > }
> >
> > void
> > gf_zero_fill_stat (struct iatt *buf)
> > {
> > //       buf->ia_nlink = 0;
> >         buf->ia_ctime = 0;
> > }
> >
> >
> >
> > Thanks & Best Regards
> >
> > George
> >
> > *From:* Lian, George (NSB - CN/Hangzhou)
> > *Sent:* Friday, January 19, 2018 10:03 AM
> > *To:* Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>; Zhou, Cynthia
> > (NSB -
> > CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>
> > *Cc:* Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>;
> > Gluster-devel@xxxxxxxxxxx; Sun, Ping (NSB - CN/Hangzhou) <
> > ping.sun@xxxxxxxxxxxxxxx>
> >
> > *Subject:* RE: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> > Hi,
> >
> > >>> Cool, this works for me too. Send me a mail off-list once you
> > >>> are
> > available and we can figure out a way to get into a call and work on this.
> >
> >
> >
> > Have you reproduced the issue per the steps I listed in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1531457 and in my last mail?
> >
> >
> >
> > If not, I would like you to try it yourself; the difference between
> > your setup and mine is just creating only 2 bricks instead of 6 bricks.
> >
> >
> >
> > And Cynthia could have a session with you if needed, since I am not
> > available next Monday and Tuesday.
> >
> >
> >
> > Thanks & Best Regards,
> >
> > George
> >
> >
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith Kumar
> > Karampuri
> > *Sent:* Thursday, January 18, 2018 6:03 PM
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou)
> > <deqian.li@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Sun, Ping
> > (NSB - CN/Hangzhou) < ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jan 18, 2018 at 12:17 PM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > >>>I actually tried it with replica-2 and replica-3 and then
> > >>>distributed
> > replica-2 before replying to the earlier mail. We can have a
> > debugging session if you are okay with it.
> >
> >
> >
> > It is fine if you can’t reproduce the issue in your ENV.
> >
> > And I have attached the detailed reproduction log to the Bugzilla FYI
> >
> >
> >
> > But I am sorry, I may be OOO on Monday and Tuesday next week, so a
> > debug session next Wednesday will be fine for me.
> >
> >
> >
> > Cool, this works for me too. Send me a mail off-list once you are
> > available and we can figure out a way to get into a call and work on this.
> >
> >
> >
> >
> >
> >
> >
> > Pasting the detailed reproduction log here FYI:
> >
> > root@ubuntu:~# gluster peer probe ubuntu
> > peer probe: success. Probe on localhost not needed
> > root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 ubuntu:/home/gfs/b2 force
> > volume create: test: success: please start the volume to access data
> > root@ubuntu:~# gluster v start test
> > volume start: test: success
> > root@ubuntu:~# gluster v info test
> >
> > Volume Name: test
> > Type: Replicate
> > Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: ubuntu:/home/gfs/b1
> > Brick2: ubuntu:/home/gfs/b2
> > Options Reconfigured:
> > transport.address-family: inet
> > nfs.disable: on
> > performance.client-io-threads: off
> >
> > root@ubuntu:~# gluster v status
> > Status of volume: test
> > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick ubuntu:/home/gfs/b1                   49152     0          Y       7798
> > Brick ubuntu:/home/gfs/b2                   49153     0          Y       7818
> > Self-heal Daemon on localhost               N/A       N/A        Y       7839
> >
> > Task Status of Volume test
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> > root@ubuntu:~# gluster v set test cluster.consistent-metadata on
> > volume set: success
> >
> > root@ubuntu:~# ls /mnt/test
> > ls: cannot access '/mnt/test': No such file or directory
> > root@ubuntu:~# mkdir -p /mnt/test
> > root@ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test
> >
> > root@ubuntu:~# cd /mnt/test
> > root@ubuntu:/mnt/test# echo "abc">aaa
> > root@ubuntu:/mnt/test# cp aaa bbb;link bbb ccc
> >
> > root@ubuntu:/mnt/test# kill -9 7818
> > root@ubuntu:/mnt/test# cp aaa ddd;link ddd eee
> > link: cannot create link 'eee' to 'ddd': No such file or directory
> >
> >
> >
> >
> >
> > Best Regards,
> >
> > George
> >
> >
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith
> > Kumar Karampuri
> > *Sent:* Thursday, January 18, 2018 2:40 PM
> >
> >
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li,
> > Deqian (NSB - CN/Hangzhou) < deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping
> > (NSB - CN/Hangzhou) < ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jan 18, 2018 at 6:33 AM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I suppose the brick count in your testing is six, and you just shut
> > down 3 of the processes.
> >
> > When I reproduced the issue, I only created a replicated volume with 2
> > bricks, let only ONE brick work, and set cluster.consistent-metadata on.
> >
> > With these 2 test conditions, the issue is 100% reproducible.
> >
> >
> >
> > Hi,
> >
> >       I actually tried it with replica-2 and replica-3 and then
> > distributed replica-2 before replying to the earlier mail. We can
> > have a debugging session if you are okay with it.
> >
> > I am in the middle of a customer issue myself (that is the reason for
> > this delay :-( ) and am thinking of wrapping it up early next week.
> > Would that be fine with you?
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 16:44:28 :) ⚡ gluster v status
> > Status of volume: r2
> > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick localhost.localdomain:/home/gfs/r2_0  49152     0          Y       5309
> > Brick localhost.localdomain:/home/gfs/r2_1  49154     0          Y       5330
> > Brick localhost.localdomain:/home/gfs/r2_2  49156     0          Y       5351
> > Brick localhost.localdomain:/home/gfs/r2_3  49158     0          Y       5372
> > Brick localhost.localdomain:/home/gfs/r2_4  49159     0          Y       5393
> > Brick localhost.localdomain:/home/gfs/r2_5  49160     0          Y       5414
> > Self-heal Daemon on localhost               N/A       N/A        Y       5436
> >
> > Task Status of Volume r2
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> > root@dhcp35-190 - ~
> > 16:44:38 :) ⚡ kill -9 5309 5351 5393
> >
> >
> >
> > Best Regards,
> >
> > George
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith
> > Kumar Karampuri
> > *Sent:* Wednesday, January 17, 2018 7:27 PM
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>;
> > Gluster-devel@xxxxxxxxxxx; Zhou, Cynthia (NSB - CN/Hangzhou) <
> > cynthia.zhou@xxxxxxxxxxxxxxx>; Sun, Ping (NSB - CN/Hangzhou) <
> > ping.sun@xxxxxxxxxxxxxxx>
> >
> >
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Jan 15, 2018 at 1:55 PM, Pranith Kumar Karampuri <
> > pkarampu@xxxxxxxxxx> wrote:
> >
> >
> >
> >
> >
> > On Mon, Jan 15, 2018 at 8:46 AM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> >
> >
> > Have you reproduced this issue? If yes, could you please confirm
> > whether it is an issue or not?
> >
> >
> >
> > Hi,
> >
> >        I tried recreating this on my laptop and on both master and
> > 3.12 and I am not able to recreate the issue :-(.
> >
> > Here is the execution log:
> > https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
> >
> > Since I was doing this on my laptop, I changed shutting down of the
> > replica to killing the brick process to simulate this test.
> >
> > Let me know if I missed something.
> >
> >
> >
> >
> >
> > Sorry, I am held up with some issue at work, so I think I will get
> > some time day after tomorrow to look at this. In the mean time I am
> > adding more people who know about afr to see if they get a chance to
> > work on this before me.
> >
> >
> >
> >
> >
> > And if it is an issue, do you have any solution for it?
> >
> >
> >
> > Thanks & Best Regards,
> >
> > George
> >
> >
> >
> > *From:* Lian, George (NSB - CN/Hangzhou)
> > *Sent:* Thursday, January 11, 2018 2:01 PM
> > *To:* Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li,
> > Deqian (NSB - CN/Hangzhou) < deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping
> > (NSB - CN/Hangzhou) < ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* RE: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> > Hi,
> >
> >
> >
> > Please see the detailed test steps at
> > https://bugzilla.redhat.com/show_bug.cgi?id=1531457
> >
> >
> >
> > How reproducible:
> >
> >
> >
> >
> >
> > Steps to Reproduce:
> >
> > 1. create a volume named "test" with replication
> > 2. set volume option cluster.consistent-metadata on:
> >    gluster v set test cluster.consistent-metadata on
> > 3. mount volume test on the client at /mnt/test
> > 4. create a file aaa with size more than 1 byte:
> >    echo "1234567890" >/mnt/test/aaa
> > 5. shut down a replica node, say sn-1, leaving only sn-0 working
> > 6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc
> >
> >
> >
> >
> >
> > BRs
> >
> > George
> >
> >
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith Kumar
> > Karampuri
> > *Sent:* Thursday, January 11, 2018 12:39 PM
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li,
> > Deqian (NSB - CN/Hangzhou) < deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping
> > (NSB - CN/Hangzhou) < ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > >>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
> >
> > It is FUSE, within a mountpoint created by the “mount -t glusterfs …” command.
> >
> >
> >
> > Could you let me know the test you did so that I can try to
> > re-create and see what exactly is going on?
> >
> > Configuration of the volume and the steps to re-create the issue you
> > are seeing would be helpful in debugging the issue further.
> >
> >
> >
> >
> >
> > Thanks & Best Regards,
> >
> > George
> >
> >
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith
> > Kumar Karampuri
> > *Sent:* Wednesday, January 10, 2018 8:08 PM
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Zhong, Hua (NSB - CN/Hangzhou)
> > <hua.zhong@xxxxxxxxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou)
> > <deqian.li@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Sun, Ping
> > (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi, Pranith Kumar,
> >
> >
> >
> > I have created a bug on Bugzilla:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1531457
> >
> > After my investigation of this link issue, I suppose your change in
> > afr-dir-write.c for the fix "Don't let NFS cache stat after writes"
> > is like:
> >
> > --------------------------------------
> >
> >         if (afr_txn_nothing_failed (frame, this)) {
> >                 /* if it did pre-op, it will do post-op changing ctime */
> >                 if (priv->consistent_metadata &&
> >                     afr_needs_changelog_update (local))
> >                         afr_zero_fill_stat (local);
> >                 local->transaction.unwind (frame, this);
> >         }
> >
> > In the above fix, ia_nlink is set to ‘0’ when the option
> > consistent-metadata is set to “on”.
> >
> > And hard-linking a file that was just created then leads to an error;
> > the error comes from the kernel function “vfs_link”:
> >
> > if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
> >         error = -ENOENT;
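To see why a zero-filled nlink breaks link(2), here is a toy model of the vfs_link() check quoted above. The struct and the I_LINKABLE value are simplified stand-ins, not the real kernel definitions:

```c
#include <errno.h>
#include <stdint.h>

#define I_LINKABLE 0x1  /* stand-in flag, not the kernel's value */

/* Simplified stand-in for the kernel's struct inode. */
struct inode_sketch {
    uint32_t     i_nlink;
    unsigned int i_state;
};

/* Mirrors the quoted check: a zero link count on an inode that is not
 * marked I_LINKABLE makes link(2) fail with ENOENT, even though the
 * file actually exists, because a zero nlink normally means the inode
 * has been unlinked. */
static int vfs_link_check(const struct inode_sketch *inode)
{
    if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
        return -ENOENT;
    return 0;
}
```

So if the FUSE attr cache ever holds a zero-filled stat for a live file, the kernel-side link path rejects it before GlusterFS is even consulted.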
> >
> >
> >
> > could you please have a check and give some comments here?
> >
> >
> >
> > When a stat is "zero filled", the understanding is that the higher-layer
> > protocol doesn't send the stat value to the kernel, and a separate lookup
> > is sent by the kernel to get the latest stat value. In which protocol are
> > you seeing this issue? Fuse/NFS/SMB?
> >
> >
> >
> >
> >
> > Thanks & Best Regards,
> >
> > George
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
>
>
>
> --
> Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel

