Re: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"


On 25 Jan 2018 8:43 am, "Lian, George (NSB - CN/Hangzhou)" <george.lian@xxxxxxxxxxxxxxx> wrote:
Hi,

I suppose the zero-filled attr is a performance consideration for NFS, but for FUSE it leads to issues such as the hard LINK FOP failing.
So I suggest we add 2 attr fields at the end of "struct iatt {", e.g. ia_fuse_nlink and ia_fuse_ctime,
save ia_nlink and ia_ctime into ia_fuse_nlink/ia_fuse_ctime in function gf_zero_fill_stat before setting them to zero,
and restore the valid nlink and ctime in function gf_fuse_stat2attr,
so that the kernel can get the correct nlink and ctime.
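A minimal sketch of the idea above, using a simplified stand-in for struct iatt. The ia_fuse_nlink/ia_fuse_ctime names are the proposal from this mail, not existing GlusterFS fields, and the real gf_zero_fill_stat/gf_fuse_stat2attr signatures differ:

```c
#include <stdint.h>

/* Simplified stand-in for GlusterFS's struct iatt; only the fields
 * relevant to this sketch are shown. The two ia_fuse_* fields are the
 * new fields proposed in this mail, not existing GlusterFS members. */
struct iatt_sketch {
    uint32_t ia_nlink;
    int64_t  ia_ctime;
    uint32_t ia_fuse_nlink;   /* proposed: stash for ia_nlink */
    int64_t  ia_fuse_ctime;   /* proposed: stash for ia_ctime */
};

/* Zero-fill for the caching layers, but remember the real values first. */
static void zero_fill_stat_keeping_copy(struct iatt_sketch *buf)
{
    buf->ia_fuse_nlink = buf->ia_nlink;
    buf->ia_fuse_ctime = buf->ia_ctime;
    buf->ia_nlink = 0;
    buf->ia_ctime = 0;
}

/* Restore the saved values when converting to a FUSE attr, so the
 * kernel never sees nlink == 0 on a live inode. */
static void restore_stat_for_fuse(struct iatt_sketch *buf)
{
    if (buf->ia_nlink == 0 && buf->ia_ctime == 0) {
        buf->ia_nlink = buf->ia_fuse_nlink;
        buf->ia_ctime = buf->ia_fuse_ctime;
    }
}
```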

Is this a workable solution? Are there any risks?

Please share your comments, thanks in advance!

Adding Csaba to help with this.


Best Regards,
George

-----Original Message-----
From: gluster-devel-bounces@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Niels de Vos
Sent: Wednesday, January 24, 2018 7:43 PM
To: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
Cc: Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>; Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Sun, Ping (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
Subject: Re: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

On Wed, Jan 24, 2018 at 02:24:06PM +0530, Pranith Kumar Karampuri wrote:
> hi,
>        In the same commit you mentioned earlier, there was this code
> earlier:
> -/* Returns 1 if the stat seems to be filled with zeroes. */
> -int
> -nfs_zero_filled_stat (struct iatt *buf)
> -{
> -        if (!buf)
> -                return 1;
> -
> -        /* Do not use st_dev because it is transformed to store the xlator id
> -         * in place of the device number. Do not use st_ino because by this time
> -         * we've already mapped the root ino to 1 so it is not guaranteed to be
> -         * 0.
> -         */
> -        if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
> -                return 1;
> -
> -        return 0;
> -}
> -
>
> I moved this to a common library function that can be used in afr as well.
> Why was it there in NFS? +Niels for answering that question.

Sorry, I don't know why that was done. It was introduced with the initial gNFS implementation, long before I started to work with Gluster. The only reference I have is this from
xlators/nfs/server/src/nfs3-helpers.c:nfs3_stat_to_post_op_attr()

 371         /* Some performance translators return zero-filled stats when they
 372          * do not have up-to-date attributes. Need to handle this by not
 373          * returning these zeroed out attrs.
 374          */

This may not be true for the current situation anymore.

HTH,
Niels


>
> If I give you a patch which will assert the error condition, would it
> be possible for you to figure out the first xlator which is unwinding
> the iatt with nlink count as zero but ctime as non-zero?
>
> On Wed, Jan 24, 2018 at 1:03 PM, Lian, George (NSB - CN/Hangzhou) <
> george.lian@xxxxxxxxxxxxxxx> wrote:
>
> > Hi,  Pranith Kumar,
> >
> >
> >
> > Can you tell me why buf->ia_nlink needs to be set to “0” in function
> > gf_zero_fill_stat(), and which API or application cares about it?
> >
> > If I remove this line and also update function gf_is_zero_filled_stat
> > correspondingly, the issue seems to be gone, but I can’t confirm
> > whether it will lead to other issues.
> >
> >
> >
> > So could you please double check it and give your comments?
> >
> >
> >
> > My change is as the below:
> >
> >
> >
> > gf_boolean_t
> > gf_is_zero_filled_stat (struct iatt *buf)
> > {
> >         if (!buf)
> >                 return 1;
> >
> >         /* Do not use st_dev because it is transformed to store the
> >          * xlator id in place of the device number. Do not use st_ino
> >          * because by this time we've already mapped the root ino to 1
> >          * so it is not guaranteed to be 0.
> >          */
> > //        if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
> >         if (buf->ia_ctime == 0)
> >                 return 1;
> >
> >         return 0;
> > }
> >
> > void
> > gf_zero_fill_stat (struct iatt *buf)
> > {
> > //       buf->ia_nlink = 0;
> >         buf->ia_ctime = 0;
> > }
> >
> >
> >
> > Thanks & Best Regards
> >
> > George
> >
> > *From:* Lian, George (NSB - CN/Hangzhou)
> > *Sent:* Friday, January 19, 2018 10:03 AM
> > *To:* Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>; Zhou, Cynthia
> > (NSB -
> > CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>
> > *Cc:* Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>;
> > Gluster-devel@xxxxxxxxxxx; Sun, Ping (NSB - CN/Hangzhou) <
> > ping.sun@xxxxxxxxxxxxxxx>
> >
> > *Subject:* RE: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> > Hi,
> >
> > >>> Cool, this works for me too. Send me a mail off-list once you
> > >>> are
> > available and we can figure out a way to get into a call and work on this.
> >
> >
> >
> > Have you reproduced the issue per the steps I listed in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1531457 and in my last mail?
> >
> >
> >
> > If not, I would like you to try it yourself; the difference between
> > your setup and mine is just creating only 2 bricks instead of 6 bricks.
> >
> >
> >
> > And Cynthia could have a session with you if needed, since I am not
> > available next Monday and Tuesday.
> >
> >
> >
> > Thanks & Best Regards,
> >
> > George
> >
> >
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith Kumar
> > Karampuri
> > *Sent:* Thursday, January 18, 2018 6:03 PM
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou)
> > <deqian.li@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Sun, Ping
> > (NSB - CN/Hangzhou) < ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jan 18, 2018 at 12:17 PM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > >>>I actually tried it with replica-2 and replica-3 and then
> > >>>distributed
> > replica-2 before replying to the earlier mail. We can have a
> > debugging session if you are okay with it.
> >
> >
> >
> > It is fine if you can’t reproduce the issue in your ENV.
> >
> > And I have attached the detailed reproduction log to the Bugzilla FYI
> >
> >
> >
> > But I am sorry, I may be OOO on Monday and Tuesday next week, so a
> > debug session next Wednesday will be fine for me.
> >
> >
> >
> > Cool, this works for me too. Send me a mail off-list once you are
> > available and we can figure out a way to get into a call and work on this.
> >
> >
> >
> >
> >
> >
> >
> > Pasting the detailed reproduction log here FYI:
> >
> > root@ubuntu:~# gluster peer probe ubuntu
> > peer probe: success. Probe on localhost not needed
> > root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 ubuntu:/home/gfs/b2 force
> > volume create: test: success: please start the volume to access data
> > root@ubuntu:~# gluster v start test
> > volume start: test: success
> > root@ubuntu:~# gluster v info test
> >
> > Volume Name: test
> > Type: Replicate
> > Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: ubuntu:/home/gfs/b1
> > Brick2: ubuntu:/home/gfs/b2
> > Options Reconfigured:
> > transport.address-family: inet
> > nfs.disable: on
> > performance.client-io-threads: off
> >
> > root@ubuntu:~# gluster v status
> > Status of volume: test
> > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick ubuntu:/home/gfs/b1                   49152     0          Y       7798
> > Brick ubuntu:/home/gfs/b2                   49153     0          Y       7818
> > Self-heal Daemon on localhost               N/A       N/A        Y       7839
> >
> > Task Status of Volume test
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> > root@ubuntu:~# gluster v set test cluster.consistent-metadata on
> > volume set: success
> >
> > root@ubuntu:~# ls /mnt/test
> > ls: cannot access '/mnt/test': No such file or directory
> > root@ubuntu:~# mkdir -p /mnt/test
> > root@ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test
> >
> > root@ubuntu:~# cd /mnt/test
> > root@ubuntu:/mnt/test# echo "abc">aaa
> > root@ubuntu:/mnt/test# cp aaa bbb;link bbb ccc
> >
> > root@ubuntu:/mnt/test# kill -9 7818
> > root@ubuntu:/mnt/test# cp aaa ddd;link ddd eee
> > link: cannot create link 'eee' to 'ddd': No such file or directory
> >
> >
> >
> >
> >
> > Best Regards,
> >
> > George
> >
> >
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith
> > Kumar Karampuri
> > *Sent:* Thursday, January 18, 2018 2:40 PM
> >
> >
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li,
> > Deqian (NSB - CN/Hangzhou) < deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping
> > (NSB - CN/Hangzhou) < ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jan 18, 2018 at 6:33 AM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I suppose the brick count in your testing is six, and you just shut
> > down 3 of the processes.
> >
> > When I reproduced the issue, I only created a replicated volume with 2
> > bricks, let only ONE brick work, and set cluster.consistent-metadata on.
> >
> > With these 2 test conditions, the issue is 100% reproducible.
> >
> >
> >
> > Hi,
> >
> >       I actually tried it with replica-2 and replica-3 and then
> > distributed replica-2 before replying to the earlier mail. We can
> > have a debugging session if you are okay with it.
> >
> > I am in the middle of a customer issue myself (that is the reason for
> > this delay :-( ) and am thinking of wrapping it up early next week.
> > Would that be fine with you?
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 16:44:28 :) ⚡ gluster v status
> > Status of volume: r2
> > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick localhost.localdomain:/home/gfs/r2_0  49152     0          Y       5309
> > Brick localhost.localdomain:/home/gfs/r2_1  49154     0          Y       5330
> > Brick localhost.localdomain:/home/gfs/r2_2  49156     0          Y       5351
> > Brick localhost.localdomain:/home/gfs/r2_3  49158     0          Y       5372
> > Brick localhost.localdomain:/home/gfs/r2_4  49159     0          Y       5393
> > Brick localhost.localdomain:/home/gfs/r2_5  49160     0          Y       5414
> > Self-heal Daemon on localhost               N/A       N/A        Y       5436
> >
> > Task Status of Volume r2
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> > root@dhcp35-190 - ~
> > 16:44:38 :) ⚡ kill -9 5309 5351 5393
> >
> >
> >
> > Best Regards,
> >
> > George
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith
> > Kumar Karampuri
> > *Sent:* Wednesday, January 17, 2018 7:27 PM
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>;
> > Gluster-devel@xxxxxxxxxxx; Zhou, Cynthia (NSB - CN/Hangzhou) <
> > cynthia.zhou@xxxxxxxxxxxxxxx>; Sun, Ping (NSB - CN/Hangzhou) <
> > ping.sun@xxxxxxxxxxxxxxx>
> >
> >
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Jan 15, 2018 at 1:55 PM, Pranith Kumar Karampuri <
> > pkarampu@xxxxxxxxxx> wrote:
> >
> >
> >
> >
> >
> > On Mon, Jan 15, 2018 at 8:46 AM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> >
> >
> > Have you reproduced this issue? If yes, could you please confirm
> > whether it is an issue or not?
> >
> >
> >
> > Hi,
> >
> >        I tried recreating this on my laptop and on both master and
> > 3.12 and I am not able to recreate the issue :-(.
> >
> > Here is the execution log:
> > https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
> >
> > Since I was doing this on my laptop, I changed shutting down of the
> > replica to killing the brick process to simulate this test.
> >
> > Let me know if I missed something.
> >
> >
> >
> >
> >
> > Sorry, I am held up with some issue at work, so I think I will get
> > some time day after tomorrow to look at this. In the mean time I am
> > adding more people who know about afr to see if they get a chance to
> > work on this before me.
> >
> >
> >
> >
> >
> > And if it is an issue, do you have any solution for it?
> >
> >
> >
> > Thanks & Best Regards,
> >
> > George
> >
> >
> >
> > *From:* Lian, George (NSB - CN/Hangzhou)
> > *Sent:* Thursday, January 11, 2018 2:01 PM
> > *To:* Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li,
> > Deqian (NSB - CN/Hangzhou) < deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping
> > (NSB - CN/Hangzhou) < ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* RE: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> > Hi,
> >
> >
> >
> > Please see the detailed test steps at
> > https://bugzilla.redhat.com/show_bug.cgi?id=1531457
> >
> >
> >
> > How reproducible:
> >
> >
> >
> >
> >
> > Steps to Reproduce:
> >
> > 1. create a volume named "test" with replication
> > 2. set volume option cluster.consistent-metadata on:
> >    gluster v set test cluster.consistent-metadata on
> > 3. mount volume test on the client at /mnt/test
> > 4. create a file aaa with size more than 1 byte:
> >    echo "1234567890" >/mnt/test/aaa
> > 5. shut down a replica node, say sn-1, leaving only sn-0 working
> > 6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc
> >
> >
> >
> >
> >
> > BRs
> >
> > George
> >
> >
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith Kumar
> > Karampuri
> > *Sent:* Thursday, January 11, 2018 12:39 PM
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li,
> > Deqian (NSB - CN/Hangzhou) < deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping
> > (NSB - CN/Hangzhou) < ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > >>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
> >
> > It is FUSE, within a mountpoint created by the “mount -t glusterfs …” command.
> >
> >
> >
> > Could you let me know the test you did so that I can try to
> > re-create and see what exactly is going on?
> >
> > Configuration of the volume and the steps to re-create the issue you
> > are seeing would be helpful in debugging the issue further.
> >
> >
> >
> >
> >
> > Thanks & Best Regards,
> >
> > George
> >
> >
> >
> > *From:* gluster-devel-bounces@gluster.org
> > [mailto:gluster-devel-bounces@gluster.org] *On Behalf Of *Pranith
> > Kumar Karampuri
> > *Sent:* Wednesday, January 10, 2018 8:08 PM
> > *To:* Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
> > *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou)
> > <cynthia.zhou@xxxxxxxxxxxxxxx>; Zhong, Hua (NSB - CN/Hangzhou)
> > <hua.zhong@xxxxxxxxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou)
> > <deqian.li@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Sun, Ping
> > (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
> > *Subject:* Re: a link issue maybe introduced in a
> > bug fix " Don't let NFS cache stat after writes"
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou) <
> > george.lian@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi, Pranith Kumar,
> >
> >
> >
> > I have created a bug on Bugzilla:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1531457
> >
> > After my investigation of this link issue, I suppose your change in
> > afr-dir-write.c for the fix "Don't let NFS cache stat after writes"
> > is like:
> >
> > --------------------------------------
> >
> >         if (afr_txn_nothing_failed (frame, this)) {
> >                 /* if it did pre-op, it will do post-op changing ctime */
> >                 if (priv->consistent_metadata &&
> >                     afr_needs_changelog_update (local))
> >                         afr_zero_fill_stat (local);
> >                 local->transaction.unwind (frame, this);
> >         }
> >
> > In the above fix, ia_nlink is set to ‘0’ when the option
> > consistent-metadata is set to “on”.
> >
> > And hard-linking a file that was just created then leads to an error;
> > the error comes from the kernel function “vfs_link”:
> >
> > if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
> >         error = -ENOENT;
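To see why a zero-filled nlink breaks link(2), here is a toy model of the vfs_link() check quoted above. The struct and the I_LINKABLE value are simplified stand-ins, not the real kernel definitions:

```c
#include <errno.h>
#include <stdint.h>

#define I_LINKABLE 0x1  /* stand-in flag, not the kernel's value */

/* Simplified stand-in for the kernel's struct inode. */
struct inode_sketch {
    uint32_t     i_nlink;
    unsigned int i_state;
};

/* Mirrors the quoted check: a zero link count on an inode that is not
 * marked I_LINKABLE makes link(2) fail with ENOENT, even though the
 * file actually exists, because a zero nlink normally means the inode
 * has been unlinked. */
static int vfs_link_check(const struct inode_sketch *inode)
{
    if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
        return -ENOENT;
    return 0;
}
```

So if the FUSE attr cache ever holds a zero-filled stat for a live file, the kernel-side link path rejects it before GlusterFS is even consulted.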
> >
> >
> >
> > could you please have a check and give some comments here?
> >
> >
> >
> > When a stat is "zero filled", the understanding is that the higher-layer
> > protocol doesn't send the stat value to the kernel, and a separate lookup
> > is sent by the kernel to get the latest stat value. In which protocol are
> > you seeing this issue? Fuse/NFS/SMB?
> >
> >
> >
> >
> >
> > Thanks & Best Regards,
> >
> > George
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
> >
> >
> >
> > --
> >
> > Pranith
> >
>
>
>
> --
> Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel

