Hi,
>>>I actually tried it with replica-2 and replica-3 and then distributed replica-2 before replying to the earlier mail. We can have a debugging session if you are okay with it.
It is fine if you can’t reproduce the issue in your ENV.
I have also attached the detailed reproduce log to the Bugzilla entry, FYI.
But I am sorry, I may be OOO on Monday and Tuesday next week, so a debug session next Wednesday would work for me.
The detailed reproduce log is pasted here FYI:
root@ubuntu:~# gluster peer probe ubuntu
peer probe: success. Probe on localhost not needed
root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 ubuntu:/home/gfs/b2 force
volume create: test: success: please start the volume to access data
root@ubuntu:~# gluster v start test
volume start: test: success
root@ubuntu:~# gluster v info test
Volume Name: test
Type: Replicate
Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ubuntu:/home/gfs/b1
Brick2: ubuntu:/home/gfs/b2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
root@ubuntu:~# gluster v status
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ubuntu:/home/gfs/b1                   49152     0          Y       7798
Brick ubuntu:/home/gfs/b2                   49153     0          Y       7818
Self-heal Daemon on localhost               N/A       N/A        Y       7839

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
root@ubuntu:~# gluster v set test cluster.consistent-metadata on
volume set: success
root@ubuntu:~# ls /mnt/test
ls: cannot access '/mnt/test': No such file or directory
root@ubuntu:~# mkdir -p /mnt/test
root@ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test
root@ubuntu:~# cd /mnt/test
root@ubuntu:/mnt/test# echo "abc">aaa
root@ubuntu:/mnt/test# cp aaa bbb;link bbb ccc
root@ubuntu:/mnt/test# kill -9 7818
root@ubuntu:/mnt/test# cp aaa ddd;link ddd eee
link: cannot create link 'eee' to 'ddd': No such file or directory
Best Regards,
George
From: gluster-devel-bounces@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 2:40 PM
To: Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
Subject: Re: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"
On Thu, Jan 18, 2018 at 6:33 AM, Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx> wrote:
Hi,
I suppose the number of bricks in your test is six, and you only shut down 3 of the brick processes.
When I reproduce the issue, I create a replicated volume with only 2 bricks, let only ONE brick keep working, and set cluster.consistent-metadata on.
With these 2 test conditions, the issue is 100% reproducible.
Hi,
I actually tried it with replica-2 and replica-3 and then distributed replica-2 before replying to the earlier mail. We can have a debugging session if you are okay with it.
I am in the middle of a customer issue myself(That is the reason for this delay :-( ) and thinking of wrapping it up early next week. Would that be fine with you?
16:44:28 :) ⚡ gluster v status
Status of volume: r2
Gluster process                                 TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/r2_0      49152     0          Y       5309
Brick localhost.localdomain:/home/gfs/r2_1      49154     0          Y       5330
Brick localhost.localdomain:/home/gfs/r2_2      49156     0          Y       5351
Brick localhost.localdomain:/home/gfs/r2_3      49158     0          Y       5372
Brick localhost.localdomain:/home/gfs/r2_4      49159     0          Y       5393
Brick localhost.localdomain:/home/gfs/r2_5      49160     0          Y       5414
Self-heal Daemon on localhost                   N/A       N/A        Y       5436

Task Status of Volume r2
--------------------------------------------------------------------------------
There are no active volume tasks
root@dhcp35-190 - ~
16:44:38 :) ⚡ kill -9 5309 5351 5393
Best Regards,
George
From: gluster-devel-bounces@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 17, 2018 7:27 PM
To: Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
Cc: Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>; Sun, Ping (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
Subject: Re: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"
On Mon, Jan 15, 2018 at 1:55 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
On Mon, Jan 15, 2018 at 8:46 AM, Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx> wrote:
Hi,
Have you reproduced this issue? If yes, could you please confirm whether it is an issue or not?
Hi,
I tried recreating this on my laptop, on both master and 3.12, and I am not able to recreate the issue :-(.
Here is the execution log: https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
Since I was doing this on my laptop, I changed shutting down the replica to killing the brick process to simulate this test.
Let me know if I missed something.
Sorry, I am held up with an issue at work, so I think I will get some time the day after tomorrow to look at this. In the meantime I am adding more people who know about AFR to see if they get a chance to work on this before me.
And if it is an issue, do you have any solution for it?
Thanks & Best Regards,
George
From: Lian, George (NSB - CN/Hangzhou)
Sent: Thursday, January 11, 2018 2:01 PM
To: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
Subject: RE: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"
Hi,
Please see the detailed test steps at https://bugzilla.redhat.com/show_bug.cgi?id=1531457
How reproducible:
Steps to Reproduce:
1. create a volume named "test" with type Replicate
2. set the volume option cluster.consistent-metadata to on:
gluster v set test cluster.consistent-metadata on
3. mount volume test on the client at /mnt/test
4. create a file aaa with a size of more than 1 byte
echo "1234567890" >/mnt/test/aaa
5. shut down one replica node, let's say sn-1, so that only sn-0 keeps working
6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc
BRs
George
From: gluster-devel-bounces@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 11, 2018 12:39 PM
To: Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>; Sun, Ping (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
Subject: Re: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"
On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx> wrote:
Hi,
>>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
It is FUSE, within a mount point created by the “mount -t glusterfs …” command.
Could you let me know the test you did so that I can try to re-create and see what exactly is going on?
Configuration of the volume and the steps to re-create the issue you are seeing would be helpful in debugging the issue further.
Thanks & Best Regards,
George
From: gluster-devel-bounces@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 10, 2018 8:08 PM
To: Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>; Zhong, Hua (NSB - CN/Hangzhou) <hua.zhong@xxxxxxxxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Sun, Ping (NSB - CN/Hangzhou) <ping.sun@xxxxxxxxxxxxxxx>
Subject: Re: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"
On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou) <george.lian@xxxxxxxxxxxxxxx> wrote:
Hi, Pranith Kumar,
I have created a bug on Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1531457
After my investigation of this link issue, I suppose it was introduced by your change to afr-dir-write.c for the issue "Don't let NFS cache stat after writes". Your fix is like:
if (afr_txn_nothing_failed (frame, this)) {
        /*if it did pre-op, it will do post-op changing ctime*/
        if (priv->consistent_metadata &&
            afr_needs_changelog_update (local))
                afr_zero_fill_stat (local);
        local->transaction.unwind (frame, this);
}
In the above fix, the stat is zero-filled, which sets ia_nlink to 0, if the option consistent-metadata is set to “on”.
Hard-linking a file that has just been created then leads to an error, and the error comes from the kernel function “vfs_link”:
if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
        error = -ENOENT;
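To make the failure path concrete, here is a small self-contained sketch (not GlusterFS or kernel code; the struct and function names are illustrative assumptions, and the I_LINKABLE case is omitted for brevity) of how a cached link count of 0 turns the link() request into ENOENT:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for the attributes the kernel caches for an inode
 * after the client forwards a (zero-filled) post-op stat. */
struct cached_inode {
        uint32_t i_nlink;
};

/* Mirrors the vfs_link() check quoted above (simplified): an inode whose
 * cached link count is 0 is treated as already unlinked, so hard-linking
 * it fails. */
static int
try_link (const struct cached_inode *inode)
{
        if (inode->i_nlink == 0)
                return -ENOENT;
        return 0;
}

int
main (void)
{
        struct cached_inode real   = { .i_nlink = 1 };  /* normal stat      */
        struct cached_inode zeroed = { .i_nlink = 0 };  /* zero-filled stat */

        printf ("link with real stat:        %d\n", try_link (&real));
        printf ("link with zero-filled stat: %d\n", try_link (&zeroed));
        return 0;
}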
could you please have a check and give some comments here?
When the stat is "zero filled", the understanding is that the higher-layer protocol doesn't send the stat value to the kernel, and a separate lookup is sent by the kernel to get the latest stat value. In which protocol are you seeing this issue? Fuse/NFS/SMB?
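In other words, the zero-filled stat is intended as a "do not cache" marker: the client-side protocol layer is expected to recognise it and withhold the attributes so that the kernel issues a fresh lookup rather than trusting a cached nlink of 0. A minimal sketch of that expectation (the struct and function names are illustrative assumptions, not the actual fuse-bridge code):

#include <stdint.h>

/* Illustrative subset of the iatt attributes; only the fields relevant to
 * this discussion are shown. */
struct iatt {
        uint32_t ia_nlink;
        int64_t  ia_ctime;
};

/* A real, linkable inode always has nlink >= 1, so nlink == 0 together with
 * ctime == 0 can serve as the "zero-filled" marker. */
static int
is_zero_filled_stat (const struct iatt *buf)
{
        return buf->ia_nlink == 0 && buf->ia_ctime == 0;
}

/* Hypothetical reply path: forward attributes to the kernel only when they
 * are real; otherwise report no attributes, forcing a fresh LOOKUP before
 * the next stat()/link() instead of caching nlink == 0. */
static int
forward_attrs_to_kernel (const struct iatt *postbuf)
{
        if (is_zero_filled_stat (postbuf))
                return 0;   /* withhold: the kernel must re-lookup */
        return 1;           /* safe to hand to the attribute cache */
}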
Thanks & Best Regards,
George
--Pranith
--
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel