Re: Sometimes read failed just after write before file closed when try 100 times in different new files

Hi,

 

Yes, the test code did have that issue. The production code that occasionally fails on first access to the file does, however, pass the proper parameters when opening it.

 

In production the issue is now gone: the file being read/written is no longer locked through a second file descriptor using flock/fcntl; instead, access is serialized with a separate lock file. Avoiding two handles to the same file therefore seems to fix the problem. I may still need to instrument the production code a bit more to gather further information on why the file does not contain valid data right after writing.
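
For illustration, here is a minimal sketch of the separate-lock-file approach described above, assuming a hypothetical lock file path and helper name; it is not the actual production code.

    /* Sketch: serialize access via flock() on a separate ".lock" file so that
     * only one descriptor ever touches the data file itself. Paths and error
     * handling are illustrative only. */
    #include <fcntl.h>
    #include <sys/file.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int write_with_lockfile(const char *path, const char *buf, size_t len)
    {
        /* hypothetical lock file path, separate from the data file */
        int lockfd = open("/mnt/gluster/data.lock", O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
        if (lockfd < 0)
            return -1;
        if (flock(lockfd, LOCK_EX) < 0) {       /* mutual exclusion on the lock file */
            close(lockfd);
            return -1;
        }

        int ret = -1;
        int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, S_IRUSR | S_IWUSR);
        if (fd >= 0) {
            if (write(fd, buf, len) == (ssize_t)len)
                ret = 0;
            close(fd);                          /* single descriptor on the data file */
        }

        flock(lockfd, LOCK_UN);
        close(lockfd);
        return ret;
    }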

 

br, sami

 

 

From: Lian, George (NSB - CN/Hangzhou)
Sent: Thursday, December 28, 2017 8:21 AM
To: Gluster-devel@xxxxxxxxxxx; Venetjoki, Sami (Nokia - FI/Espoo) <sami.venetjoki@xxxxxxxxx>
Cc: Zhong, Hua (NSB - CN/Hangzhou) <hua.zhong@xxxxxxxxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>; I_MBB_HETRAN_SW3_TEAM_MATRIX_GMS <I_MBB_HETRAN_SW3_TEAM_MATRIX@xxxxxxxxxxxxxxxx>
Subject: RE: Sometimes read failed just after write before file closed when try 100 times in different new files

 

Hi, Gluster Expert,

 

Sorry for the wrong code open(argv[1], O_CREAT|O_RDWR|S_IRUSR|S_IWUSR) in the previous mail; after correcting it, the issue seems to be GONE.

But I am still confused about why the wrong code does not fail on an ext4 file system.

After checking the network traffic with tcpdump, I can see that during the CREATE operation the file mode sometimes becomes "Mode: 0101000, S_ISVTX, Reserved". With this mode the file is visible with the ls command, but it cannot be read with the hexdump command even with root privileges. If the same wrong program is run on an ext4 file system, this mode never occurs.

So I suggest that Gluster could perhaps handle this mode better, so that a wrong API call does not leave behind a file that cannot be accessed.
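
For anyone who wants to reproduce the observation, here is a small illustrative sketch (not the original test program) that issues the same wrong open() call and prints the resulting st_mode in octal; a value of 0101000 would correspond to S_IFREG | S_ISVTX with no permission bits at all, matching the "visible with ls but unreadable" symptom.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2)
            return 1;

        /* intentionally the wrong call: mode bits OR'ed into the flags,
         * no mode argument passed despite O_CREAT */
        int fd = open(argv[1], O_CREAT | O_RDWR | S_IRUSR | S_IWUSR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct stat st;
        if (fstat(fd, &st) == 0)
            printf("st_mode = 0%o\n", (unsigned)st.st_mode);  /* print mode in octal */

        close(fd);
        return 0;
    }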

 

@Sami,

After checking the manual for the open API, the correct forms are:

      int open(const char *pathname, int flags);

       int open(const char *pathname, int flags, mode_t mode);

For your test code, I suppose the correct call should be open(argv[1], O_CREAT|O_RDWR, S_IRUSR|S_IWUSR).

After I corrected it, both the C and C++ test code PASS on 3.6 and 3.12.
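
For clarity, here is a minimal, self-contained sketch of what the corrected test looks like with the three-argument open(): write a buffer and read it back through the same descriptor before closing, as in the original scenario. The file name and buffer contents are illustrative only.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        const char *path = (argc > 1) ? argv[1] : "testfile";
        const char msg[] = "hello gluster";
        char buf[sizeof(msg)] = {0};

        /* corrected call: permission bits passed as the third (mode) argument */
        int fd = open(path, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
            perror("write");
            close(fd);
            return 1;
        }

        /* read back before close, as in the original test scenario */
        if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
            perror("pread");
            close(fd);
            return 1;
        }

        printf("read back: %s\n", buf);
        close(fd);
        return 0;
    }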

 

Best Regards,

George

 

From: Lian, George (NSB - CN/Hangzhou)
Sent: Wednesday, December 27, 2017 5:23 PM
To: Gluster-devel@xxxxxxxxxxx
Cc: Zhong, Hua (NSB - CN/Hangzhou) <hua.zhong@xxxxxxxxxxxxxxx>; Venetjoki, Sami (Nokia - FI/Espoo) <sami.venetjoki@xxxxxxxxx>; Li, Deqian (NSB - CN/Hangzhou) <deqian.li@xxxxxxxxxxxxxxx>
Subject: Sometimes read failed just after write before file closed when try 100 times in different new files

 

Hi, Gluster expert,

 

I have just raised a bug, https://bugzilla.redhat.com/show_bug.cgi?id=1529237, in the Gluster community.

 

I have also attached the reproducer source code and shell script here, FYI.

 

From my investigation, I suppose the issue may be caused by the client cache being invalidated while the file has not yet been persisted on the server brick.

But I cannot find the root cause from the debug log attached to https://bugzilla.redhat.com/show_bug.cgi?id=1529237.

 

The following log is an extract of the client log from the attachment in the Bugzilla entry (the read-phase lines were highlighted in yellow in the original mail); you can see that both subvolumes (test-client-0 and test-client-1) fail.

I suppose something abnormal happens in the read phase, but I cannot find any useful information in the server brick log myself, so your help is highly appreciated!

 

[2017-12-27 06:21:04.332524] T [MSGID: 0] [afr-inode-read.c:286:afr_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5ec010680, test-replicate-0 returned 0

[2017-12-27 06:21:04.332724] T [MSGID: 0] [syncop.c:1715:syncop_fgetxattr] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-dht to test-replicate-0

[2017-12-27 06:21:04.332768] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 12-test-replicate-0: 70e0f7ee-2f0a-4364-a38e-96e7031eee7b: generation now vs cached: 2, 2

[2017-12-27 06:21:04.332791] T [MSGID: 0] [afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0

[2017-12-27 06:21:04.332817] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000

[2017-12-27 06:21:04.332837] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76

[2017-12-27 06:21:04.332862] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) to rpc-transport (test-client-0)

[2017-12-27 06:21:04.333943] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-0: received rpc message (RPC XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) from rpc-transport (test-client-0)

[2017-12-27 06:21:04.334013] D [MSGID: 0] [client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-0: remote operation failed: No data available

[2017-12-27 06:21:04.334036] D [MSGID: 0] [client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-0 returned -1 error: No data available [No data available]

[2017-12-27 06:21:04.334114] T [MSGID: 0] [afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0

[2017-12-27 06:21:04.334174] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000

[2017-12-27 06:21:04.334196] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76

[2017-12-27 06:21:04.334229] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) to rpc-transport (test-client-0)

[2017-12-27 06:21:04.334280] T [MSGID: 0] [afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1

[2017-12-27 06:21:04.334311] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000

[2017-12-27 06:21:04.334330] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76

[2017-12-27 06:21:04.334354] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x252c Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) to rpc-transport (test-client-1)

[2017-12-27 06:21:04.334904] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-1: received rpc message (RPC XID: 0x252c Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) from rpc-transport (test-client-1)

[2017-12-27 06:21:04.334983] T [MSGID: 0] [client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-1 returned 0

[2017-12-27 06:21:04.335128] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-0: received rpc message (RPC XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) from rpc-transport (test-client-0)

[2017-12-27 06:21:04.335165] T [MSGID: 0] [client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-0 returned 0

[2017-12-27 06:21:04.335193] T [MSGID: 0] [afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1

[2017-12-27 06:21:04.335224] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000

[2017-12-27 06:21:04.335253] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76

[2017-12-27 06:21:04.335284] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x252d Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) to rpc-transport (test-client-1)

[2017-12-27 06:21:04.335870] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-1: received rpc message (RPC XID: 0x252d Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) from rpc-transport (test-client-1)

[2017-12-27 06:21:04.335902] D [MSGID: 0] [client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-1: remote operation failed: No data available

[2017-12-27 06:21:04.335922] D [MSGID: 0] [client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-1 returned -1 error: No data available [No data available]

[2017-12-27 06:21:04.335950] D [MSGID: 0] [afr-inode-read.c:1720:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-replicate-0 returned -1 error: No data available [No data available]

 

 

Best Regards,

George

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel
