Re: Problems with graph switch in disperse

On Wed, Dec 31, 2014 at 11:25 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:

On 27.12.2014 13:43, lidi@xxxxxxxxxxxxx wrote:

 

I tracked this problem, and found that loc.parent and loc.pargfid are both NULL in the call sequence below:

ec_manager_writev() -> ec_get_size_version() -> ec_lookup()

This can cause server_resolve() to return EINVAL. A replace-brick causes all open fds and the inode table to be recreated, but ec_lookup() builds its loc from fd->_ctx, so loc.parent and loc.pargfid are missing after the fd changes. Other xlators always do a lookup from the root directory, so they never hit this problem. It seems that a recursive lookup from the root directory may address this issue.
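
For illustration, here is a minimal sketch of why a loc rebuilt only from the inode stored in an fd carries no parent information. The struct and helper below are simplified stand-ins invented for this example, not GlusterFS's actual loc_t or ec code:

#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the loc_t fields discussed in this thread. */
typedef struct {
    unsigned char gfid[16];     /* gfid of the inode itself               */
    unsigned char pargfid[16];  /* gfid of the parent directory           */
    const char   *name;         /* basename, only meaningful with pargfid */
} demo_loc_t;

/* Hypothetical helper: rebuild a loc from just the inode gfid kept in an
 * fd context. Only the inode's own gfid is known, so the parent fields
 * stay zeroed -- the state observed in ec_lookup() after replace-brick. */
static void loc_from_fd_inode(demo_loc_t *loc, const unsigned char *inode_gfid)
{
    memset(loc, 0, sizeof(*loc));
    memcpy(loc->gfid, inode_gfid, sizeof(loc->gfid));
    /* loc->pargfid stays 00...00 and loc->name stays NULL. */
}

int main(void)
{
    unsigned char gfid[16] = { 0x12, 0x34 };  /* dummy gfid bytes */
    demo_loc_t loc;

    loc_from_fd_inode(&loc, gfid);
    printf("name=%s pargfid[0]=%02x\n",
           loc.name ? loc.name : "(null)", loc.pargfid[0]);
    return 0;
}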
The EINVAL error is returned by protocol/server when it tries to resolve an inode based on a loc. If the loc's 'name' field is neither NULL nor empty, it tries to resolve the inode based on <pargfid>/<name>. The problem here is that pargfid is 00...00.
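
As a rough sketch of that decision (hypothetical code, not the actual server_resolve() implementation; the types are the same simplified stand-ins as in the earlier sketch), a non-empty 'name' forces the <pargfid>/<name> path, so an all-zero pargfid turns into EINVAL:

#include <errno.h>
#include <string.h>

/* Simplified loc layout, repeated so this sketch stands alone. */
typedef struct {
    unsigned char gfid[16];
    unsigned char pargfid[16];
    const char   *name;
} demo_loc_t;

static int gfid_is_null(const unsigned char *gfid)
{
    static const unsigned char zero[16];
    return memcmp(gfid, zero, sizeof(zero)) == 0;
}

/* Hypothetical resolution choice: entry (<pargfid>/<name>) vs. inode (<gfid>). */
static int resolve_choice(const demo_loc_t *loc)
{
    if (loc->name != NULL && loc->name[0] != '\0') {
        if (gfid_is_null(loc->pargfid))
            return -EINVAL;   /* <00..00>/<name> cannot be resolved */
        return 0;             /* resolve by <pargfid>/<name>        */
    }

    if (gfid_is_null(loc->gfid))
        return -EINVAL;
    return 0;                 /* resolve by <gfid> (nameless lookup) */
}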

To solve this issue I've modified ec_loc_setup_parent() so that it clears the loc's 'name' if the parent inode cannot be determined. This forces protocol/server to resolve the inode based on <gfid>, which is valid and can be resolved successfully.
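
Sketched with the same simplified types (illustrative only, not the actual patch to ec_loc_setup_parent()):

#include <string.h>

typedef struct {
    void          *parent;       /* parent inode, NULL when unknown */
    unsigned char  pargfid[16];
    const char    *name;
} demo_loc_t;

/* If no parent inode can be determined, drop 'name' as well, so the server
 * falls back to a gfid-based (nameless) resolution instead of trying to
 * resolve <00..00>/<name>. */
static void setup_parent_sketch(demo_loc_t *loc)
{
    if (loc->parent == NULL) {
        memset(loc->pargfid, 0, sizeof(loc->pargfid));
        loc->name = NULL;
    }
}

With 'name' cleared, the request falls into the nameless branch of the resolution sketch above and resolves by <gfid>.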

However, this doesn't fully solve the bug. After fixing that, I get an EIO error. Further investigation seems to indicate that this is caused by a locking problem arising from incorrect handling of ESTALE when the brick is replaced.

ESTALE indicates one of the following situations:

1. In the case of a named lookup (loc containing <pargfid>/<name>), <pargfid> is not present, which means the parent is not present on the brick.
2. In the case of a nameless lookup (loc containing only the <gfid> of the file), the file/directory represented by that gfid is not present on the brick.

Which of the above two scenarios applies in your case?

I'll upload a patch shortly to solve these issues.

Xavi



----- Original Message -----
From: Raghavendra Gowdappa
Sent: 14-12-24 21:48:56
To: Xavier Hernandez
Cc: Gluster Devel
Subject: Re: Problems with graph switch in disperse

Do you know the origin of the EIO? fuse-bridge only fails a lookup fop with EIO (when a NULL gfid is received in a successful lookup reply), so there might be another xlator that is sending the EIO.
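
A hedged sketch of the kind of check being described (not the actual fuse_lookup_cbk code; the iatt stand-in below is simplified):

#include <errno.h>
#include <string.h>

typedef struct { unsigned char gfid[16]; } demo_iatt_t;  /* stand-in for iatt */

static int gfid_is_null(const unsigned char *gfid)
{
    static const unsigned char zero[16];
    return memcmp(gfid, zero, sizeof(zero)) == 0;
}

/* A lookup that "succeeds" but carries a null gfid is turned into EIO,
 * because fuse cannot build a node id from it. */
static int lookup_reply_check(int op_ret, const demo_iatt_t *buf)
{
    if (op_ret == 0 && gfid_is_null(buf->gfid))
        return -EIO;
    return op_ret;
}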

----- Original Message -----
> From: "Xavier Hernandez" 
> To: "Gluster Devel" 

> Sent: Wednesday, December 24, 2014 6:25:17 PM
> Subject: [Gluster-devel] Problems with graph switch in disperse

> Hi,

> I'm experiencing a problem when the gluster graph is changed as a result
> of a replace-brick operation (and probably with any other operation that
> changes the graph) while the client is also doing other tasks, like
> writing a file.

> When the operation starts, I see that the replaced brick is disconnected,
> but writes continue working normally with one brick less.

> At some point, another graph is created and comes online. Remaining
> bricks on the old graph are disconnected and the old graph is destroyed.
> I see how new write requests are sent to the new graph.

> This seems correct. However there's a point where I see this:

> [2014-12-24 11:29:58.541130] T [fuse-bridge.c:2305:fuse_write_resume]
> 0-glusterfs-fuse: 2234: WRITE (0x16dcf3c, size=131072, offset=255721472)
> [2014-12-24 11:29:58.541156] T [ec-helpers.c:101:ec_trace] 2-ec:
> WIND(INODELK) 0x7f8921b7a9a4(0x7f8921b78e14) [refs=5, winds=3, jobs=1]
> frame=0x7f8932e92c38/0x7f8932e9e6b0, min/exp=3/3, err=0 state=1
> {111:000:000} idx=0
> [2014-12-24 11:29:58.541292] T [rpc-clnt.c:1384:rpc_clnt_record]
> 2-patchy-client-0: Auth Info: pid: 0, uid: 0, gid: 0, owner:
> d025e932897f0000
> [2014-12-24 11:29:58.541296] T [io-cache.c:133:ioc_inode_flush]
> 2-patchy-io-cache: locked inode(0x16d2810)
> [2014-12-24 11:29:58.541354] T
> [rpc-clnt.c:1241:rpc_clnt_record_build_header] 2-rpc-clnt: Request
> fraglen 152, payload: 84, rpc hdr: 68
> [2014-12-24 11:29:58.541408] T [io-cache.c:137:ioc_inode_flush]
> 2-patchy-io-cache: unlocked inode(0x16d2810)
> [2014-12-24 11:29:58.541493] T [io-cache.c:133:ioc_inode_flush]
> 2-patchy-io-cache: locked inode(0x16d2810)
> [2014-12-24 11:29:58.541536] T [io-cache.c:137:ioc_inode_flush]
> 2-patchy-io-cache: unlocked inode(0x16d2810)
> [2014-12-24 11:29:58.541537] T [rpc-clnt.c:1577:rpc_clnt_submit]
> 2-rpc-clnt: submitted request (XID: 0x17 Program: GlusterFS 3.3,
> ProgVers: 330, Proc: 29) to rpc-transport (patchy-client-0)
> [2014-12-24 11:29:58.541646] W [fuse-bridge.c:2271:fuse_writev_cbk]
> 0-glusterfs-fuse: 2234: WRITE => -1 (Input/output error)

> It seems that fuse still has a write request pending for graph 0. It is
> resumed, but it returns EIO without calling the xlator stack (the operations
> seen between the two log messages belong to other requests and are sent to
> graph 2). I'm not sure why this happens or how I should avoid it.

> I tried the same scenario with replicate and it seems to work, so there
> must be something wrong in disperse, but I don't see where the problem
> could be.

> Any ideas ?

> Thanks,

> Xavi

 






--
Raghavendra G
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
