On Tue, Jun 4, 2019 at 7:36 AM Xie Changlong <zgrep@xxxxxxx> wrote:
To me, all 'df' commands on specific (not all) NFS clients hung forever. The temporary workaround is to disable performance.nfs.write-behind and cluster.eager-lock. I'll try to get more info if I encounter this problem again.
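(As a sketch of the workaround described above: assuming a volume named cl35vol01, a name only inferred from the translator names in the statedumps further down this thread, the two options can be turned off with the usual volume-set commands.)

gluster volume set cl35vol01 performance.nfs.write-behind off
gluster volume set cl35vol01 cluster.eager-lock off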
If you observe this issue again, take successive (at least a minute apart) statedumps of the processes and run https://github.com/gluster/glusterfs/blob/master/extras/identify-hangs.sh on them, which will give information about the hangs.
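(As a rough sketch of that procedure for a gnfs process: a statedump can be triggered by sending SIGUSR1 to the glusterfs NFS server process, and the dumps land in the statedump directory, /var/run/gluster by default. The pgrep pattern and the script's directory argument below are assumptions; check the script header for its exact usage.)

nfs_pid=$(pgrep -f 'glusterfs.*nfs')     # assumed way to locate the gnfs process
kill -USR1 "$nfs_pid"                    # first statedump
sleep 60
kill -USR1 "$nfs_pid"                    # second statedump, a minute later
sleep 60
kill -USR1 "$nfs_pid"                    # third statedump
bash identify-hangs.sh /var/run/gluster  # assumed invocation; see the script for exact usage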
From: Raghavendra Gowdappa
Date: 2019/06/04 (Tuesday) 09:55
Cc: gluster-users
Subject: Re: Re: write request hung in write-behind

On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong <zgrep@xxxxxxx> wrote:

Firstly, I correct myself: the write request is followed by 771 (not 1545) FLUSH requests. I've attached the gnfs dump file; there are 774 pending call-stacks in total, 771 of them pending in write-behind, and the deepest call-stack is in afr.

Are you sure these were not call-stacks of in-progress ops? One way of confirming that would be to take statedumps periodically (say 3 minutes apart). Hung call-stacks will be common to all the statedumps.

[global.callpool.stack.771]
stack=0x7f517f557f60
uid=0
gid=0
pid=0
unique=0
lk-owner=
op=stack
type=0
cnt=3

[global.callpool.stack.771.frame.1]
frame=0x7f517f655880
ref_count=0
translator=cl35vol01-replicate-7
complete=0
parent=cl35vol01-dht
wind_from=dht_writev
wind_to=subvol->fops->writev
unwind_to=dht_writev_cbk

[global.callpool.stack.771.frame.2]
frame=0x7f518ed90340
ref_count=1
translator=cl35vol01-dht
complete=0
parent=cl35vol01-write-behind
wind_from=wb_fulfill_head
wind_to=FIRST_CHILD (frame->this)->fops->writev
unwind_to=wb_fulfill_cbk

[global.callpool.stack.771.frame.3]
frame=0x7f516d3baf10
ref_count=1
translator=cl35vol01-write-behind
complete=0

[global.callpool.stack.772]
stack=0x7f51607a5a20
uid=0
gid=0
pid=0
unique=0
lk-owner=a0715b77517f0000
op=stack
type=0
cnt=1

[global.callpool.stack.772.frame.1]
frame=0x7f516ca2d1b0
ref_count=0
translator=cl35vol01-replicate-7
complete=0

[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | wc -l
774
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep complete | wc -l
774
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep -E "complete=0" | wc -l
774
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep write-behind | wc -l
771
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep replicate-7 | wc -l
2
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep glusterfs | wc -l
1

From: Raghavendra Gowdappa
Date: 2019/06/03 (Monday) 14:46
To: Xie Changlong
Cc: gluster-users
Subject: Re: write request hung in write-behind

On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong <zgrep@xxxxxxx> wrote:

Hi all,

Testing gluster 3.8.4-54.15 gnfs, I saw a write request hung in write-behind followed by 1545 FLUSH requests. I found a similar

[xlator.performance.write-behind.wb_inode]
path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg
inode=0x7f51775b71a0
window_conf=1073741824
window_current=293822
transit-size=293822
dontsync=0

[.WRITE]
request-ptr=0x7f516eec2060
refcount=1
wound=yes
generation-number=1
req->op_ret=293822
req->op_errno=0
sync-attempts=1
sync-in-progress=yes

Note that the sync is still in progress. This means write-behind has wound the write request to its children and is yet to receive the response (unless there is a bug in the accounting of sync-in-progress). So it's likely that there are call-stacks into children of write-behind which are not complete yet. Are you sure the deepest hung call-stack is in write-behind?
Can you check for frames with "complete=0"?

size=293822
offset=1048576
lied=-1
append=0
fulfilled=0
go=-1

[.FLUSH]
request-ptr=0x7f517c2badf0
refcount=1
wound=no
generation-number=2
req->op_ret=-1
req->op_errno=116
sync-attempts=0

[.FLUSH]
request-ptr=0x7f5173e9f7b0
refcount=1
wound=no
generation-number=2
req->op_ret=0
req->op_errno=0
sync-attempts=0

[.FLUSH]
request-ptr=0x7f51640b8ca0
refcount=1
wound=no
generation-number=2
req->op_ret=0
req->op_errno=0
sync-attempts=0

[.FLUSH]
request-ptr=0x7f516f3979d0
refcount=1
wound=no
generation-number=2
req->op_ret=0
req->op_errno=0
sync-attempts=0

[.FLUSH]
request-ptr=0x7f516f6ac8d0
refcount=1
wound=no
generation-number=2
req->op_ret=0
req->op_errno=0
sync-attempts=0

Any comments would be appreciated!

Thanks
-Xie
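(A small sketch of one way to answer the "complete=0" question across successive dumps: in the statedump excerpts above, the translator= line immediately precedes complete=, so frames that are still incomplete can be tallied per translator in each dump and compared. The glusterdump.20106.dump.* filename pattern is taken from the grep output above and is an assumption about how the successive dumps would be named.)

for dump in glusterdump.20106.dump.*; do
    echo "== $dump =="
    grep -B 1 "^complete=0" "$dump" | grep "^translator=" | sort | uniq -c | sort -rn
done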
--
Pranith
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users