Re: write request hung in write-behind

For me, every 'df' command on some (but not all) NFS clients hung forever. The temporary workaround is to disable performance.nfs.write-behind and cluster.eager-lock.
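
For reference, a minimal sketch of how that workaround could be applied with `gluster volume set`. The function name apply_workaround, the GLUSTER override used for a dry run, and the volume name myvol are illustrative assumptions, not something taken from this thread.

```shell
# Sketch only: turn off the two options named above on one volume.
# apply_workaround and the GLUSTER override are hypothetical helpers.
apply_workaround() {
    vol="$1"
    # With GLUSTER=echo the commands are printed instead of executed (dry run).
    ${GLUSTER:-gluster} volume set "$vol" performance.nfs.write-behind off
    ${GLUSTER:-gluster} volume set "$vol" cluster.eager-lock off
}
```

Running `GLUSTER=echo apply_workaround myvol` only prints the two commands; dropping the override applies them to the volume.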

I'll report back with more information if I hit this problem again.


 
From: Raghavendra Gowdappa
Date: 2019/06/04 (Tuesday) 09:55
To: Xie Changlong; Ravishankar Narayanankutty; Karampuri, Pranith;
Cc: gluster-users;
Subject: Re: Re: write request hung in write-behind


On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong <zgrep@xxxxxxx> wrote:
First, a correction: the write request was followed by 771 (not 1545) FLUSH requests. I've attached the gnfs dump file. It has 774 pending call-stacks in total;
771 of them are pending in write-behind, and the deepest call-stack is in afr.

+Ravishankar Narayanankutty +Karampuri, Pranith 

Are you sure these were not call-stacks of in-progress ops? One way of confirming that would be to take statedumps periodically (say 3 min apart). Hung call stacks will be common to all the statedumps.
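
A rough sketch of that comparison, assuming each hung stack keeps the same stack=0x... address across dumps (an address could in principle be reused by a later stack, so treat matches as candidates only). common_stacks is a hypothetical helper name.

```shell
# Print the stack addresses that appear in every statedump given as an argument.
common_stacks() {
    n=$#
    for f in "$@"; do
        # One line per distinct stack address in this dump.
        grep -h '^stack=0x' "$f" | sort -u
    done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
}
```

For example, `common_stacks dump.1 dump.2 dump.3` lists only the stacks present in all three files; anything that shows up in just one dump was probably an in-progress op.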


[global.callpool.stack.771]
stack=0x7f517f557f60
uid=0
gid=0
pid=0
unique=0
lk-owner=
op=stack
type=0
cnt=3

[global.callpool.stack.771.frame.1]
frame=0x7f517f655880
ref_count=0
translator=cl35vol01-replicate-7
complete=0
parent=cl35vol01-dht
wind_from=dht_writev
wind_to=subvol->fops->writev
unwind_to=dht_writev_cbk

[global.callpool.stack.771.frame.2]
frame=0x7f518ed90340
ref_count=1
translator=cl35vol01-dht
complete=0
parent=cl35vol01-write-behind
wind_from=wb_fulfill_head
wind_to=FIRST_CHILD (frame->this)->fops->writev
unwind_to=wb_fulfill_cbk

[global.callpool.stack.771.frame.3]
frame=0x7f516d3baf10
ref_count=1
translator=cl35vol01-write-behind
complete=0

[global.callpool.stack.772]
stack=0x7f51607a5a20
uid=0
gid=0
pid=0
unique=0
lk-owner=a0715b77517f0000
op=stack
type=0
cnt=1

[global.callpool.stack.772.frame.1]
frame=0x7f516ca2d1b0
ref_count=0
translator=cl35vol01-replicate-7
complete=0

[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5  glusterdump.20106.dump.1559038081  |grep translator | wc -l
774
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5  glusterdump.20106.dump.1559038081 |grep complete |wc -l
774
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5  glusterdump.20106.dump.1559038081 |grep -E "complete=0" |wc -l
774
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5  glusterdump.20106.dump.1559038081  |grep translator | grep write-behind |wc -l
771
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5  glusterdump.20106.dump.1559038081  |grep translator | grep replicate-7 | wc -l
2
[root@rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5  glusterdump.20106.dump.1559038081  |grep translator | grep glusterfs | wc -l
1
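
The same tally can be sketched in one pass with awk. This assumes the layout shown in the dump above, where translator= comes before complete= inside each frame section; deepest_frames is a hypothetical helper name. Unlike the grep pattern, it matches frame.1 exactly and not frame.10 and so on.

```shell
# Count incomplete frame.1 entries per translator in one statedump.
deepest_frames() {
    awk '
        # A new section header: remember whether it is a frame.1 section.
        /^\[/ {
            in_frame1 = ($0 ~ /^\[global\.callpool\.stack\.[0-9]+\.frame\.1\]$/)
            xl = ""
            next
        }
        # "translator=" is 11 characters, so the name starts at position 12.
        in_frame1 && /^translator=/ { xl = substr($0, 12) }
        in_frame1 && /^complete=0$/ { count[xl]++ }
        END { for (t in count) print count[t], t }
    ' "$1" | sort -rn
}
```

Each output line is a count followed by a translator name, largest count first.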



 
Date: 2019/06/03 (Monday) 14:46
To: Xie Changlong;
Cc: gluster-users;
Subject: Re: write request hung in write-behind


On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong <zgrep@xxxxxxx> wrote:
Hi all

While testing gluster 3.8.4-54.15 with gnfs, I saw a write request hang in write-behind, followed by 1545 FLUSH requests. I found a similar
bugfix, https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but I'm not sure it is the right one.

[xlator.performance.write-behind.wb_inode]
path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg
inode=0x7f51775b71a0
window_conf=1073741824
window_current=293822
transit-size=293822
dontsync=0

[.WRITE]
request-ptr=0x7f516eec2060
refcount=1
wound=yes
generation-number=1
req->op_ret=293822
req->op_errno=0
sync-attempts=1
sync-in-progress=yes

Note that the sync is still in progress. This means write-behind has wound the write request to its children and has yet to receive the response (unless there is a bug in the accounting of sync-in-progress). So it's likely that there are call-stacks into children of write-behind that are not complete yet. Are you sure the deepest hung call-stack is in write-behind? Can you check for frames with "complete=0"?

size=293822
offset=1048576
lied=-1
append=0
fulfilled=0
go=-1

[.FLUSH]
request-ptr=0x7f517c2badf0
refcount=1
wound=no
generation-number=2
req->op_ret=-1
req->op_errno=116
sync-attempts=0

[.FLUSH]
request-ptr=0x7f5173e9f7b0
refcount=1
wound=no
generation-number=2
req->op_ret=0
req->op_errno=0
sync-attempts=0

[.FLUSH]
request-ptr=0x7f51640b8ca0
refcount=1
wound=no
generation-number=2
req->op_ret=0
req->op_errno=0
sync-attempts=0

[.FLUSH]
request-ptr=0x7f516f3979d0
refcount=1
wound=no
generation-number=2
req->op_ret=0
req->op_errno=0
sync-attempts=0

[.FLUSH]
request-ptr=0x7f516f6ac8d0
refcount=1
wound=no
generation-number=2
req->op_ret=0
req->op_errno=0
sync-attempts=0
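
To count the queued requests instead of reading them by eye, something like the following could work. It is a sketch that assumes request sections always look like [.WRITE] or [.FLUSH] (a dot followed by an upper-case name), and it tallies across every wb_inode in the file; count_wb_requests is a hypothetical helper name.

```shell
# Tally write-behind request sections by type in one statedump.
count_wb_requests() {
    grep -E '^\[\.[A-Z]+\]$' "$1" | sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'   # drop the padding that uniq -c adds
}
```

Each output line is a count followed by the section name, most frequent first.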


Any comments would be appreciated!

Thanks
-Xie


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
