Re: [Gluster-devel] GlusterFS FUSE client hangs on rsyncing lots of files

+soumyak, +rtalur.

On Fri, Jan 29, 2016 at 2:34 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:


On 01/28/2016 05:05 PM, Pranith Kumar Karampuri wrote:
With baul jianguo's help I am able to see that FLUSH fops are hanging for some reason: the same unique= values show up in both statedumps below, taken at different times, so those frames are stuck rather than in-flight.

pk1@localhost - ~/Downloads
17:02:13 :) ⚡ grep "unique=" client-dump1.txt
unique=3160758373
unique=2073075682
unique=1455047665
unique=0

pk1@localhost - ~/Downloads
17:02:21 :) ⚡ grep "unique=" client-dump-0.txt
unique=3160758373
unique=2073075682
unique=1455047665
unique=0

I will be debugging a bit more and post my findings.
+Raghavendra G

All the stubs are hung in write-behind. I checked that the statedumps don't show any writes in progress. Maybe, because of some race, the flush fop is not resumed after the write calls complete? It seems this issue happens only when io-threads is enabled on the client.
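To make the suspicion concrete, here is a minimal, self-contained sketch (NOT the actual write-behind source; wb_inode_t, pending_writes and flush_cont are illustrative names) of the stub-resume pattern involved. FLUSH gets parked until the queued writes drain, so a lost resume at this point would hang FLUSH exactly the way the statedumps show:

#include <pthread.h>
#include <stddef.h>

typedef void (*flush_cont_t) (void *frame);

typedef struct {
        pthread_mutex_t lock;
        int             pending_writes; /* writes still queued behind the window */
        flush_cont_t    flush_cont;     /* parked FLUSH continuation, if any */
        void           *flush_frame;
} wb_inode_t;

/* Called from the write-completion (cbk) path. */
static void
wb_write_done (wb_inode_t *wb)
{
        flush_cont_t cont  = NULL;
        void        *frame = NULL;

        pthread_mutex_lock (&wb->lock);
        {
                wb->pending_writes--;
                if (wb->pending_writes == 0 && wb->flush_cont) {
                        cont  = wb->flush_cont; /* last write drained: release FLUSH */
                        frame = wb->flush_frame;
                        wb->flush_cont = NULL;
                }
        }
        pthread_mutex_unlock (&wb->lock);

        if (cont)
                cont (frame); /* if a race ever skips this resume, the FLUSH
                               * stub stays parked forever and close() hangs */
}

If some path decrements pending_writes without checking for a parked flush continuation, or parks it after the last write has already drained, the FLUSH never resumes.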

Pranith


Pranith
On 01/28/2016 03:18 PM, baul jianguo wrote:
the client glusterfs gdb info; the main thread id is 70800.
  In the top output, thread 70800 has CPU time 1263:30 and thread 70810
has 1321:10; the other threads' times are much smaller.
(gdb) thread apply all bt



Thread 9 (Thread 0x7fc21acaf700 (LWP 70801)):

#0  0x00007fc21cc0c535 in sigwait () from /lib64/libpthread.so.0

#1  0x000000000040539b in glusterfs_sigwaiter (arg=<value optimized
out>) at glusterfsd.c:1653

#2  0x00007fc21cc04a51 in start_thread () from /lib64/libpthread.so.0

#3  0x00007fc21c56e93d in clone () from /lib64/libc.so.6



Thread 8 (Thread 0x7fc21a2ae700 (LWP 70802)):

#0  0x00007fc21cc08a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0

#1  0x00007fc21ded02bf in syncenv_task (proc=0x121ee60) at syncop.c:493

#2  0x00007fc21ded6300 in syncenv_processor (thdata=0x121ee60) at syncop.c:571

#3  0x00007fc21cc04a51 in start_thread () from /lib64/libpthread.so.0

#4  0x00007fc21c56e93d in clone () from /lib64/libc.so.6



Thread 7 (Thread 0x7fc2198ad700 (LWP 70803)):

#0  0x00007fc21cc08a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0

#1  0x00007fc21ded02bf in syncenv_task (proc=0x121f220) at syncop.c:493

#2  0x00007fc21ded6300 in syncenv_processor (thdata=0x121f220) at syncop.c:571

#3  0x00007fc21cc04a51 in start_thread () from /lib64/libpthread.so.0

#4  0x00007fc21c56e93d in clone () from /lib64/libc.so.6



Thread 6 (Thread 0x7fc21767d700 (LWP 70805)):

#0  0x00007fc21cc0bfbd in nanosleep () from /lib64/libpthread.so.0

#1  0x00007fc21deb16bc in gf_timer_proc (ctx=0x11f2010) at timer.c:170

#2  0x00007fc21cc04a51 in start_thread () from /lib64/libpthread.so.0

#3  0x00007fc21c56e93d in clone () from /lib64/libc.so.6



Thread 5 (Thread 0x7fc20fb1e700 (LWP 70810)):

#0  0x00007fc21c566987 in readv () from /lib64/libc.so.6

#1  0x00007fc21accbc55 in fuse_thread_proc (data=<value optimized out>) at
fuse-bridge.c:4752

#2  0x00007fc21cc04a51 in start_thread () from /lib64/libpthread.so.0

#3  0x00007fc21c56e93d in clone () from /lib64/libc.so.6  (this thread has the most CPU time)



Thread 4 (Thread 0x7fc20f11d700 (LWP 70811)):  (a bit less CPU time)

#0  0x00007fc21cc0b7dd in read () from /lib64/libpthread.so.0

#1  0x00007fc21acc0e73 in read (data=<value optimized out>) at
/usr/include/bits/unistd.h:45

#2  notify_kernel_loop (data=<value optimized out>) at fuse-bridge.c:3786

#3  0x00007fc21cc04a51 in start_thread () from /lib64/libpthread.so.0

#4  0x00007fc21c56e93d in clone () from /lib64/libc.so.6



Thread 3 (Thread 0x7fc1b16fe700 (LWP 206224)):

#0  0x00007fc21cc08a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0

#1  0x00007fc20e515e60 in iot_worker (data=<value optimized out>) at io-threads.c:157

#2  0x00007fc21cc04a51 in start_thread () from /lib64/libpthread.so.0

#3  0x00007fc21c56e93d in clone () from /lib64/libc.so.6



Thread 2 (Thread 0x7fc1b0bfb700 (LWP 214361)):

#0  0x00007fc21cc08a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0

#1  0x00007fc20e515e60 in iot_worker (data=<value optimized out>) at io-threads.c:157

#2  0x00007fc21cc04a51 in start_thread () from /lib64/libpthread.so.0

#3  0x00007fc21c56e93d in clone () from /lib64/libc.so.6



Thread 1 (Thread 0x7fc21e31e700 (LWP 70800)):

#0  0x00007fc21c56ef33 in epoll_wait () from /lib64/libc.so.6

#1  0x00007fc21deea3e7 in event_dispatch_epoll (event_pool=0x120dec0)
at event-epoll.c:428

#2  0x00000000004075e4 in main (argc=4, argv=0x7fff3dc93698) at
glusterfsd.c:1983

On Thu, Jan 28, 2016 at 5:29 PM, baul jianguo <roidinev@xxxxxxxxx> wrote:
http://pastebin.centos.org/38941/
client statedump; only pids 27419, 168030 and 208655 hang. You can search
for these pids in the statedump file.

On Wed, Jan 27, 2016 at 4:35 PM, Pranith Kumar Karampuri
<pkarampu@xxxxxxxxxx> wrote:
Hi,
       If the hang appears on enabling client-side io-threads, then it could
be because of some race that shows up only when io-threads is enabled on the
client side. Two things will help us debug this issue:
1) "thread apply all bt" inside gdb (with debuginfo rpms/debs installed).
2) A complete statedump of the mount at two intervals, preferably 10 seconds
apart. With just one statedump it is difficult to tell which frames are
stuck and which are on-going; with two, we can find the frames common to
both statedumps and take a closer look there (a small helper for taking the
two dumps is sketched below).
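As a side note, the glusterfs client writes a statedump (under /var/run/gluster by default) when it receives SIGUSR1, so a throwaway helper along these lines can take the two dumps; "twodumps" and its argument handling are just for illustration:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Usage: ./twodumps <glusterfs-client-pid> */
int
main (int argc, char **argv)
{
        if (argc != 2) {
                fprintf (stderr, "usage: %s <glusterfs-client-pid>\n", argv[0]);
                return 1;
        }

        pid_t client = (pid_t) atoi (argv[1]);

        kill (client, SIGUSR1);  /* first statedump */
        sleep (10);
        kill (client, SIGUSR1);  /* second statedump, 10 seconds later */

        return 0;
}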

Feel free to ping me on #gluster-dev (nick: pranithk) if you have the
process hung in that state and you don't mind me doing some live debugging
with you. This option is the best of the lot!

Thanks a lot, baul and Oleksandr, for the debugging so far!

Pranith


On 01/25/2016 01:03 PM, baul jianguo wrote:
3.5.7 also hangs, and only the flush op is hung. Yes, with
performance.client-io-threads off there is no hang.

The hang does not depend on the client kernel version.

Here is one client statedump of a flush op; anything abnormal?

[global.callpool.stack.12]

uid=0

gid=0

pid=14432

unique=16336007098

lk-owner=77cb199aa36f3641

op=FLUSH

type=1

cnt=6



[global.callpool.stack.12.frame.1]

ref_count=1

translator=fuse

complete=0



[global.callpool.stack.12.frame.2]

ref_count=0

translator=datavolume-write-behind

complete=0

parent=datavolume-read-ahead

wind_from=ra_flush

wind_to=FIRST_CHILD (this)->fops->flush

unwind_to=ra_flush_cbk



[global.callpool.stack.12.frame.3]

ref_count=1

translator=datavolume-read-ahead

complete=0

parent=datavolume-open-behind

wind_from=default_flush_resume

wind_to=FIRST_CHILD(this)->fops->flush

unwind_to=default_flush_cbk



[global.callpool.stack.12.frame.4]

ref_count=1

translator=datavolume-open-behind

complete=0

parent=datavolume-io-threads

wind_from=iot_flush_wrapper

wind_to=FIRST_CHILD(this)->fops->flush

unwind_to=iot_flush_cbk



[global.callpool.stack.12.frame.5]

ref_count=1

translator=datavolume-io-threads

complete=0

parent=datavolume

wind_from=io_stats_flush

wind_to=FIRST_CHILD(this)->fops->flush

unwind_to=io_stats_flush_cbk



[global.callpool.stack.12.frame.6]

ref_count=1

translator=datavolume

complete=0

parent=fuse

wind_from=fuse_flush_resume

wind_to=xl->fops->flush

unwind_to=fuse_err_cbk



On Sun, Jan 24, 2016 at 5:35 AM, Oleksandr Natalenko
<oleksandr@xxxxxxxxxxxxxx> wrote:
With "performance.client-io-threads" set to "off" no hangs occurred in 3
rsync/rm rounds. Could that be some fuse-bridge lock race? Will bring
that
option to "on" back again and try to get full statedump.

On Thursday, 21 January 2016 14:54:47 EET, Raghavendra G wrote:
On Thu, Jan 21, 2016 at 10:49 AM, Pranith Kumar Karampuri <

pkarampu@xxxxxxxxxx> wrote:
On 01/18/2016 02:28 PM, Oleksandr Natalenko wrote:
XFS. Server side works OK, I'm able to mount the volume again. Brick is 30% full.
Oleksandr,

        Will it be possible to get the statedump output of the client and
bricks next time it happens?


https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md#how-to-generate-statedump
We also need to dump inode information. To do that, you have to add
"all=yes" to /var/run/gluster/glusterdump.options before you issue the
commands to get the statedump.
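For example, the options file is plain key=value lines, i.e.
/var/run/gluster/glusterdump.options would contain:

all=yes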

Pranith

On Monday, 18 January 2016 15:07:18 EET, baul jianguo wrote:
What is your brick file system? And what is the status of the glusterfsd
process and all its threads? I met the same issue when a client app such as
rsync stays in D state; the brick process and related threads are also in D
state, and the brick device's disk utilization is 100%.

On Sun, Jan 17, 2016 at 6:13 AM, Oleksandr Natalenko

<oleksandr@xxxxxxxxxxxxxx> wrote:
Wrong assumption, rsync hung again.

On Saturday, 16 January 2016 22:53:04 EET, Oleksandr Natalenko wrote:
One possible reason:

cluster.lookup-optimize: on
cluster.readdir-optimize: on

I've disabled both optimizations, and at least as of now rsync still does
its job with no issues. I would like to find out which option causes such
behavior, and why. Will test more.

On Friday, 15 January 2016 16:09:51 EET, Oleksandr Natalenko wrote:
Another observation: if rsyncing is resumed after a hang, rsync itself
hangs a lot faster because it stats the already-copied files. So the reason
may be not the writing itself but massive stat calls on the GlusterFS
volume as well.

On 15.01.2016 09:40, Oleksandr Natalenko wrote:
While doing rsync over millions of files from an ordinary partition to a
GlusterFS volume, just after approximately the first 2 million files the
rsync hang happens, and the following info appears in dmesg:

===
[17075038.924481] INFO: task rsync:10310 blocked for more than 120 seconds.
[17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17075038.940748] rsync           D ffff88207fc13680     0 10310  10309 0x00000080
[17075038.940752]  ffff8809c578be18 0000000000000086 ffff8809c578bfd8 0000000000013680
[17075038.940756]  ffff8809c578bfd8 0000000000013680 ffff880310cbe660 ffff881159d16a30
[17075038.940759]  ffff881e3aa25800 ffff8809c578be48 ffff881159d16b10 ffff88087d553980
[17075038.940762] Call Trace:
[17075038.940770] [<ffffffff8160a1d9>] schedule+0x29/0x70
[17075038.940797] [<ffffffffa023a53d>] __fuse_request_send+0x13d/0x2c0 [fuse]
[17075038.940801] [<ffffffffa023db30>] ? fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse]
[17075038.940805] [<ffffffff81098350>] ? wake_up_bit+0x30/0x30
[17075038.940809] [<ffffffffa023a6d2>] fuse_request_send+0x12/0x20 [fuse]
[17075038.940813] [<ffffffffa024382f>] fuse_flush+0xff/0x150 [fuse]
[17075038.940817] [<ffffffff811c45c4>] filp_close+0x34/0x80
[17075038.940821] [<ffffffff811e4ed8>] __close_fd+0x78/0xa0
[17075038.940824] [<ffffffff811c6103>] SyS_close+0x23/0x50
[17075038.940828] [<ffffffff81614de9>] system_call_fastpath+0x16/0x1b
===

rsync blocks in D state (the trace shows close() entering fuse_flush and
then waiting in __fuse_request_send for a reply from userspace), and to
kill it I have to do umount --lazy on the GlusterFS mountpoint and then
kill the corresponding client glusterfs process. Then rsync exits.

Here is GlusterFS volume info:

===
Volume Name: asterisk_records
Type: Distributed-Replicate
Volume ID: dc1fe561-fa3a-4f2e-8330-ec7e52c75ba4
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: server1:/bricks/10_megaraid_0_3_9_x_0_4_3_hdd_r1_nolvm_hdd_storage_01/asterisk/records
Brick2: server2:/bricks/10_megaraid_8_5_14_x_8_6_16_hdd_r1_nolvm_hdd_storage_01/asterisk/records
Brick3: server1:/bricks/11_megaraid_0_5_4_x_0_6_5_hdd_r1_nolvm_hdd_storage_02/asterisk/records
Brick4: server2:/bricks/11_megaraid_8_7_15_x_8_8_20_hdd_r1_nolvm_hdd_storage_02/asterisk/records
Brick5: server1:/bricks/12_megaraid_0_7_6_x_0_13_14_hdd_r1_nolvm_hdd_storage_03/asterisk/records
Brick6: server2:/bricks/12_megaraid_8_9_19_x_8_13_24_hdd_r1_nolvm_hdd_storage_03/asterisk/records
Options Reconfigured:
cluster.lookup-optimize: on
cluster.readdir-optimize: on
client.event-threads: 2
network.inode-lru-limit: 4096
server.event-threads: 4
performance.client-io-threads: on
storage.linux-aio: on
performance.write-behind-window-size: 4194304
performance.stat-prefetch: on
performance.quick-read: on
performance.read-ahead: on
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 2
performance.cache-max-file-size: 1048576
performance.cache-size: 33554432
features.cache-invalidation: on
performance.readdir-ahead: on
===

The issue reproduces each time I rsync that many files.

How could I debug this issue better?





--
Raghavendra G
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
