Re: glusterfsd Call Trace Messages

I think this is what is happening. Someone please correct me if I am wrong.

I think this is happening because the NFS client, the NFS server, and the bricks are all on the same machine. When the large write arrives, the NFS client sends the request to the NFS server, which forwards it to the brick. The brick process issues the write system call, and inside the kernel that call may find no free memory, so the kernel tries to reclaim some. The NFS client does heavy caching and holds a lot of memory, so reclaiming means flushing the NFS client's cached pages. But the NFS client is itself stuck in the write operation: it cannot free anything until it gets a response from the NFS server, which in turn is waiting on the brick, and the brick cannot get a response from the kernel until the kernel finds memory to complete the write.

So the whole chain is stuck in a deadlock, and that is why you see your setup blocked.
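
To confirm whether this loopback situation applies, comparing the NFS mount with the local bricks should be enough. A rough sketch (the volume name below is only a placeholder):

    grep nfs /proc/mounts            # which server:/volume is mounted on this host?
    gluster volume status VOLNAME    # are the bricks and the NFS server on that same host?

If the mounted server is also listed as a brick / NFS server for the volume, you are in the scenario described above.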

Can you please mount the volume via NFS on a node other than the gluster servers, and see if the issue happens again?
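
For example, something along these lines from a separate client machine should do (the volume name and mount point are just placeholders; Gluster's built-in NFS server speaks NFSv3):

    mount -t nfs -o vers=3 gluster01:/VOLNAME /mnt/test

and then repeat the large copy there.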


Regards,
Raghavendra

On Wed, Feb 3, 2016 at 2:32 PM, Taste-Of-IT <kontakt@xxxxxxxxxxxxxx> wrote:
On 2016-02-03 20:09, Raghavendra Bhat wrote:
Hi,

Is your NFS client mounted on one of the gluster servers?

Regards,
Raghavendra

On Wed, Feb 3, 2016 at 10:08 AM, Taste-Of-IT <kontakt@xxxxxxxxxxxxxx> wrote:

Hello,

hope some expert can help. I have a distributed GlusterFS setup (2 bricks, 1 volume) running version 3.7.6 on Debian. The volume is shared via NFS. When I copy large files (>30 GB) with Midnight Commander, I get the messages below. I replaced the SATA cables and checked the memory but found no error, and the SMART values on all disks look OK. After 30-40 minutes I can copy again. Any idea?

Feb  3 12:46:31 gluster01 kernel: [11186.588367] [sched_delayed] sched: RT throttling activated
Feb  3 12:56:09 gluster01 kernel: [11764.932749] glusterfsd      D ffff88040ca6d788     0  1150      1 0x00000000
Feb  3 12:56:09 gluster01 kernel: [11764.932759]  ffff88040ca6d330 0000000000000082 0000000000012f00 ffff88040ad1bfd8
Feb  3 12:56:09 gluster01 kernel: [11764.932767]  0000000000012f00 ffff88040ca6d330 ffff88040ca6d330 ffff88040ad1be88
Feb  3 12:56:09 gluster01 kernel: [11764.932773]  ffff88040e18d4b8 ffff88040e18d4a0 ffffffff00000000 ffff88040e18d4a8
Feb  3 12:56:09 gluster01 kernel: [11764.932780] Call Trace:
Feb  3 12:56:09 gluster01 kernel: [11764.932796]  [<ffffffff81512cd5>] ? rwsem_down_write_failed+0x1d5/0x320
Feb  3 12:56:09 gluster01 kernel: [11764.932807]  [<ffffffff812b7d13>] ? call_rwsem_down_write_failed+0x13/0x20
Feb  3 12:56:09 gluster01 kernel: [11764.932816]  [<ffffffff812325b0>] ? proc_keys_show+0x3f0/0x3f0
Feb  3 12:56:09 gluster01 kernel: [11764.932823]  [<ffffffff81512649>] ? down_write+0x29/0x40
Feb  3 12:56:09 gluster01 kernel: [11764.932830]  [<ffffffff811592bc>] ? vm_mmap_pgoff+0x6c/0xc0
Feb  3 12:56:09 gluster01 kernel: [11764.932838]  [<ffffffff8116ea4e>] ? SyS_mmap_pgoff+0x10e/0x250
Feb  3 12:56:09 gluster01 kernel: [11764.932844]  [<ffffffff811a969a>] ? SyS_readv+0x6a/0xd0
Feb  3 12:56:09 gluster01 kernel: [11764.932853]  [<ffffffff81513ccd>] ? system_call_fast_compare_end+0x10/0x15
Feb  3 12:58:09 gluster01 kernel: [11884.979935] glusterfsd      D ffff88040ca6d788     0  1150      1 0x00000000
Feb  3 12:58:09 gluster01 kernel: [11884.979945]  ffff88040ca6d330 0000000000000082 0000000000012f00 ffff88040ad1bfd8
Feb  3 12:58:09 gluster01 kernel: [11884.979952]  0000000000012f00 ffff88040ca6d330 ffff88040ca6d330 ffff88040ad1be88
Feb  3 12:58:09 gluster01 kernel: [11884.979959]  ffff88040e18d4b8 ffff88040e18d4a0 ffffffff00000000 ffff88040e18d4a8
Feb  3 12:58:09 gluster01 kernel: [11884.979966] Call Trace:
Feb  3 12:58:09 gluster01 kernel: [11884.979982]  [<ffffffff81512cd5>] ? rwsem_down_write_failed+0x1d5/0x320
Feb  3 12:58:09 gluster01 kernel: [11884.979993]  [<ffffffff812b7d13>] ? call_rwsem_down_write_failed+0x13/0x20
Feb  3 12:58:09 gluster01 kernel: [11884.980001]  [<ffffffff812325b0>] ? proc_keys_show+0x3f0/0x3f0
Feb  3 12:58:09 gluster01 kernel: [11884.980008]  [<ffffffff81512649>] ? down_write+0x29/0x40
Feb  3 12:58:09 gluster01 kernel: [11884.980015]  [<ffffffff811592bc>] ? vm_mmap_pgoff+0x6c/0xc0
Feb  3 12:58:09 gluster01 kernel: [11884.980023]  [<ffffffff8116ea4e>] ? SyS_mmap_pgoff+0x10e/0x250
Feb  3 12:58:09 gluster01 kernel: [11884.980030]  [<ffffffff811a969a>] ? SyS_readv+0x6a/0xd0
Feb  3 12:58:09 gluster01 kernel: [11884.980038]  [<ffffffff81513ccd>] ? system_call_fast_compare_end+0x10/0x15
Feb  3 12:58:09 gluster01 kernel: [11884.980351] mc              D ffff88040e6d8fb8     0  5119   1447 0x00000000
Feb  3 12:58:09 gluster01 kernel: [11884.980358]  ffff88040e6d8b60 0000000000000082 0000000000012f00 ffff88040d5dbfd8
Feb  3 12:58:09 gluster01 kernel: [11884.980365]  0000000000012f00 ffff88040e6d8b60 ffff88041ec937b0 ffff88041efcc9e8
Feb  3 12:58:09 gluster01 kernel: [11884.980371]  0000000000000002 ffffffff8113ce00 ffff88040d5dbcb0 ffff88040d5dbd98
Feb  3 12:58:09 gluster01 kernel: [11884.980377] Call Trace:
Feb  3 12:58:09 gluster01 kernel: [11884.980385]  [<ffffffff8113ce00>] ? wait_on_page_read+0x60/0x60
Feb  3 12:58:09 gluster01 kernel: [11884.980392]  [<ffffffff81510759>] ? io_schedule+0x99/0x120
Feb  3 12:58:09 gluster01 kernel: [11884.980399]  [<ffffffff8113ce0a>] ? sleep_on_page+0xa/0x10
Feb  3 12:58:09 gluster01 kernel: [11884.980405]  [<ffffffff81510adc>] ? __wait_on_bit+0x5c/0x90
Feb  3 12:58:09 gluster01 kernel: [11884.980412]  [<ffffffff8113cbff>] ? wait_on_page_bit+0x7f/0x90
Feb  3 12:58:09 gluster01 kernel: [11884.980420]  [<ffffffff810a7bd0>] ? autoremove_wake_function+0x30/0x30
Feb  3 12:58:09 gluster01 kernel: [11884.980426]  [<ffffffff8114a17d>] ? pagevec_lookup_tag+0x1d/0x30
Feb  3 12:58:09 gluster01 kernel: [11884.980433]  [<ffffffff8113cce0>] ? filemap_fdatawait_range+0xd0/0x160
Feb  3 12:58:09 gluster01 kernel: [11884.980442]  [<ffffffff8113e7ca>] ? filemap_write_and_wait_range+0x3a/0x60
Feb  3 12:58:09 gluster01 kernel: [11884.980461]  [<ffffffffa072363f>] ? nfs_file_fsync+0x7f/0x100 [nfs]
Feb  3 12:58:09 gluster01 kernel: [11884.980476]  [<ffffffffa0723a2a>] ? nfs_file_write+0xda/0x1a0 [nfs]
Feb  3 12:58:09 gluster01 kernel: [11884.980484]  [<ffffffff811a7e24>] ? new_sync_write+0x74/0xa0
Feb  3 12:58:09 gluster01 kernel: [11884.980492]  [<ffffffff811a8562>] ? vfs_write+0xb2/0x1f0
Feb  3 12:58:09 gluster01 kernel: [11884.980500]  [<ffffffff811a842d>] ? vfs_read+0xed/0x170
Feb  3 12:58:09 gluster01 kernel: [11884.980505]  [<ffffffff811a90a2>] ? SyS_write+0x42/0xa0
Feb  3 12:58:09 gluster01 kernel: [11884.980513]  [<ffffffff81513ccd>] ? system_call_fast_compare_end+0x10/0x15

Hi Raghavendra,
yes, in this case I have to mount on one of the gluster servers, but it does not matter on which one I mount, and it is only a question of time until the trace appears.
Taste



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
