Re: Shall we revert quota-anon-fd.t?

On 06/11/2014 10:45 AM, Pranith Kumar Karampuri wrote:

On 06/11/2014 09:45 AM, Vijay Bellur wrote:
On 06/11/2014 08:21 AM, Pranith Kumar Karampuri wrote:
Hi,
    I see that quota-anon-fd.t is causing too many spurious failures. I
think we should revert it and raise a bug so that it can be fixed and
committed again along with the fix.


I think we can do that. The problem here stems from the fact that nfs
can deadlock when the client and servers run on the same node and
system memory utilization is on the higher side. We also need to look
into the other nfs tests to determine whether they are exposed to the
same problem.

I doubt it is because of that; there are so many nfs mount tests,

I have been following this problem closely on build.gluster.org (b.g.o). The following backtrace does indicate that dd is hung:

INFO: task dd:6039 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.3.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dd            D ffff880028100840     0  6039   5704 0x00000080
 ffff8801f843faa8 0000000000000286 ffff8801ffffffff 01eff88bb6f58e28
 ffff8801db96bb80 ffff8801f8213590 00000000036c74dc ffffffffac6f4edf
 ffff8801faf11af8 ffff8801f843ffd8 000000000000fbc8 ffff8801faf11af8
Call Trace:
 [<ffffffff810a70b1>] ? ktime_get_ts+0xb1/0xf0
 [<ffffffff8111f940>] ? sync_page+0x0/0x50
 [<ffffffff815280b3>] io_schedule+0x73/0xc0
 [<ffffffff8111f97d>] sync_page+0x3d/0x50
 [<ffffffff81528b7f>] __wait_on_bit+0x5f/0x90
 [<ffffffff8111fbb3>] wait_on_page_bit+0x73/0x80
 [<ffffffff8109b330>] ? wake_bit_function+0x0/0x50
 [<ffffffff81135c05>] ? pagevec_lookup_tag+0x25/0x40
 [<ffffffff8111ffdb>] wait_on_page_writeback_range+0xfb/0x190
 [<ffffffff811201a8>] filemap_write_and_wait_range+0x78/0x90
 [<ffffffff811baa4e>] vfs_fsync_range+0x7e/0x100
 [<ffffffff811bab1b>] generic_write_sync+0x4b/0x50
 [<ffffffff81122056>] generic_file_aio_write+0xe6/0x100
 [<ffffffffa042f20e>] nfs_file_write+0xde/0x1f0 [nfs]
 [<ffffffff81188c8a>] do_sync_write+0xfa/0x140
 [<ffffffff8152a825>] ? page_fault+0x25/0x30
 [<ffffffff8109b2b0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8128ec6f>] ? __clear_user+0x3f/0x70
 [<ffffffff8128ec51>] ? __clear_user+0x21/0x70
 [<ffffffff812263d6>] ? security_file_permission+0x16/0x20
 [<ffffffff81188f88>] vfs_write+0xb8/0x1a0
 [<ffffffff81189881>] sys_write+0x51/0x90
 [<ffffffff810e1e6e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

I have seen dd in uninterruptible sleep on b.g.o. There are also instances [1] where anon-fd-nfs has run for 6000+ seconds. This definitely points to the nfs deadlock.
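
To spot reports like the one above without reading the whole kernel log by hand, something along these lines could work. It is only a sketch: the pattern is derived from the "INFO: task dd:6039 blocked for more than 120 seconds." line in the trace, the function name find_hung_tasks is made up for illustration, and the log text is assumed to come from something like the output of dmesg on the build machine.

```python
import re

# Pattern modelled on the hung-task line quoted in the backtrace above:
#   INFO: task dd:6039 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) tuples, one per report.

    log_text is kernel log output as a single string (hypothetical
    input; for example the captured output of dmesg).
    """
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]
```

Feeding it the saved kernel log from a regression slave would list every task the kernel flagged as blocked, which makes it easier to tell whether dd is the only victim.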


only this one has kept failing for the past 2-3 days.

It is a function of the system memory consumption and of what the oom killer decides to kill. If NFS or a glusterfsd process gets killed, then the test unit will fail. If the test can continue until the system reclaims memory, it can possibly succeed.
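
A rough way to check for this on a slave while the test is running is to look at /proc for processes that have disappeared or that sit in state D. The sketch below assumes the usual /proc/<pid>/stat layout ("pid (comm) state ..."); the helper names and the list of expected process names are hypothetical and would need to match what the test actually starts.

```python
import os


def parse_stat(stat_text):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')'.
    """
    left, _, right = stat_text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    return int(pid_str), comm, right.split()[0]


def scan(proc_root="/proc"):
    """Return (pid, comm, state) for every process readable under proc_root."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                procs.append(parse_stat(f.read()))
        except (OSError, ValueError, IndexError):
            # The process exited while we were scanning, or the file
            # was not in the expected format; skip it.
            continue
    return procs


def check(procs, expected):
    """Return (missing, stuck).

    missing: names from `expected` with no running process.
    stuck:   (pid, comm) of processes in uninterruptible sleep (state D).
    """
    names = {comm for _, comm, _ in procs}
    missing = sorted(set(expected) - names)
    stuck = [(pid, comm) for pid, comm, state in procs if state == "D"]
    return missing, stuck
```

For example, check(scan(), ["glusterfsd"]) would report glusterfsd as missing if it had been killed, and would list dd as stuck if it were in uninterruptible sleep.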

However, there could be other possibilities and we need to root cause them as well.


-Vijay

[1] http://build.gluster.org/job/regression/4783/console
