On Thu, Feb 12, 2015 at 11:39:51PM +0530, Pranith Kumar Karampuri wrote:
> 
> On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:
> >
> >On 02/12/2015 08:15 PM, Xavier Hernandez wrote:
> >>I've made some more investigation and the problem seems worse.
> >>
> >>It seems that NFS sends a huge amount of requests without waiting for answers (I've had more than 1400 requests ongoing). There are probably many factors that influence the load this causes, and one of them could be ec, but it's not related exclusively to ec. I've repeated the test using replica 3 and replica 2 volumes and the problem still happens.
> >>
> >>The test basically writes a file to an NFS mount using 'dd'. The file has a size of 1GB. With a smaller file, the test passes successfully.
> >Using an NFS client and the gluster NFS server on the same machine with big-file dd operations is known to cause hangs. anon-fd-quota.t used to give similar problems, so we changed the test to not involve NFS mounts.
> I don't recollect the exact scenario. Avati found the memory allocation deadlock back in 2010, when I had just joined gluster. Raghavendra Bhat raised this bug then; CCed him on the thread as well in case he knows the exact scenario.

This is a well-known issue. When a system is under memory pressure, it will try to flush dirty pages from the VFS. The NFS-client will send the dirty pages over the network to the NFS-server. Unfortunately, the NFS-server needs to allocate memory for the handling of the WRITE procedures. This causes a loop and will most often get the system into a hang situation.

Mounting with "-o sync", or flushing outstanding I/O from the client side, should normally be sufficient to prevent these issues (example commands are sketched at the end of this mail).

Niels

> 
> Pranith
> 
> >Pranith
> >>
> >>One important thing to note is that I'm not using powerful servers (a dual core Intel Atom), but this problem shouldn't happen anyway. It can even happen on more powerful servers if they are busy doing other things (maybe this is what's happening on jenkins' slaves).
> >>
> >>I think that this causes some NFS requests to time out. This can be seen in /var/log/messages (there are many of these messages):
> >>
> >>Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not responding, timed out
> >>
> >>The nfs log also has many errors:
> >>
> >>[2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
> >>[2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
> >>
> >>Additionally, this causes disconnections from NFS that are not correctly handled, so a thread gets stuck in an infinite loop (I haven't analyzed this problem deeply, but it seems like an attempt to use an already disconnected socket). After a while, I get this error in the nfs log:
> >>
> >>[2015-02-12 14:20:19.545429] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0: server 192.168.200.61:49152 has not responded in the last 42 seconds, disconnecting.
> >>
> >>The console executing the test shows this (nfs.t is creating a replica 3 instead of a dispersed volume):
> >>
> >># ./run-tests.sh tests/basic/ec/nfs.t
> >>
> >>... GlusterFS Test Framework ...
> >>
> >>Running tests in file ./tests/basic/ec/nfs.t
> >>[14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing ‘/mnt/nfs/0/test’: Input/output error
> >>[14:12:52] ./tests/basic/ec/nfs.t .. 9/10
> >>not ok 9
> >>[14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests
> >>[14:27:41]
> >>
> >>Test Summary Report
> >>-------------------
> >>./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1)
> >>Failed test: 9
> >>Files=1, Tests=10, 889 wallclock secs ( 0.13 usr 0.02 sys + 1.29 cusr 3.45 csys = 4.89 CPU)
> >>Result: FAIL
> >>Failed tests ./tests/basic/ec/nfs.t
> >>
> >>Note that the test takes almost 15 minutes to complete.
> >>
> >>Is there any way to limit the number of requests NFS sends without having an answer?
> >>
> >>Xavi
> >>
> >>On 02/11/2015 04:20 PM, Shyam wrote:
> >>>On 02/11/2015 09:40 AM, Xavier Hernandez wrote:
> >>>>Hi,
> >>>>
> >>>>it seems that there are some failures in the ec/nfs.t test on the regression runs. Doing some investigation, I've found that before applying the multi-threaded patch (commit 5e25569e) the problem does not seem to happen.
> >>>
> >>>This has an interesting history of failures: on the regression runs for the MT epoll patch, this test (i.e. ec/nfs.t) did not fail (there were others, but not nfs.t).
> >>>
> >>>The patch that allows configuration of MT epoll is where this started failing, around Feb 5th (but it later passed); see the patchset 7 failures on http://review.gluster.org/#/c/9488/ .
> >>>
> >>>I state the above as it may help narrow down the changes in EC (maybe) that could have caused it.
> >>>
> >>>Also, in the latter commit there was an error configuring the number of threads, so all regression runs would have run with a single epoll thread (the MT epoll patch had this hard-coded, so that would have run with 2 threads, but did not show up the issue; patch: http://review.gluster.org/#/c/3842/).
> >>>
> >>>Again, I state the above as this should not be exposing a race/bug/problem due to the multi-threaded nature of epoll, but of course it needs investigation.
> >>>
> >>>>
> >>>>I'm not sure if this patch is the cause or if it has revealed some bug in ec or any other xlator.
> >>>
> >>>I guess we can reproduce this issue? If so, I would try setting client.event-threads on the master branch to 1, restarting the volume and then running the test (as a part of the test itself, maybe) to eliminate the possibility that MT epoll is causing it.
> >>>
> >>>My belief that MT epoll is causing it is in doubt, as the runs failed on http://review.gluster.org/#/c/9488/ (the configuration patch), which had the thread count as 1 due to a bug in that code.
> >>>
> >>>>
> >>>>I can try to identify it (any help will be appreciated), but it may take some time. Would it be better to remove the test in the meantime?
> >>>
> >>>I am checking if this is reproducible on my machine, so that I can possibly see what is going wrong.
> >>>
> >>>Shyam
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
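
A minimal sketch of the client-side workaround Niels describes above, assuming the test volume "patchy" is exported by the gluster NFS server on gf01.datalab.es and mounted on /mnt/nfs (names taken from the logs quoted in this thread); the exact mount syntax on a given setup may differ:

# Mount with synchronous writes so the client cannot build up a large
# backlog of dirty pages before the server has acknowledged the WRITEs.
mount -t nfs -o vers=3,sync gf01.datalab.es:/patchy /mnt/nfs

# Alternatively, keep the default async mount but make dd flush its own I/O:
# oflag=sync commits each write before the next one is issued.
dd if=/dev/zero of=/mnt/nfs/0/test bs=1M count=1024 oflag=sync

# Or simply flush all outstanding I/O from the client once the write is done.
sync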
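
Likewise, a sketch of the check Shyam proposes in the quoted thread: pin the client event threads to 1, restart the volume and re-run the test, to rule out multi-threaded epoll as the trigger. This assumes the volume name "patchy" and the event-threads option introduced by http://review.gluster.org/#/c/9488/ :

# Force a single epoll thread on the client side of the volume.
gluster volume set patchy client.event-threads 1

# Restart the volume so the setting takes effect (answer the confirmation
# prompt, or pass --mode=script), then re-run the failing test.
gluster volume stop patchy
gluster volume start patchy
./run-tests.sh tests/basic/ec/nfs.t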
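
Finally, on Xavier's question about limiting how many NFS requests the client sends before receiving answers: the thread does not settle on an answer, but two generic Linux client-side knobs that bound the backlog are sketched below. Which sysctl applies depends on the kernel version, and the values are purely illustrative assumptions, not something this thread confirms as a fix:

# Cap the number of concurrent in-flight RPC slots per transport. Older
# kernels use sunrpc.tcp_slot_table_entries; newer ones grow the table
# dynamically up to tcp_max_slot_table_entries. Set this before mounting.
echo 32 > /proc/sys/sunrpc/tcp_max_slot_table_entries

# Bound how much dirty page cache the client may accumulate, so a 1GB dd
# cannot queue hundreds of WRITEs before writeback kicks in.
sysctl -w vm.dirty_background_bytes=16777216   # background writeback at 16MB
sysctl -w vm.dirty_bytes=67108864              # block writers at 64MB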