----- Original Message -----
> From: "Emmanuel Dreyfus" <manu@xxxxxxxxxx>
> To: "Raghavendra Talur" <rtalur@xxxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Thursday, December 31, 2015 4:31:37 PM
> Subject: Re: quota.t hangs on NetBSD machines
>
> On Thu, Dec 31, 2015 at 03:40:54PM +0530, Raghavendra Talur wrote:
>
> We have threads sleeping, either voluntarily (nanosleep) or not (lwp_park),
> and this:
>
> c5223a80 (glusterfs) is in
> sleepq_block/cv_timedwait_sig/sbwait/soreceive/soo_read/do_filereadv/sys_readv
> Waiting while reading on a socket. Probably FUSE, but it would be nice
> to be certain.
>
> c5346540 (glusterfs) is in
> sleepq_block/cv_timedwait_sig/sigtimedwait1/sys_____sigtimedwait50
> This is ordinary sigtimedwait() but the timeout argument (third) is zero,
> which can let it sleep forever. Is it expected?
>
> cv_timedwait_sig(c53466b4,c5004b80,0,c53466a4,3,db727e90,c53466a4,c41eb528,db727eac,7ff0)
>
> c5418020 (glusterfs) is in
> sleepq_block/sel_do_scan/pollcommon/sys_poll
> This is ordinary poll(2). The struct timespec for the timeout is at
> db721f18 and again this is an infinite timeout:
> crash> x db721f18,2
> db721f18: 0 0
> (NB: 2 words because we run a 32 bit machine, struct timespec is a
> 32 bit time_t and a 32 bit long)
>
> c53692c0 (perfused) is in
> sleepq_block/cv_timedwait_sig/kevent1/sys___kevent50
> Waiting for data (either from kernel or glusterfs, I do not know).
> Again we have an infinite timeout.
>
> I note that the FUSE filesystem is responding. Since perfused is
> not multithreaded, it suggests it is not the stuck process. It may
> have missed a request or reply, though, which would leave the calling
> process stuck.
>
> Speaking about the calling process. I believe it is the quota utility?
> Indeed waiting for a reply from the filesystem:
> UID   PID PPID  CPU PRI NI  VSZ  RSS WCHAN    STAT TTY       TIME COMMAND
>   0 15221 1406 1546  85  0 3360 1080 puffsrpl I    pts/0- 0:00.06
> tests/basic/quota /mnt/glusterfs/0/test_dir/1.txt 256 48
>
> Here is its backtrace obtained from gdb:
> #0  0xbb69b6f7 in write () from /usr/lib/libc.so.12
> #1  0x080489c0 in nwrite (fd=3, buf=0xbb501000, count=262144)
>     at tests/basic/quota.c:16
> #2  0x08048a8b in file_write (
>     filename=0xbf7ffcb2 "/mnt/glusterfs/0/test_dir/1.txt", bs=262144,
>     count=48)
>     at tests/basic/quota.c:48
> #3  0x08048b64 in main (argc=4, argv=0xbf7feba0) at tests/basic/quota.c:83
>
> It is waiting for a write to complete, but we still do not know what process
> got the request and not the reply. Do you see any way to tell?

We saw a similar backtrace on the test process. At that time we took a
statedump of the client process. While we were going through the statedump,
surprisingly, the test program resumed and completed. Can you take a
statedump of the client process?

> --
> Emmanuel Dreyfus
> manu@xxxxxxxxxx
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
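
For reference, a statedump of the client is normally requested by sending
SIGUSR1 to the glusterfs mount process (the shell equivalent is
kill -USR1 <pid>), and the dump usually lands under /var/run/gluster, though
that directory depends on how the build was configured. A minimal sketch in C
follows; request_statedump() is a hypothetical helper name used only for
illustration, not something from the Gluster tree.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Ask a glusterfs process to write a statedump by sending it SIGUSR1.
 * Returns 0 if the signal was sent, -1 otherwise (errno is set).
 * request_statedump() is a hypothetical name, not a Gluster API.
 */
static int
request_statedump(pid_t pid)
{
    if (pid <= 0) {
        /* kill() with 0 or a negative pid targets process groups; refuse. */
        errno = EINVAL;
        return -1;
    }
    if (kill(pid, SIGUSR1) == -1) {
        fprintf(stderr, "kill(%ld, SIGUSR1): %s\n", (long)pid, strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling request_statedump(pid) with the PID of the glusterfs client (as shown
by ps) should be enough. The signal only asks for the dump, so the file may
take a moment to appear.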