Hi Mickey,

Is it possible to attach to the glusterfs process using gdb while cp is hung,
and get a backtrace?

# ps aux | grep -i glusterfs
# gdb -p <glusterfs-process-id>

and in gdb,

gdb> bt

Also, it would be helpful if you could get a backtrace of cp as well.

# ps aux | grep -i cp
# gdb -p <cp-process-id>
gdb> bt

Also, I am curious to know what the --reply option to cp does.

regards,

On Sun, Jun 15, 2008 at 12:12 AM, Mickey Mazarick <mic@xxxxxxxxxxxxxxxxxx> wrote:

> I'm still seeing the problem described below. It only happens over the
> ibverbs transport and very infrequently tcp. This is an intermittent
> problem, but happens quite frequently over ibverbs. It will use all the
> processing power on a single core of the client machine. I can repeat the
> command but eventually the machine will lock with all processors doing a cp
> or a tar command. We see it on both kernel 2.6.18 and 2.6.24. Has anyone
> there been able to replicate it?
>
> Thanks!
> -Mickey Mazarick
>
>
>
> Mickey Mazarick wrote:
>
>> Something odd is happening when I run a shell script with cp commands in
>> it. This happens infrequently but I have to reboot the system to get my
>> processor back. I'm never taring or copying more than 50 megs of data.
>>
>> It either hangs on a command like:
>> cp --reply=yes /usr/src/linux-${kernver}/.config
>> /tftpboot/node_root/boot/config-${kernver}
>> or
>> tar cf - etc | gzip > /tftpboot/node_root/drbl_ssi/template_etc.tgz
>>
>> when I do a top I see:
>>  PID USER PR NI  VIRT  RES SHR S %CPU %MEM    TIME+ COMMAND
>> 1603 root 20  0 54160 1616 508 R  100  0.0 33:02.72 cp
>> (100% cpu time)
>>
>> I'm unable to kill that process in any way, but I can kill the shell
>> script that spawned it. The CP command is still running.
>>
>> I see the below errors on the client:
>> 2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush] system1: :
>> returning EBADFD
>> 2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1:
>> (path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77
>> 2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system1: no
>> valid fd found, returning
>> 2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system-ns1: no
>> valid fd found, returning
>>
>> My client and server specs are identical to:
>>
>> http://www.gluster.org/docs/index.php/Simple_High_Availability_Storage_with_GlusterFS_1.3
>>
>> This happens equally over ib-verbs and tcp transports.
>>
>>
>
> --
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>

--
Raghavendra G

A centipede was happy quite, until a toad in fun,
Said, "Prey, which leg comes after which?",
This raised his doubts to such a pitch,
He fell flat into the ditch,
Not knowing how to run.
-Anonymous