Re: Re: cp taking 100% cpu and never terminating

Mickey Mazarick <mic@xxxxxxxxxxxxxxxxxx> · Tue, 17 Jun 2008 15:54:29 -0400

Thanks for your help on this. We have been seeing it more frequently
over the tcp interface than we had previously. Also, the server and the
client are running on the same system for these tests, and that system
is running kernel 2.6.24 compiled for x86_64.

For the client I see:
(gdb) bt
#0  0x0000003b28fc74ec in epoll_wait () from /lib64/tls/libc.so.6
#1  0x00002b7de22c12ce in sys_epoll_iteration (ctx=0x505010) at epoll.c:188
#2  0x00002b7de22c0a6a in poll_iteration (ctx=0x7) at transport.c:312
#3  0x0000000000402658 in main (argc=-499327176, argv=0x505010) at glusterfs.c:564

For the cp process, gdb is unable to attach; it hangs at "Attaching to
process 6073".

After I kill the gdb process, it reports:
linux-nat.c:1072: internal-error: linux_nat_detach: Assertion `num_lwps 
== 1' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

Email me separately if you would like access to the machine. The problem
recurs most of the time when I run a specific script, so I can let you
see it for yourself.

Thanks!
-Mickey Mazarick

Raghavendra G wrote:
Hi Mickey,
Is it possible to attach to the glusterfs process using gdb while cp is
hung, and get a backtrace?
# ps aux | grep -i glusterfs
# gdb -p <glusterfs-process-id>
and in gdb:
(gdb) bt

Also, it would be helpful if you could get a backtrace of cp as well:
# ps aux | grep -i cp
# gdb -p <cp-process-id>
(gdb) bt
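The steps above can also be run non-interactively. What follows is a minimal sketch, assuming gdb and pgrep (from procps) are installed; dump_bt is a hypothetical helper, not part of glusterfs. It uses pgrep -x so that "cp" matches only processes named exactly cp (a plain grep -i cp also matches scp, the grep command itself, and so on), and "thread apply all bt" so that every thread of the glusterfs client is dumped rather than only the current one.

```shell
# Hypothetical helper (not part of glusterfs): print the state and a
# backtrace of every thread of each process whose name is exactly $1.
# Assumes gdb and pgrep (procps) are installed; run as root.
dump_bt() {
    name="${1:?usage: dump_bt process-name}"
    # pgrep -x matches the whole process name, so "cp" does not also
    # match "scp" or the grep command itself.
    for pid in $(pgrep -x "$name"); do
        echo "=== $name (pid $pid) ==="
        # R = running (spinning), D = uninterruptible sleep in the kernel.
        grep '^State:' "/proc/$pid/status"
        # -batch makes gdb exit after the -ex commands; an attached
        # process is detached (not killed) when gdb exits.
        gdb -p "$pid" -batch -ex 'thread apply all bt'
    done
}
```

For example, run dump_bt glusterfs and then dump_bt cp. If gdb still hangs while attaching, the State: line printed first can hint whether the task is spinning or blocked in the kernel; on kernels built with CONFIG_MAGIC_SYSRQ, running echo t > /proc/sysrq-trigger as root writes every task's state and kernel stack to the kernel log, readable with dmesg.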

Also, I am curious to know what the --reply option to cp does.

regards,

On Sun, Jun 15, 2008 at 12:12 AM, Mickey Mazarick
<mic@xxxxxxxxxxxxxxxxxx> wrote:

    I'm still seeing the problem described below. It happens mainly over
    the ibverbs transport and only very infrequently over tcp. The
    problem is intermittent, but it occurs quite often over ibverbs.
    When it happens, cp uses all the processing power of a single core
    on the client machine. I can repeat the command, but eventually the
    machine locks up with every processor busy running a cp or tar
    command. We see it on both kernel 2.6.18 and 2.6.24. Has anyone
    there been able to replicate it?

    Thanks!
    -Mickey Mazarick

    Mickey Mazarick wrote:

        Something odd is happening when I run a shell script with cp
        commands in it. This happens infrequently, but when it does I
        have to reboot the system to get my processor back. I'm never
        tarring or copying more than 50 MB of data.

        It hangs either on a command like:
        cp --reply=yes /usr/src/linux-${kernver}/.config \
            /tftpboot/node_root/boot/config-${kernver}
        or on:
        tar cf - etc | gzip > /tftpboot/node_root/drbl_ssi/template_etc.tgz

        When I run top, I see:
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        1603 root      20   0 54160 1616  508 R  100  0.0  33:02.72 cp
        (100% CPU time)

        I'm unable to kill that process in any way, but I can kill the
        shell script that spawned it. The cp command keeps running.

        I see the following errors on the client:
        2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush] system1: : returning EBADFD
        2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1: (path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77
        2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system1: no valid fd found, returning
        2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system-ns1: no valid fd found, returning

        My client and server specs are identical to:
        http://www.gluster.org/docs/index.php/Simple_High_Availability_Storage_with_GlusterFS_1.3

        This happens equally over ib-verbs and tcp transports.

    -- 

    _______________________________________________
    Gluster-devel mailing list
    Gluster-devel@xxxxxxxxxx
    http://lists.nongnu.org/mailman/listinfo/gluster-devel

--
Raghavendra G

A centipede was happy quite, until a toad in fun,
Said, "Pray, which leg comes after which?",
This raised his doubts to such a pitch,
He fell flat into the ditch,
Not knowing how to run.
-Anonymous 
