Re: cp taking 100% cpu and never terminating

Heh, yes, sorry. On the server side I'm seeing errors like:
2008-05-11 17:02:22 E [posix.c:1982:posix_setdents] system-ns: Error creating file /mnt/gluster/system-ns/scripts/drbl/drblupdateusr.sh with mode (0100755)
2008-05-11 17:02:22 E [posix.c:1982:posix_setdents] system-ns: Error creating file /mnt/gluster/system-ns/scripts/drbl/drblrebu.swp with mode (0100644)
2008-05-11 17:02:22 E [posix.c:1982:posix_setdents] system-ns: Error creating file /mnt/gluster/system-ns/scripts/drbl/getexefiles.sh with mode (0100755)
2008-05-11 17:39:33 E [posix.c:1990:posix_setdents] system-ns: error creating symlink /mnt/gluster/system-ns/usr/lib64/perl5/5.8.2/x86_64-linux-thread-multi/CORE/libperl.so
2008-05-11 17:39:44 E [posix.c:1990:posix_setdents] system-ns: error creating symlink /mnt/gluster/system-ns/usr/lib64/perl5/5.8.1/x86_64-linux-thread-multi/CORE/libperl.so
2008-05-11 18:48:32 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.1.204:1013)
2008-05-11 18:48:32 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.1.204:1015)

The times don't correspond to the errors on the client. This is from the storage brick "system1" mentioned in the client logs below.
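
For anyone reading the server log: the number in "with mode (0100755)" is an octal st_mode value, which packs the file type together with the permission bits. A minimal Python sketch for decoding the two values seen in the log follows. It is not part of GlusterFS, and `describe_mode` is a helper name made up for this illustration.

```python
import stat

def describe_mode(mode_text):
    """Decode an octal st_mode string such as '0100755' (hypothetical helper)."""
    mode = int(mode_text, 8)  # the log prints the mode in octal
    kind = "regular file" if stat.S_ISREG(mode) else "other"
    return kind, stat.filemode(mode)

# The two mode values that appear in the posix_setdents errors
for text in ("0100755", "0100644"):
    print(text, describe_mode(text))
```

Both values decode to ordinary regular files (one executable, one not), so the mode itself does not look unusual.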

Thanks!
-Mickey Mazarick


Raghavendra G wrote:
Hi Mickey,
Is it possible to provide server side logs?

regards,

On Mon, May 12, 2008 at 1:43 AM, Mickey Mazarick <mic@xxxxxxxxxxxxxxxxxx <mailto:mic@xxxxxxxxxxxxxxxxxx>> wrote:

    Something odd is happening when I run a shell script with cp
    commands in it. This happens infrequently but I have to reboot the
    system to get my processor back. I'm never tarring or copying more
    than 50 MB of data.

    It either hangs on a command like:
    cp --reply=yes /usr/src/linux-${kernver}/.config /tftpboot/node_root/boot/config-${kernver}
    or
    tar cf - etc | gzip > /tftpboot/node_root/drbl_ssi/template_etc.tgz

    When I run top I see:
     PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1603 root      20   0 54160 1616  508 R  100  0.0  33:02.72 cp
    (100% CPU time)

    I'm unable to kill that process in any way, but I can kill the
    shell script that spawned it. The cp process keeps running.

    I see the following errors on the client:
    2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush] system1: : returning EBADFD
    2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1: (path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77
    2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system1: no valid fd found, returning
    2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system-ns1: no valid fd found, returning
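
A note on reading these lines: on Linux, errno 77 is EBADFD ("File descriptor in bad state"), so the op_errno=77 in the afr_flush_cbk line appears to be the same error that client_flush reports by name. Errno numbering is platform-specific, so the sketch below only reports what the local system says; `errno_name` is a helper name introduced here for illustration.

```python
import errno
import os

def errno_name(code):
    """Return the symbolic name for an errno number on this platform."""
    return errno.errorcode.get(code, "UNKNOWN")

code = 77  # op_errno from the afr_flush_cbk log line
print(code, errno_name(code), os.strerror(code))
```

Run this on the same kind of system as the client, since the number-to-name mapping can differ between operating systems.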

    My client and server specs are identical to:
    http://www.gluster.org/docs/index.php/Simple_High_Availability_Storage_with_GlusterFS_1.3

    This happens equally over ib-verbs and tcp transports.

--

    _______________________________________________
    Gluster-devel mailing list
    Gluster-devel@xxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxx>
    http://lists.nongnu.org/mailman/listinfo/gluster-devel




--
Raghavendra G

A centipede was happy quite, until a toad in fun,
Said, "Pray, which leg comes after which?",
This raised his doubts to such a pitch,
He fell flat into the ditch,
Not knowing how to run.
-Anonymous

