cp taking 100% cpu and never terminating

Something odd is happening when I run a shell script with cp commands in it. It happens infrequently, but when it does I have to reboot the system to get my processor back. I'm never tarring or copying more than 50 MB of data.

It hangs either on a command like:
cp --reply=yes /usr/src/linux-${kernver}/.config /tftpboot/node_root/boot/config-${kernver}
or
tar cf - etc | gzip > /tftpboot/node_root/drbl_ssi/template_etc.tgz

When I run top, I see:
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1603 root      20   0 54160 1616  508 R  100  0.0  33:02.72 cp
(100% cpu time)

I'm unable to kill that process in any way. I can kill the shell script that spawned it, but the cp command keeps running afterwards.
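
For anyone trying to reproduce this, here is a rough sketch of how the state of the stuck process could be captured before rebooting. A process that shows state R at 100% CPU and ignores every signal is usually spinning inside the kernel rather than in user space, so the kernel wait channel may be worth recording too. The function name proc_report is made up for illustration, and the sketch assumes a Linux /proc filesystem:

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS or coreutils): print the
# scheduler state and, where readable, the kernel wait channel of a PID.
proc_report() {
    pid="$1"
    # Assumes Linux: every live process has a /proc/<pid> directory.
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # The State line looks like "State: R (running)" or "State: D (disk sleep)".
    grep '^State:' "/proc/$pid/status"
    # wchan names the kernel function the task is sleeping in; it is
    # typically "0" for a task that is currently running.
    if [ -r "/proc/$pid/wchan" ]; then
        printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan")"
    fi
}
```

With the process from the top output above, this would be called as proc_report 1603.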

I see the below errors on the client:
2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush] system1: : returning EBADFD
2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1: (path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77
2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system1: no valid fd found, returning
2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system-ns1: no valid fd found, returning

My client and server spec files are identical to the ones at:
http://www.gluster.org/docs/index.php/Simple_High_Availability_Storage_with_GlusterFS_1.3

This happens equally over ib-verbs and tcp transports.

--



