Re: successive bonnie++ tests taking longer and longer to run (system load steadily increasing)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Raghavendra G wrote:
* What does the top output corresponding to glusterfs say? what is the memory usage and cpu usage? * Do you find anything interesting in glusterfs client log files? Can we get the log files?

Snapshot :

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1185 root      20   0 2478m 909m  564 D 11.3 90.1   1064:50 glusterfs


Over a monitored period of 30 minutes, the CPU usage hovered between 7 and 12, and Mem usage hovered at 90 to 91.



================================================================================
Version      : glusterfs 3.0.0 built on Feb  3 2010 14:39:23
git: 2.0.1-886-g8379edd
Starting Time: 2010-02-17 13:58:41
Command line : /usr/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/replicated-tcp.vol /opt/gluster
PID          : 1185
System name  : Linux
Nodename     : 169.install.pxe
Kernel Release : 2.6.26.8-57.fc8
Hardware Identifier: i686

Given volfile:
+------------------------------------------------------------------------------+
  1: ## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
  2: # Cmd line:
3: # $ /usr/bin/glusterfs-volgen --name replicated --raid 1 s01:/opt/gluster s02:/opt/gluster
  4:
  5: # RAID 1
  6: # TRANSPORT-TYPE tcp
  7: volume s01-1
  8:     type protocol/client
  9:     option transport-type tcp
 10:     option remote-host s01
 11:     option transport.socket.nodelay on
 12:     option transport.remote-port 6996
 13:     option remote-subvolume brick1
 14: end-volume
 15:
 16: volume s02-1
 17:     type protocol/client
 18:     option transport-type tcp
 19:     option remote-host s02
 20:     option transport.socket.nodelay on
 21:     option transport.remote-port 6996
 22:     option remote-subvolume brick1
 23: end-volume
 24:
 25: volume mirror-0
 26:     type cluster/replicate
 27:     subvolumes s01-1 s02-1
 28: end-volume
 29:
 30: volume writebehind
 31:     type performance/write-behind
 32:     option cache-size 4MB
 33:     subvolumes mirror-0
 34: end-volume
 35:
 36: volume readahead
 37:     type performance/read-ahead
 38:     option page-count 4
 39:     subvolumes writebehind
 40: end-volume
 41:
 42: volume iocache
 43:     type performance/io-cache
 44:     option cache-size 1GB
 45:     option cache-timeout 1
 46:     subvolumes readahead
 47: end-volume
 48:
 49: volume quickread
 50:     type performance/quick-read
 51:     option cache-timeout 1
 52:     option max-file-size 64kB
 53:     subvolumes iocache
 54: end-volume
 55:
 56: volume statprefetch
 57:     type performance/stat-prefetch
 58:     subvolumes quickread
 59: end-volume
 60:

+------------------------------------------------------------------------------+
[2010-02-17 13:58:41] W [xlator.c:655:validate_xlator_volume_options] s02-1: option 'transport.remote-port' is deprecated, preferred is 'remote-port', continuing with correction [2010-02-17 13:58:41] W [xlator.c:655:validate_xlator_volume_options] s01-1: option 'transport.remote-port' is deprecated, preferred is 'remote-port', continuing with correction [2010-02-17 13:58:52] E [socket.c:760:socket_connect_finish] s01-1: connection to failed (Connection refused) [2010-02-17 13:58:52] E [socket.c:760:socket_connect_finish] s01-1: connection to failed (Connection refused) [2010-02-17 13:58:58] E [socket.c:760:socket_connect_finish] s02-1: connection to failed (No route to host) [2010-02-17 13:58:58] E [socket.c:760:socket_connect_finish] s02-1: connection to failed (No route to host) [2010-02-17 14:06:23] N [client-protocol.c:6224:client_setvolume_cbk] s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'. [2010-02-17 14:06:23] N [afr.c:2625:notify] mirror-0: Subvolume 's01-1' came back up; going online. [2010-02-17 14:06:23] N [client-protocol.c:6224:client_setvolume_cbk] s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'. [2010-02-17 14:06:23] N [afr.c:2625:notify] mirror-0: Subvolume 's01-1' came back up; going online. [2010-02-17 14:06:33] N [client-protocol.c:6224:client_setvolume_cbk] s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'. [2010-02-17 14:06:33] N [client-protocol.c:6224:client_setvolume_cbk] s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'. [2010-02-20 20:02:43] E [client-protocol.c:415:client_ping_timer_expired] s02-1: Server 10.0.0.39:6996 has not responded in the last 42 seconds, disconnecting. [2010-02-20 20:02:47] E [saved-frames.c:165:saved_frames_unwind] s02-1: forced unwinding frame type(2) op(PING)
[2010-02-20 20:02:48] N [client-protocol.c:6972:notify] s02-1: disconnected
[2010-02-20 20:04:01] N [client-protocol.c:6224:client_setvolume_cbk] s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'. [2010-02-20 20:04:01] N [client-protocol.c:6224:client_setvolume_cbk] s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'. [2010-02-22 02:37:28] E [client-protocol.c:415:client_ping_timer_expired] s02-1: Server 10.0.0.39:6996 has not responded in the last 42 seconds, disconnecting. [2010-02-22 02:37:56] E [saved-frames.c:165:saved_frames_unwind] s02-1: forced unwinding frame type(1) op(READ) [2010-02-22 02:37:56] E [saved-frames.c:165:saved_frames_unwind] s02-1: forced unwinding frame type(2) op(PING)
[2010-02-22 02:37:56] N [client-protocol.c:6972:notify] s02-1: disconnected
[2010-02-22 02:38:02] N [client-protocol.c:6224:client_setvolume_cbk] s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'. [2010-02-22 02:38:02] N [client-protocol.c:6224:client_setvolume_cbk] s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'. [2010-02-22 04:19:02] E [client-protocol.c:415:client_ping_timer_expired] s02-1: Server 10.0.0.39:6996 has not responded in the last 42 seconds, disconnecting. [2010-02-22 04:19:50] E [saved-frames.c:165:saved_frames_unwind] s02-1: forced unwinding frame type(1) op(READ) [2010-02-22 04:19:50] E [saved-frames.c:165:saved_frames_unwind] s02-1: forced unwinding frame type(2) op(PING)
[2010-02-22 04:19:51] N [client-protocol.c:6972:notify] s02-1: disconnected
[2010-02-22 04:20:33] E [client-protocol.c:415:client_ping_timer_expired] s01-1: Server 10.0.0.38:6996 has not responded in the last 42 seconds, disconnecting. [2010-02-22 04:20:34] E [saved-frames.c:165:saved_frames_unwind] s01-1: forced unwinding frame type(1) op(READ) [2010-02-22 04:20:41] E [saved-frames.c:165:saved_frames_unwind] s01-1: forced unwinding frame type(2) op(PING) [2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk] s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
[2010-02-22 04:20:41] N [client-protocol.c:6972:notify] s01-1: disconnected
[2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk] s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'. [2010-02-22 04:20:41] N [afr.c:2625:notify] mirror-0: Subvolume 's02-1' came back up; going online. [2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk] s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'. [2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk] s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.


That is the entire client log file. If it would be useful, i can stop the tests, put the Gluster logging into the debug level, and start again. Just let me know.


--
Daniel Maher <dma+gluster AT witbe DOT net>




[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux