Hi,
yeah, you're right, that was the wrong file size.
Here are the test results with a 2048MB file size. The RAID itself holds
1024MB of cache in RAM.
# tiotest -f 2048
Tiotest results for 4 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8192 MBs |   58.6 s | 139.801 MB/s |  28.0 %  | 987.1 % |
| Random Write   16 MBs |    1.9 s |   8.435 MB/s |   0.9 %  |  29.6 % |
| Read         8192 MBs |   54.9 s | 149.176 MB/s |  16.6 %  | 171.7 % |
| Random Read    16 MBs |   10.4 s |   1.509 MB/s |   0.2 %  |   4.1 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        0.108 ms |      980.220 ms |  0.00000 |   0.00000 |
| Random Write |        1.237 ms |      198.483 ms |  0.00000 |   0.00000 |
| Read         |        0.104 ms |      185.499 ms |  0.00000 |   0.00000 |
| Random Read  |       10.178 ms |      116.995 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        0.117 ms |      980.220 ms |  0.00000 |   0.00000 |
`--------------+-----------------+-----------------+----------+-----------'
So the I/O is roughly the same on all nodes with the right file size, but
the question remains: why is the random read/write performance so bad?
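For what it's worth, the random-read rate and the average latency in the tables above are mutually consistent, assuming tiotest's default 4KB block size (an assumption, check your build): four threads each blocking ~10.178 ms per 4KB read cannot deliver much more than about 1.5 MB/s combined, so the low rate is fully explained by per-request latency. A quick back-of-the-envelope check:

```shell
# Back-of-the-envelope check (the 4KB block size is an assumption):
# aggregate rate = threads * block size / average latency
awk 'BEGIN {
    threads = 4
    block_mb = 4 / 1024           # 4KB expressed in MB
    avg_latency_s = 10.178 / 1000 # 10.178 ms from the latency table
    printf "%.3f MB/s\n", threads * block_mb / avg_latency_s
}'
```

That lands at roughly 1.5 MB/s, right where the measured 1.509 MB/s sits, i.e. every random read appears to pay a full seek.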
More info about the systems:
Each node has 2048MB RAM and dual Xeon CPUs.
The FC controllers are QLogic Corp. QLA2312.
The switch, also used for fencing, is a QLogic 5202.
The RAID itself is an easyRAID Q16+ with 16 disks, and it performs very
well under e.g. XFS.
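One thing worth double-checking against those numbers: the 2048MB test file only equals the per-node RAM and is just twice the RAID cache, so some reads may still be partially served from memory. A rough sizing sketch, using the figures above (the 2x factor is only a conservative rule of thumb, not anything tiotest mandates):

```shell
# Pick a tiotest file size that overruns both node RAM and the
# RAID controller cache, so reads must actually hit the disks.
RAM_MB=2048         # RAM per node (from above)
RAID_CACHE_MB=1024  # cache on the easyRAID Q16+ (from above)
TEST_MB=$(( (RAM_MB + RAID_CACHE_MB) * 2 ))
echo "tiotest -f $TEST_MB"
```

With these figures that suggests running tiotest -f 6144.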
Any further hints?
--
Frank Schliefer
Kovacs, Corey J. wrote:
Also, I think it might be interesting to see what happens when you use
data sizes that will overrun any caching being done. I've seen great
performance using a simple MSA1000 as long as there is a lot of cache
available on the SAN itself. As soon as I run tests with data sets larger
than the cache size, the performance falls to the floor. Unless you're
overloading the cache, you might not be getting a true metric of what's
really getting written to disk.
Maybe the slow node is getting hit by cache overhead from the SAN?
Just a thought
Corey
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Patrick Caulfield
Sent: Thursday, February 09, 2006 9:18 AM
To: linux clustering
Subject: Re: Node lag
Frank Schliefer wrote:
Hi,
after setting up a four-node cluster we have one node that is way
slower than the other three.
We are using e.g. tiotest for benchmarking GFS.
Normal Node:
Tiotest results for 4 concurrent io threads:
,------------------------------------------------------------------------.
| Item                  | Time     | Rate          | Usr CPU  | Sys CPU  |
+-----------------------+----------+---------------+----------+----------+
| Write          40 MBs |    0.2 s |  227.426 MB/s |  36.4 %  |  384.4 % |
| Random Write   16 MBs |    0.1 s |  143.405 MB/s |  58.7 %  |  146.9 % |
| Read           40 MBs |    0.0 s | 2558.199 MB/s | 307.0 %  | 1228.0 % |
| Random Read    16 MBs |    0.0 s | 2685.169 MB/s | 550.0 %  | 1374.9 % |
`------------------------------------------------------------------------'
Slow Node:
Tiotest results for 4 concurrent io threads:
,------------------------------------------------------------------------.
| Item                  | Time     | Rate          | Usr CPU  | Sys CPU  |
+-----------------------+----------+---------------+----------+----------+
| Write          40 MBs |    1.4 s |   27.687 MB/s |   2.2 %  |  121.8 % |
| Random Write   16 MBs |    4.2 s |    3.695 MB/s |   0.0 %  |    7.9 % |
| Read           40 MBs |    0.0 s | 2228.288 MB/s |  89.1 %  | 1337.1 % |
| Random Read    16 MBs |    0.0 s | 2252.739 MB/s | 230.7 %  |  692.1 % |
`------------------------------------------------------------------------'
Any hints why this could happen?
Using kernel 2.6.15.2 (sorry, no RH).
It would be helpful if you could give us more information about your
installation: disk topology, lock manager in use (and which nodes are
lockservers if using GULM) and whether it matters which nodes are started
first or not.
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster