Re: strange slowness of ls with 1 newly created file on gfs 1 or 2


Christopher Barry wrote:
On Tue, 2007-07-10 at 22:23 -0400, Wendy Cheng wrote:
Pavel Stano wrote:

and then run touch on node 1:
serpico# touch /d/0/test

and ls on node 2:
dinorscio:~# time ls /d/0/
test

What did you expect from a cluster filesystem? When you touch a file on node 1, it is a "create" that requires at least two exclusive locks (the directory lock and the file lock itself, among many other things). On a local filesystem such as ext3, disk activity is deferred by the filesystem cache: "touch" writes the data into the cache and "ls" reads it back from the cache on the very same node, so everything is a memory operation. On a cluster filesystem, when you run "ls" on node 2, node 2 needs to ask node 1 to release the locks (a few ping-pong messages between the two nodes and the lock managers over the network), and the contents of node 1's cache need to be synced to the shared storage. After node 2 gets the locks, it has to read the contents from disk.

I hope the above explanation is clear.

And one last thing: I tried GFS2, but got the same result.


-- Wendy

This seems a little bit odd to me. I'm running a RH 7.3 cluster,
pre-Red Hat Sistina GFS, lock_gulm, 1GB FC shared disk, and have been
since ~2002.

Here's the timing I get for the same basic test between two nodes:

[root@sbc1 root]# cd /mnt/gfs/workspace/cbarry/
[root@sbc1 cbarry]# mkdir tst
[root@sbc1 cbarry]# cd tst
[root@sbc1 tst]# time touch testfile

real    0m0.094s
user    0m0.000s
sys     0m0.000s
[root@sbc1 tst]# time ls -la testfile
-rw-r--r--    1 root     root            0 Jul 11 12:20 testfile

real    0m0.122s
user    0m0.010s
sys     0m0.000s
[root@sbc1 tst]#

Then immediately from the other node:

[root@sbc2 root]# cd /mnt/gfs/workspace/cbarry/
[root@sbc2 cbarry]# time ls -la tst
total 12
drwxr-xr-x    2 root     root         3864 Jul 11 12:20 .
drwxr-xr-x    4 cbarry   cbarry       3864 Jul 11 12:20 ..
-rw-r--r--    1 root     root            0 Jul 11 12:20 testfile

real    0m0.088s
user    0m0.010s
sys     0m0.000s
[root@sbc2 cbarry]#


Now, you cannot tell me 10 seconds is 'normal' for a clustered fs. That
just does not fly. My guess is DLM is causing problems.

From the previous post, we really can't tell, since the network and disk speeds are unknown variables. However, look at your data:

local "ls" is 0.122s
remote "ls" is 0.088s

I bet the disk flush happened during the first "ls" (and different base kernels handle dirty-data flushing and I/O scheduling differently). I can't be convinced that DLM is an issue unless the experiment collects enough samples to be statistically significant.
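One way to collect enough samples is to repeat the touch-then-ls test in a loop and summarize the timings. The bash function below is a minimal, hypothetical sketch of that idea, not a tool from this thread: the name sample_ls and the test.PID.N file names are made up for illustration, and it assumes GNU date (for the %N nanosecond field), awk, and, for the cross-node case, passwordless ssh to the other node.

```bash
#!/usr/bin/env bash
# Hypothetical helper: N times, create a fresh file in DIR, then time
# "ls DIR" and print summary statistics in milliseconds.
# Assumes GNU date (%N gives nanoseconds) and awk.
# If HOST is given, the file is created on HOST via ssh (node 1 in the
# test earlier in the thread) while ls runs on the local node (node 2).
sample_ls() {
    local dir=$1 n=${2:-20} host=${3:-}
    local i f start end
    for ((i = 0; i < n; i++)); do
        f="$dir/test.$$.$i"
        if [[ -n $host ]]; then
            ssh -n "$host" touch "$f"     # create on the remote node
        else
            touch "$f"                    # create on this node (baseline)
        fi
        start=$(date +%s%N)
        ls "$dir" > /dev/null
        end=$(date +%s%N)
        echo $(( (end - start) / 1000 ))  # elapsed microseconds
    done | awk '
        { x = $1 / 1000                           # microseconds -> ms
          s += x; ss += x * x
          if (NR == 1 || x < lo) lo = x
          if (NR == 1 || x > hi) hi = x }
        END {
          if (NR == 0) exit 1
          m = s / NR
          v = (NR > 1) ? (ss - NR * m * m) / (NR - 1) : 0  # sample variance
          if (v < 0) v = 0                                  # rounding guard
          printf "n=%d mean=%.3fms sd=%.3fms min=%.3fms max=%.3fms\n",
                 NR, m, sqrt(v), lo, hi
        }'
    local rc=${PIPESTATUS[1]}
    # Remove the sample files; the directory is shared, so this works
    # from either node.
    rm -f -- "$dir"/test."$$".*
    return "$rc"
}
```

For example, after sourcing the file on dinorscio, "sample_ls /d/0 50 serpico" measures the cross-node case and "sample_ls /d/0 50" gives a same-node baseline. Comparing the mean and spread of the two runs says more than a single pair of timings.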

-- Wendy


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
