Hi,

I'm trying CephFS and I have some problems. Here is the context:

All the nodes (the cluster and the clients) are Ubuntu 14.04 with a 3.16 kernel (after "apt-get install linux-generic-lts-utopic && reboot").

The cluster:
- one server with just one monitor daemon (2 GB of RAM);
- 2 servers (24 GB of RAM), each with one monitor daemon, ~10 OSD daemons (one per 275 GB disk), and one MDS daemon (I use the default active/standby mode and the pools for CephFS are "data" and "metadata").

The cluster is completely unused (the servers are idle with regard to RAM, load average, etc.); it's a little cluster for testing. The raw space is 5172G and the number of replicas is 2. One more remark: because of my problem, I have put "mds cache size = 1000000" in my ceph.conf, but without much effect (or else I would not be posting this message). Initially, the CephFS is completely empty.

The clients, "test-cephfs" and "test-cephfs2", have 512 MB of RAM. On these clients, I mount the CephFS like this (as root):

~# mkdir /cephfs
~# mount -t ceph 10.0.2.150,10.0.2.151,10.0.2.152:/ /cephfs/ -o name=cephfs,secretfile=/etc/ceph/ceph.client.cephfs.secret

Then on test-cephfs, I do:

root@test-cephfs:~# mkdir /cephfs/d1
root@test-cephfs:~# ll /cephfs/
total 4
drwxr-xr-x  1 root root    0 Mar  4 11:45 ./
drwxr-xr-x 24 root root 4096 Mar  4 11:42 ../
drwxr-xr-x  1 root root    0 Mar  4 11:45 d1/

Afterwards, on test-cephfs2, I do:

root@test-cephfs2:~# ll /cephfs/
total 4
drwxr-xr-x  1 root root    0 Mar  4 11:45 ./
drwxr-xr-x 24 root root 4096 Mar  4 11:42 ../
drwxrwxrwx  1 root root    0 Mar  4 11:45 d1/

1) Why are the Unix permissions of d1/ different on test-cephfs and on test-cephfs2? They should be the same, shouldn't they?

2) If I create 100 files in /cephfs/d1/ on test-cephfs:

for i in $(seq 100)
do
    echo "$(date +%s.%N)" >/cephfs/d1/f_$i
done

then sometimes, on test-cephfs2, a simple:

root@test-cephfs2:~# time \ls -la /cephfs

can take 2 or 3 seconds, which seems very long to me for a directory with just 100 files. Generally, if I repeat the command on test-cephfs2 just after, it's immediate, but not always. I cannot reproduce the problem in a deterministic way. Sometimes, to reproduce it, I must remove all the files in /cephfs/ on test-cephfs and recreate them. It's very strange. Sometimes, and randomly, something seems to be stalled, but I don't know what. I suspect an MDS tuning problem but, in fact, I don't know what to do. Do you have an idea of what the problem could be?

3) I plan to use CephFS in production for a project of web servers (which share a CephFS storage between them), but I would like to solve the issue above first. If you have any suggestions about CephFS and MDS tuning, I am highly interested.

Thanks in advance for your help.

--
François Lafont
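
PS: in case it helps, here is roughly the relevant part of my ceph.conf (just a sketch; the fsid is omitted and the section layout is from memory). The only non-default MDS setting is the cache size mentioned above:

[global]
    # fsid = ...   (omitted)
    mon host = 10.0.2.150,10.0.2.151,10.0.2.152

[mds]
    # raise the number of inodes the MDS keeps in cache (default is 100000)
    mds cache size = 1000000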
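
PS2: regarding question 2, I can collect diagnostics the next time an "ls" stalls, if that is useful. I was thinking of something like the commands below (the MDS daemon name is a placeholder, and the last part assumes debugfs is mounted on the client):

# on a monitor / the MDS host:
ceph -s
ceph mds stat
ceph daemon mds.<name> perf dump    # via the admin socket, on the host running the MDS

# on the stalled client (kernel client): list in-flight MDS requests
mount -t debugfs none /sys/kernel/debug    # only if debugfs is not already mounted
cat /sys/kernel/debug/ceph/*/mdsc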
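
PS3: for the web servers in question 3, I plan to make the mount permanent via /etc/fstab, along these lines (same monitors and client key as above; noatime and _netdev are just my choices):

10.0.2.150,10.0.2.151,10.0.2.152:/  /cephfs  ceph  name=cephfs,secretfile=/etc/ceph/ceph.client.cephfs.secret,noatime,_netdev  0 2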