Hello,

Thanks to Jcsp (John Spray, I guess), who helped me on IRC.

On 06/03/2015 04:04, Francois Lafont wrote:

>> ~# mkdir /cephfs
>> ~# mount -t ceph 10.0.2.150,10.0.2.151,10.0.2.152:/ /cephfs/ -o name=cephfs,secretfile=/etc/ceph/ceph.client.cephfs.secret
>>
>> Then in ceph-testfs, I do:
>>
>> root@test-cephfs:~# mkdir /cephfs/d1
>> root@test-cephfs:~# ll /cephfs/
>> total 4
>> drwxr-xr-x  1 root root    0 Mar  4 11:45 ./
>> drwxr-xr-x 24 root root 4096 Mar  4 11:42 ../
>> drwxr-xr-x  1 root root    0 Mar  4 11:45 d1/
>>
>> After, in test-cephfs2, I do:
>>
>> root@test-cephfs2:~# ll /cephfs/
>> total 4
>> drwxr-xr-x  1 root root    0 Mar  4 11:45 ./
>> drwxr-xr-x 24 root root 4096 Mar  4 11:42 ../
>> drwxrwxrwx  1 root root    0 Mar  4 11:45 d1/
>>
>> 1) Why the unix rights of d1/ are different when I'm in test-cephfs
>> and when I'm in test-cephfs2? It should be the same, isn't it?

In fact, this problem may be a bug in the ceph client of the Linux
kernel "3.16". If I mount the cephfs with ceph-fuse on the 2 client
nodes, the problem doesn't happen, and if I mount the cephfs with
"mount.ceph" but with the Linux kernel 3.13, the problem doesn't
happen either.

I have filed a bug report here: http://tracker.ceph.com/issues/11059

>> 2) If I create 100 files in /cephfs/d1/ with test-cephfs:
>>
>> for i in $(seq 100)
>> do
>>     echo "$(date +%s.%N)" >/cephfs/d1/f_$i
>> done
>>
>> sometimes, in test-cephfs2, when I do a simple:
>>
>> root@test-cephfs2:~# time \ls -la /cephfs
>
> Sorry error of copy and paste, of course it was:
>
> root@test-cephfs2:~# time \ls -la /cephfs/d1/
>
>> the command can take 2 or 3 seconds which seems to me very long
>> for a directory with just 100 files. Generally, if I repeat the
>> command on test-cephfs2 just after, it's immediate but not always.
>> I can not reproduce the problem in a determinist way. Sometimes,
>> to reproduce the problem, I must remove all the files in /cephfs/
>> on test-cepfs and recreate them. It's very strange.
>> Sometimes and randomly, something seems to be stalled but I don't
>> know what. I suspect a problem of mds tuning but, In fact, I don't
>> know what to do.
>
> I have the same problem with hammer too.
> But someone can confirm me that 3s (not always) for "ls -la" in
> a cephfs directory which contains 100 file it's pathological? After
> all, maybe is it normal? I don't have much experience with cephfs.

In fact, according to what I was told on IRC, such a time for the
"ls -la" command could be normal, because the client node requests a
"stat" for each file in the directory and each "stat" can take a
little time.

But I'm still a little puzzled. The first "ls -la" can take 2 or 3
seconds and the next "ls -la" is usually faster, but not always.
Sometimes (it's very random) the second "ls -la", or even the third,
can be very slow. I admit that after a number of attempts "ls -la"
becomes faster, but not always from the second attempt.

I'm still surprised by such times. For instance, it seems to me that
with a mounted NFS share, commands like "ls -la" are very fast in
comparison (for a directory which contains the same number of files).
Can anyone explain to me why there is such a difference between the
NFS case and the cephfs case? This is absolutely not a criticism; I
just want to understand the concepts that come into play. In the case
of "ls -al", i.e. just reading (assuming that nothing is writing to
the directory), the NFS and the cephfs cases seem very similar to me:
the client just requests a stat on each file in the directory. Am I
wrong?

--
François Lafont
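
P.S. To check whether the time really goes into the per-file "stat" calls rather than into reading the directory itself, the two steps can be timed separately. Below is a minimal sketch in Python, not a definitive benchmark. The function name time_listing is made up for illustration, and it only approximates what "ls -la" does (it does not list "." and "..", and it ignores anything else ls may query).

```python
import os
import time


def time_listing(path):
    """Return (entry_count, readdir_seconds, stat_seconds) for one directory.

    Hypothetical helper: it splits the work into the directory read and
    the per-entry lstat calls so the two costs can be compared.
    """
    t0 = time.perf_counter()
    names = os.listdir(path)  # directory read only, no per-file metadata
    t1 = time.perf_counter()
    for name in names:
        # One lstat per entry, roughly the extra work of "ls -la" over "ls"
        os.lstat(os.path.join(path, name))
    t2 = time.perf_counter()
    return len(names), t1 - t0, t2 - t1
```

Calling time_listing("/cephfs/d1") several times in a row on test-cephfs2 would show whether the slow runs are dominated by the second or the third value.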