Ceph-Fuse and mount namespaces

Dear Cephalopodians,

continuing a bit on the point raised in the other thread ("CephFS very unstable with many small files")
concerning the potentially unexpected behaviour of the ceph-fuse client with regard to mount namespaces, I have run a first small experiment.

First off: I did not see any bad behaviour that can be traced back to this directly, but the information may still be worthwhile
to share.

Here's what I did. 

1) Initially, cephfs is mounted fine:
[root@wn001 ~]# ps faux | grep ceph
root        1908 31.4  0.1 1485376 201392 ?      Sl   Feb25 983:26 ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw

2) Now, I fire off a container as normal user:
$ singularity exec -B /cvmfs -B /cephfs /cvmfs/some_container_repository/singularity/SL6/default/1519725973/ bash
Welcome inside the SL6 container.
Singularity> ls /cephfs
benchmark  dd_test_rd.sh  dd_test.sh  grid  kern  port  user
Singularity> cd /cephfs

All is fine and as expected. Singularity is just one of many container runtimes; you could also use charliecloud (more lightweight,
and its code is a good place to learn how things work) or runc (the reference implementation of OCI).
The following may also be reproducible with a clever arrangement of "unshare" calls (see e.g. https://sft.its.cern.ch/jira/projects/CVM/issues/CVM-1478 ), as sketched below.
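For reference, a rough sketch of such an "unshare" arrangement (untested in this exact form; recent util-linux unshare defaults to private mount propagation when it creates the new mount namespace, so the fuse mount stays pinned inside it):

[root@wn001 ~]# unshare --mount bash    # new mount namespace with copies of all host mounts
[root@wn001 ~]# ls /cephfs              # cephfs is visible here and keeps the helper referenced
[root@wn001 ~]# exit                    # only now is that reference dropped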

3) Now the experiment starts. On the host:
[root@wn001 ~]# umount /cephfs/
[root@wn001 ~]# ps faux | grep ceph
root        1908 31.4  0.1 1485376 201392 ?      Sl   Feb25 983:26 ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw
[root@wn001 ~]# ls /cephfs/
[root@wn001 ~]#

=> CephFS is unmounted, but the fuse helper keeps running!
The reason: the mount is still in use within the container's mount namespace.
But since no file handle is visible in the host namespace, the umount succeeds and returns.
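One way to make this visible (a sketch; <pid> stands for the PID of the shell inside the container, output abbreviated):

[root@wn001 ~]# grep fuse.ceph-fuse /proc/self/mountinfo
[root@wn001 ~]# grep fuse.ceph-fuse /proc/<pid>/mountinfo
... /cephfs ... - fuse.ceph-fuse ceph-fuse rw,...

i.e. the mount has disappeared from the host's namespace, but is still listed in the container's.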

4) Now, in the container:
Singularity> ls
benchmark  dd_test_rd.sh  dd_test.sh  grid  kern  port  user

I can also write and read just fine. 
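To illustrate what I mean by that, a quick check along these lines (the file name is just an example):

Singularity> dd if=/dev/zero of=/cephfs/benchmark/ns_test bs=1M count=10
Singularity> md5sum /cephfs/benchmark/ns_test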

5) Now the ugly part begins. On the host: 
[root@wn001 ~]# mount /cephfs
2018-02-28 00:07:43.431425 7efddc61e040 -1 asok(0x5571340ae1c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.cephfs_baf.asok': (17) File exists
2018-02-28 00:07:43.434597 7efddc61e040 -1 init, newargv = 0x5571340abb20 newargc=11
ceph-fuse[98703]: starting ceph client
ceph-fuse[98703]: starting fuse
[root@wn001 ~]# ps faux | grep ceph
root        1908 31.4  0.1 1485376 201392 ?      Sl   Feb25 983:26 ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw
root       98703  1.0  0.0 400268  9456 pts/2    Sl   00:07   0:00 ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw

As you can see:
- Name collision for the admin socket, since the old helper is still running (a possible mitigation is sketched below).
- A second helper for the same mountpoint was fired up!
- Of course, cephfs is now accessible on the host again.
- On a side note, once I exit the container (and hence close the mount namespace), the "old" helper is finally freed.
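If the admin socket collision itself is the main annoyance, a possible mitigation (which I have not tried) could be to make the socket path unique per process via the $pid metavariable in ceph.conf, e.g.:

[client]
    admin socket = /var/run/ceph/$cluster-$name.$pid.asok

That would at least avoid the failed bind when the second helper starts, though it of course does not answer the duplicate-helper question itself.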

Hence, I am unsure what exactly happens during the internal "remount", i.e. when the ceph-fuse helper remounts the FS to make the kernel drop all its internal caches.

Since my kernel and FUSE experience is very limited, let me recollect what other FUSE filesystems do:
- sshfs does the same, i.e. one helper in the host namespace and one helper in the container namespace. But it does not have problems with e.g. an admin socket.
- CVMFS ( http://cvmfs.readthedocs.io/en/stable/ ) errors out in step (5), i.e. the admin cannot remount on the host anymore.
  This is nasty, especially when combined with autofs and when containers themselves are placed on CVMFS, which is why I opened https://sft.its.cern.ch/jira/projects/CVM/issues/CVM-1478 with them.
  They need to enforce a single helper to prevent corruption (even though it's a network FS, they do heavy local on-disk caching).
- ntfs-3g has the only correct behaviour, IMHO.
  I don't know how they pull it off, but when you are in (5) and issue "mount" on the host, no new fuse helper is started - instead the existing fuse helper takes care of both the mount in the host namespace
  and the mount in the container namespace.
  They also need to do this to prevent corruption, since it's not a network FS.

I'm unsure whether this is really a problem, and I have not yet clearly seen it break anything with Ceph. Still, I hope the information
is worthwhile and may trigger some ideas.

Cheers,
	Oliver

