> On Mar 22, 2016, at 19:00, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Tue, Mar 22, 2016 at 10:17 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>>
>>> On Mar 22, 2016, at 14:05, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> [ snip ]
>
>>> Hi Zheng,
>>>
>>> A one and a half line commit message and an equally short cover letter
>>> for a series such as this isn't enough.  I *happen* to know that the
>>> basic use case for namespaces in cephfs is going to be restricting
>>> users to different parts of the directory tree, with the enforcement
>>> happening in ceph on the server side, as opposed to in ceph on the
>>> client side, but I would appreciate some details on what the actual
>>> namespace names are going to be, whether it's user input or not,
>>> whether there are plans to use namespaces for anything else, etc.
>>>
>>
>> The namespace restriction you mentioned is for cephfs metadata. This
>> namespace restricts users to different namespaces in the cephfs data
>> pool. (At present, the only way to restrict data access is to create
>> multiple cephfs data pools and restrict users to different data
>> pools. Creating lots of pools is not efficient for the cluster.)
>
> What about the namespace names, who is generating them, how long are
> they going to be?  Please describe in detail how this is going to work
> for both data and metadata pools.

For example, to restrict user foo to directory /foo_dir:

// create auth caps for user foo
ceph auth get-or-create client.foo mon 'allow r' \
    mds 'allow r, allow rw path=/foo_dir' \
    osd 'allow rw pool=data namespace=foo_ns'

// mount cephfs as user admin
mount -t ceph 10.0.0.1:/ /mnt/ceph_mount -o name=admin,secret=xxxx

// set the directory's layout.pool_namespace
setfattr -n ceph.dir.pool_namespace -v foo_ns /mnt/ceph_mount/foo_dir

The admin user chooses the namespace name. In most cases, the namespace
name does not change.

>
>>
>>
>>> I don't like the idea of sharing namespace name strings between libceph
>>> and ceph modules, especially with the strings themselves hosted in
>>> libceph.  rbd has no use for namespaces, so libceph can live with
>>> namespace names embedded into ceph_osd_request by value or with
>>> a simple non-owning pointer, leaving reference counting to the outside
>>> modules, if one of the use cases is "one namespace with a long name for
>>> the entire directory tree" or something close to it.
>>>
>>> I think the sharing infrastructure should be moved into cephfs, and
>>> probably changed to share entire file layouts along the way.  I don't
>>> know this code well enough to be sure, but it seems that by sharing
>>> file layouts and making ci->i_layout an owning pointer you might be
>>> able to piggy back on i_ceph_lock and considerably simplify the whole
>>> thing by dropping RCU and eliminating numerous get/put calls.
>>
>> RBD may use namespaces later.
>> http://tracker.ceph.com/projects/ceph/wiki/Rbd_-_namespace_support
>
> Well, compared to cephfs, it's hard to call that "using" - in that
> case, there will only ever be one namespace per image.  My point is
> it's never going to use the string sharing infrastructure and is fine
> with a non-owning pointer to a string in the file layout field of the
> in-memory rbd image header.
>
>>
>> The reason I use RCU here is that ci->i_layout.pool_ns can change at
>> any time. For the same reason, using a non-owning pointer for the
>> namespace or the entire layout is unfeasible. Using an embedded
>> namespace is not elegant either. When allocating a ceph_osd_request,
>> cephfs would need to lock i_ceph_lock, copy the namespace to a
>> temporary buffer, unlock i_ceph_lock, and pass ci->i_layout and the
>> temporary buffer to ceph_osdc_xxx_request().
>
> Hmm, RCU doesn't protect you from pool_ns or other file layout fields
> changing while the OSD request is in flight.  As used above, it allows
> ceph_try_get_string() to not take any locks and that's it.

Yes. But not taking a lock simplifies the code a lot; we don't need to
lock/unlock i_ceph_lock each time i_layout is used.
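
For illustration, this is roughly what every write path would have to
do per request without RCU. It is only a sketch, not code from this
series: example_snapshot_layout() and the i_pool_ns field are invented
for the example.

static void example_snapshot_layout(struct ceph_inode_info *ci,
				    struct ceph_file_layout *layout,
				    char *ns_buf, size_t ns_len)
{
	/* take i_ceph_lock only long enough to get a stable copy */
	spin_lock(&ci->i_ceph_lock);
	*layout = ci->i_layout;			/* struct copy */
	strscpy(ns_buf, ci->i_pool_ns, ns_len);	/* i_pool_ns is invented */
	spin_unlock(&ci->i_ceph_lock);

	/* ... then pass layout and ns_buf down to ceph_osdc_xxx_request() ... */
}

With the shared, RCU-protected string, the write path can instead grab
the namespace with ceph_try_get_string() without touching i_ceph_lock
at all.
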
>
> Why exactly can't file layouts be shared?  ci->i_layout would become
> reference counted and you would give libceph a pointer to the pool_ns
> string (or entire layout) after bumping it.  It doesn't matter if
> pool_ns or the rest of the layout changes due to a cap grant or revoke
> while libceph is servicing the OSD request: you would unlink it from
> the tree but the bumped reference will keep the layout around, to be
> put in the OSD request completion callback or so.  Layout lookup would
> have to happen in exactly two places: when newing an inode and handling
> cap grant/revoke; in other places you would simply bump the count on
> the basis of already holding a valid pointer.  You wouldn't have to
> unlink in the destructor, so no hassle with kref_get_unless_zero() and
> no need for RCU, with i_ceph_lock providing the exclusion around the
> tree.

This means cephfs would need to set r_callback for every
ceph_osd_request, and ceph_osd_request would also need a new field to
store the layout pointer. I don't think that is better or simpler than
a reference counted namespace string. (See the sketch at the end of
this mail for how I read the proposal.)

>
> Like I said, I'm not familiar with the code in question at the
> necessary level of detail, so feel free to destroy this, but I don't
> see a problem.
>
> Thanks,
>
>                 Ilya
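
To make the comparison concrete, here is how I read the shared-layout
proposal. Again only a sketch, not code from this series:
ceph_shared_layout, r_layout and the example_* helpers are invented,
i_layout is assumed to have become an owning pointer, and the exact
r_callback prototype is elided.

struct ceph_shared_layout {
	struct kref kref;
	struct ceph_file_layout layout;		/* would carry pool_ns as well */
};

static void example_layout_release(struct kref *kref)
{
	kfree(container_of(kref, struct ceph_shared_layout, kref));
}

/* wired up as the request's r_callback so the reference gets dropped */
static void example_request_done(struct ceph_osd_request *req)
{
	kref_put(&req->r_layout->kref, example_layout_release);
}

static void example_start_request(struct ceph_inode_info *ci,
				  struct ceph_osd_request *req)
{
	/* i_ceph_lock provides the exclusion while we bump the count */
	spin_lock(&ci->i_ceph_lock);
	kref_get(&ci->i_layout->kref);		/* i_layout as an owning pointer */
	req->r_layout = ci->i_layout;		/* r_layout is the new field */
	spin_unlock(&ci->i_ceph_lock);
}

It would work, but compared with a reference counted namespace string
it costs a mandatory completion callback and an extra pointer in every
ceph_osd_request.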