On Tue, Mar 22, 2016 at 9:13 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Tue, Mar 22, 2016 at 2:57 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>>>>> I don't like the idea of sharing namespace name strings between libceph
>>>>> and ceph modules, especially with the strings themselves hosted in
>>>>> libceph. rbd has no use for namespaces, so libceph can live with
>>>>> namespace names embedded into ceph_osd_request by value or with
>>>>> a simple non-owning pointer, leaving reference counting to the outside
>>>>> modules, if one of the use cases is "one namespace with a long name for
>>>>> the entire directory tree" or something close to it.
>>>>>
>>>>> I think the sharing infrastructure should be moved into cephfs, and
>>>>> probably changed to share entire file layouts along the way. I don't
>>>>> know this code well enough to be sure, but it seems that by sharing
>>>>> file layouts and making ci->i_layout an owning pointer you might be
>>>>> able to piggy back on i_ceph_lock and considerably simplify the whole
>>>>> thing by dropping RCU and eliminating numerous get/put calls.
>>>>
>>>> RBD may use namespaces later.
>>>> http://tracker.ceph.com/projects/ceph/wiki/Rbd_-_namespace_support
>>>
>>> Well, compared to cephfs, it's hard to call that "using" - in that
>>> case, there will only ever be one namespace per image. My point is
>>> it's never going to use the string sharing infrastructure and is fine
>>> with a non-owning pointer to a string in the file layout field of the
>>> in-memory rbd image header.
>>>
>>>>
>>>> The reason I use RCU here is that ci->i_layout.pool_ns can change at
>>>> any time. For the same reason, using a non-owning pointer for the
>>>> namespace or the entire layout is infeasible. Using an embedded
>>>> namespace is not elegant either. When allocating a ceph_osd_request,
>>>> cephfs needs to lock i_ceph_lock, copy the namespace to a temporary
>>>> buffer, unlock i_ceph_lock, and pass ci->i_layout and the temporary
>>>> buffer to ceph_osdc_xxx_request().
>>>
>>> Hmm, RCU doesn't protect you from pool_ns or other file layout fields
>>> changing while the OSD request is in flight. As used above, it allows
>>> ceph_try_get_string() to not take any locks and that's it.
>>
>> Yes. But not taking the lock simplifies the code a lot. We don't need
>> to lock/unlock i_ceph_lock each time i_layout is used.
>
> I wouldn't call a bunch of rcu_dereference_* variants sprinkled around
> the code base a simplification, but, more importantly, is keeping the
> pool_ns pointer valid really all you need? Shouldn't there be some
> kind of synchronization around "OK, I'm switching to a new layout for
> this inode"? As it is, pool_ns is grabbed in ceph_osdc_new_request(),
> with two successive calls to ceph_osdc_new_request() potentially ending
> up with two different namespaces, e.g. ceph_uninline_data().

I haven't looked at the code at all, but this part is really confusing
me. Namespaces can't be changed arbitrarily, but when they are, a
subsequent write really needs to use the updated namespace! It needs an
atomic switch, not a lazy update. Is there some point at which
out-of-date users get updated prior to actually sending off OSD ops?
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
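
The lock/copy/unlock alternative Zheng describes, and the per-request
snapshot behaviour Ilya and Greg are debating, can be illustrated with a
small user-space sketch. None of the names below are the real kernel
symbols (the actual code uses ci->i_layout, i_ceph_lock and
ceph_osd_request); this is only the pattern, with a pthread mutex
standing in for i_ceph_lock and hypothetical inode_layout, osd_request
and build_request types.

/*
 * Minimal user-space sketch of the "copy the namespace under a lock"
 * pattern discussed above.  All names are hypothetical stand-ins for
 * the kernel structures mentioned in the thread.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NS_MAX 64

/* Stand-in for the per-inode layout protected by i_ceph_lock. */
struct inode_layout {
	pthread_mutex_t lock;     /* plays the role of i_ceph_lock */
	char pool_ns[NS_MAX];     /* namespace name, may change at any time */
};

/* Stand-in for ceph_osd_request: carries its own namespace snapshot. */
struct osd_request {
	char pool_ns[NS_MAX];
};

/*
 * Snapshot the namespace once, under the lock, so the request keeps
 * using the same value even if the layout changes while the request is
 * in flight.  This is the lock/copy/unlock sequence from the thread.
 */
static void build_request(struct inode_layout *li, struct osd_request *req)
{
	pthread_mutex_lock(&li->lock);
	strncpy(req->pool_ns, li->pool_ns, NS_MAX - 1);
	req->pool_ns[NS_MAX - 1] = '\0';
	pthread_mutex_unlock(&li->lock);
}

/* Switch the inode to a new namespace under the same lock. */
static void set_namespace(struct inode_layout *li, const char *ns)
{
	pthread_mutex_lock(&li->lock);
	strncpy(li->pool_ns, ns, NS_MAX - 1);
	li->pool_ns[NS_MAX - 1] = '\0';
	pthread_mutex_unlock(&li->lock);
}

int main(void)
{
	struct inode_layout li = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct osd_request r1, r2;

	set_namespace(&li, "ns-old");
	build_request(&li, &r1);       /* r1 pinned to "ns-old" */
	set_namespace(&li, "ns-new");
	build_request(&li, &r2);       /* r2 sees the updated "ns-new" */

	printf("r1: %s, r2: %s\n", r1.pool_ns, r2.pool_ns);
	return 0;
}

Each request carries the namespace it sampled under the lock, so an
in-flight request keeps using the old name while the next request picks
up the new one; that is why the switch to a new layout has to happen at
one well-defined, synchronized point, which is the concern Greg raises.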