On Tue, Mar 22, 2016 at 9:13 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Tue, Mar 22, 2016 at 2:57 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>>>>> I don't like the idea of sharing namespace name strings between libceph
>>>>> and ceph modules, especially with the strings themselves hosted in
>>>>> libceph. rbd has no use for namespaces, so libceph can live with
>>>>> namespace names embedded into ceph_osd_request by value or with
>>>>> a simple non-owning pointer, leaving reference counting to the outside
>>>>> modules, if one of the use cases is "one namespace with a long name for
>>>>> the entire directory tree" or something close to it.
>>>>>
>>>>> I think the sharing infrastructure should be moved into cephfs, and
>>>>> probably changed to share entire file layouts along the way. I don't
>>>>> know this code well enough to be sure, but it seems that by sharing
>>>>> file layouts and making ci->i_layout an owning pointer you might be
>>>>> able to piggy back on i_ceph_lock and considerably simplify the whole
>>>>> thing by dropping RCU and eliminating numerous get/put calls.
>>>>
>>>> RBD may use namespaces later.
>>>> http://tracker.ceph.com/projects/ceph/wiki/Rbd_-_namespace_support
>>>
>>> Well, compared to cephfs, it's hard to call that "using" - in that
>>> case, there will only ever be one namespace per image. My point is
>>> it's never going to use the string sharing infrastructure and is fine
>>> with a non-owning pointer to a string in the file layout field of the
>>> in-memory rbd image header.
>>>
>>>>
>>>> The reason I use RCU here is that ci->i_layout.pool_ns can change at
>>>> any time. For the same reason, using a non-owning pointer for the
>>>> namespace or the entire layout is infeasible. Using an embedded
>>>> namespace is not elegant either. When allocating a ceph_osd_request,
>>>> cephfs needs to lock i_ceph_lock, copy the namespace to a temporary
>>>> buffer, unlock i_ceph_lock, and pass ci->i_layout and the temporary
>>>> buffer to ceph_osdc_xxx_request().
>>>
>>> Hmm, RCU doesn't protect you from pool_ns or other file layout fields
>>> changing while the OSD request is in flight. As used above, it allows
>>> ceph_try_get_string() to not take any locks and that's it.
>>
>> Yes. But not taking the lock simplifies the code a lot. We don't need
>> to lock/unlock i_ceph_lock each time i_layout is used.
>
> I wouldn't call a bunch of rcu_dereference_* variants sprinkled around
> the code base a simplification, but, more importantly, is keeping the
> pool_ns pointer valid really all you need? Shouldn't there be some
> kind of synchronization around "OK, I'm switching to a new layout for
> this inode"? As it is, pool_ns is grabbed in ceph_osdc_new_request(),
> with two successive calls to ceph_osdc_new_request() potentially ending
> up with two different namespaces, e.g. ceph_uninline_data().

I haven't looked at the code at all, but this part is really confusing
me. Namespaces can't be changed arbitrarily, but when they are, a
subsequent write really needs to use the updated namespace! It needs an
atomic switch, not a lazy update. Is there some point at which
out-of-date users get updated prior to actually sending off OSD ops?
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
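
The lock/copy/unlock alternative Zheng describes, and the per-request
snapshot behaviour Ilya and Greg are debating, can be illustrated with a
small user-space sketch. None of the names below are the real kernel
symbols (the actual code uses ci->i_layout, i_ceph_lock and
ceph_osd_request); this is only the pattern, with a pthread mutex
standing in for i_ceph_lock and hypothetical inode_layout, osd_request
and build_request types.

/*
 * Minimal user-space sketch of the "copy the namespace under a lock"
 * pattern discussed above.  All names are hypothetical stand-ins for
 * the kernel structures mentioned in the thread.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NS_MAX 64

/* Stand-in for the per-inode layout protected by i_ceph_lock. */
struct inode_layout {
	pthread_mutex_t lock;     /* plays the role of i_ceph_lock */
	char pool_ns[NS_MAX];     /* namespace name, may change at any time */
};

/* Stand-in for ceph_osd_request: carries its own namespace snapshot. */
struct osd_request {
	char pool_ns[NS_MAX];
};

/*
 * Snapshot the namespace once, under the lock, so the request keeps
 * using the same value even if the layout changes while the request is
 * in flight.  This is the lock/copy/unlock sequence from the thread.
 */
static void build_request(struct inode_layout *li, struct osd_request *req)
{
	pthread_mutex_lock(&li->lock);
	strncpy(req->pool_ns, li->pool_ns, NS_MAX - 1);
	req->pool_ns[NS_MAX - 1] = '\0';
	pthread_mutex_unlock(&li->lock);
}

/* Switch the inode to a new namespace under the same lock. */
static void set_namespace(struct inode_layout *li, const char *ns)
{
	pthread_mutex_lock(&li->lock);
	strncpy(li->pool_ns, ns, NS_MAX - 1);
	li->pool_ns[NS_MAX - 1] = '\0';
	pthread_mutex_unlock(&li->lock);
}

int main(void)
{
	struct inode_layout li = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct osd_request r1, r2;

	set_namespace(&li, "ns-old");
	build_request(&li, &r1);       /* r1 pinned to "ns-old" */
	set_namespace(&li, "ns-new");
	build_request(&li, &r2);       /* r2 sees the updated "ns-new" */

	printf("r1: %s, r2: %s\n", r1.pool_ns, r2.pool_ns);
	return 0;
}

Each request carries the namespace it sampled under the lock, so an
in-flight request keeps using the old name while the next request picks
up the new one; that is why the switch to a new layout has to happen at
one well-defined, synchronized point, which is the concern Greg raises.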