Re: OSD-Based Object Stubs

Gregory Farnum <greg@xxxxxxxxxxx> · Tue, 9 Jun 2015 16:58:46 -0700

On Thu, May 28, 2015 at 3:01 AM, Marcel Lauhoff <ml@xxxxxxxx> wrote:
>
> Gregory Farnum <greg@xxxxxxxxxxx> writes:
>
>> On Wed, May 27, 2015 at 1:39 AM, Marcel Lauhoff <ml@xxxxxxxx> wrote:
>>> Hi,
>>>
>>> I wrote a prototype for an OSD-based object stub feature. An object stub
>>> being an object with its data moved /elsewhere/. I hope to get some
>>> feedback, especially whether I'm on the right path here and if it
>>> is a feature you are interested in.
>>>
>>>
>>>
>>> Code is in my "osd-stubs" branch:
>>>  https://github.com/ceph/ceph/compare/master...irq0:osd-stubs
>>>  https://github.com/irq0/ceph/tree/osd-stubs
>>>
>>>  Tools to toy around with osd-stubs + web server to send stubs to:
>>>  https://github.com/irq0/ceph_osd-stub_tools
>>>
>>>
>>>
>>> Related:
>>> - https://wiki.ceph.com/Planning/Blueprints/%3CSIDEBOARD%3E/osd:_tiering:_object_redirects
>>
>> Do you have a shorter summary than the code of how these stub and
>> unstub operations relate to the object redirects? We didn't make a
>> great deal of use of them but the basic data structures are mostly
>> present in the codebase, are interpreted in at least some of the right
>> places, and were definitely intended to cover this kind of use case.
>> :)
>> -Greg
>
>
> As far as I understand the redirect feature, it is about pointing to
> other objects inside the Ceph cluster. The stubs feature allows
> pointing to anything; in the concept code, that is an HTTP server.
>
> Stubs also use an IMHO simpler approach to getting objects back: it's
> the task of the OSD. Stubbed objects just take longer to access, because
> they have to be unstubbed first.
> Redirects, on the other hand, leave this to the client: object redirected
> -> tell the client to retrieve it elsewhere.

Ah, of course.
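
To make the distinction concrete, here is a toy Python model of the two
behaviours described above. It is only a sketch: ToyOSD, Redirect, and the
dictionaries are hypothetical names for illustration and are not taken from
Ceph or from the osd-stubs branch.

```python
# Toy model only: none of these names come from Ceph or the osd-stubs branch.
class Redirect(Exception):
    """Tells the client to fetch the object from another location."""

    def __init__(self, target):
        super().__init__(target)
        self.target = target


class ToyOSD:
    def __init__(self, remote):
        self.remote = remote    # stands in for the external store (e.g. an HTTP server)
        self.data = {}          # object name -> bytes held locally
        self.stubbed = {}       # object name -> key in the external store
        self.redirected = {}    # object name -> location the client must contact

    def read(self, name):
        if name in self.redirected:
            # Redirect model: the client does the extra work.
            raise Redirect(self.redirected[name])
        if name in self.stubbed:
            # Stub model: the OSD pulls the data back before serving the read.
            self.data[name] = self.remote[self.stubbed.pop(name)]
        return self.data[name]
```

In the stub case the client just sees a slower read; in the redirect case it
has to catch the redirect and go elsewhere itself.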

I got a chance to look at this briefly today. Some notes:

* You're using synchronous reads. That will prevent use of stubbing on
EC pools (which only do async reads, as they might need to hit another
OSD for the data), which seems sad.
* There seems to be a race if an object needs to be unstubbed for two
separate requests that come in simultaneously: nothing prevents
both of them from initiating the unstub.
* You can inject an unstub for read ops, but that turns them into a
write. That will cause problems in various cases where the object
isn't writeable yet.
* Why does a delete need the object data?
* You definitely wouldn't want to unstub data for scrubbing.
* There's a CEPH_OSD_OP_STAT which looks at what's in the object info;
that is broken here because you're using the normal truncation path,
so the recorded size is that of the truncated stub. There probably
needs to be more cleverness or machinery distinguishing between the
"local" size used and the size of the object represented.
* I am concerned about recovery, but I think as long as the stub is
just a normal zero-sized object with an extra flag in the object info
that shouldn't be a big deal.
* I think snapshots are probably busted with this; did you check how
they interact?
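
Two of the notes above (the unstub race, and local versus represented
size) can be illustrated with a toy Python sketch. This is not Ceph code:
StubbedObject and its fields are hypothetical, and a real OSD would
presumably queue the waiting ops rather than block threads on a lock.

```python
import threading


# Toy model only: names are hypothetical, not from Ceph or the osd-stubs branch.
class StubbedObject:
    def __init__(self, logical_size, fetch):
        self.logical_size = logical_size  # size of the object the stub represents
        self.data = b""                   # local data; empty while stubbed
        self.is_stub = True
        self._fetch = fetch               # callable returning the remote bytes
        self._lock = threading.Lock()

    def stat(self):
        # Report the represented size without pulling the data back.
        return self.logical_size

    def local_size(self):
        # Space actually used locally (zero while the object is a stub).
        return len(self.data)

    def read(self):
        # Only the first caller fetches; later callers wait on the lock
        # and then find that the object is no longer a stub.
        with self._lock:
            if self.is_stub:
                self.data = self._fetch()
                self.is_stub = False
            return self.data
```

With this shape, several simultaneous reads trigger a single fetch, and a
stat can be answered from the metadata alone.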
-Greg