Re: OSD-Based Object Stubs

Marcel Lauhoff <ml@xxxxxxxx> · Sat, 20 Jun 2015 12:18:08 +0200

Hi,

thanks for the comments!

Gregory Farnum <greg@xxxxxxxxxxx> writes:

> On Thu, May 28, 2015 at 3:01 AM, Marcel Lauhoff <ml@xxxxxxxx> wrote:
>>
>> Gregory Farnum <greg@xxxxxxxxxxx> writes:
>>
>>> Do you have a shorter summary than the code of how these stub and
>>> unstub operations relate to the object redirects? We didn't make a
>>> great deal of use of them but the basic data structures are mostly
>>> present in the codebase, are interpreted in at least some of the right
>>> places, and were definitely intended to cover this kind of use case.
>>> :)
>>> -Greg
>>
>> As far as I understood the redirect feature it is about pointing to
>> other objects inside the Ceph cluster. The stubs feature allows
>> pointing to anything. An HTTP server in concept code.
>>
>> Then stubs use an IMHO simpler approach to getting objects back: It's
>> the task of the OSD. Stubbed objects just take longer to access, due to
>> unstubbing it first.
>> Redirects on the other hand leave this to the client: Object redirected
>> -> Tell client to retrieve it elsewhere.
>
> Ah, of course.
>
> I got a chance to look at this briefly today. Some notes:
>
> * You're using synchronous reads. That will prevent use of stubbing on
> EC pools (which only do async reads, as they might need to hit another
> OSD for the data), which seems sad.
Good point. I haven't looked at how EC pools work yet. I assumed that
a stub feature would be quite different for the two pool types and tried
replicated pools first.

> * There seems to be a race if you need to unstub an op for two
> separate requests that come in simultaneously, with nothing preventing
> both of them from initiating the unstub.
Right. I should probably add some "in flight" states there.
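
A minimal sketch of what I mean by "in flight" states, not actual code
from my branch. All names (UnstubTracker, begin_or_wait, finish) are made
up for illustration, and it assumes the caller already serializes access
per object, so there is no locking in here. The first request for a
stubbed object starts the unstub; later requests only get queued and are
re-run once the unstub has finished.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper; none of these names exist in Ceph.
// Assumes calls for a given object are already serialized by the caller.
class UnstubTracker {
public:
  using Callback = std::function<void()>;

  // Returns true if the caller should start the unstub itself.
  // Returns false if an unstub is already in flight; in both cases
  // cb is queued and runs when finish() is called for this object.
  bool begin_or_wait(const std::string& oid, Callback cb) {
    auto it = in_flight_.find(oid);
    if (it == in_flight_.end()) {
      in_flight_[oid].push_back(std::move(cb));
      return true;
    }
    it->second.push_back(std::move(cb));
    return false;
  }

  // Called once when the unstub has completed; re-runs every queued op.
  void finish(const std::string& oid) {
    auto it = in_flight_.find(oid);
    if (it == in_flight_.end()) return;
    std::vector<Callback> waiters = std::move(it->second);
    in_flight_.erase(it);
    for (auto& cb : waiters) cb();
  }

  bool is_in_flight(const std::string& oid) const {
    return in_flight_.count(oid) > 0;
  }

private:
  std::map<std::string, std::vector<Callback>> in_flight_;
};
```

With that, two simultaneous requests for the same stubbed object would
trigger only one fetch from the remote: only the request that gets "true"
back initiates it.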

> * You can inject an unstub for read ops, but that turns them into a
> write. That will cause problems in various cases where the object
> isn't writeable yet.
I thought I fixed that by doing "ctx->op->set_write()" in the implicit
unstub code.

> * Why does a delete need the object data?
That was just a shortcut: the quite simplistic remote API has only put
and get. An unstub before the delete also deletes the remote object.

> * You definitely wouldn't want to unstub data for scrubbing.
What's the alternative? Should the remote do the scrubbing, or should
scrub just skip stubbed objects?

> * There's a CEPH_OSD_OP_STAT which looks at what's in the object info;
> that is broken here because you're using the normal truncation path.
> There probably needs to be more cleverness or machinery distinguishing
> between the "local" size used and the size of the object represented.
Of course.
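
Roughly, the distinction could look like the sketch below. This is an
assumption about a possible fix, not what the current code does: the
struct and its fields are hypothetical and are not Ceph's object_info_t.
The idea is to track the size of the object as clients see it separately
from the bytes actually kept on the OSD, so that stubbing no longer
changes what a stat reports.

```cpp
#include <cstdint>

// Hypothetical bookkeeping; not Ceph's object_info_t.
struct StubAwareInfo {
  uint64_t logical_size = 0;  // size of the object as clients see it
  uint64_t local_size = 0;    // bytes actually stored on the OSD
  bool stubbed = false;

  // Stubbing drops the local data but keeps the logical size.
  void stub() {
    local_size = 0;
    stubbed = true;
  }

  // Unstubbing brings the data back to the OSD.
  void unstub() {
    local_size = logical_size;
    stubbed = false;
  }

  // What a stat-like operation would report to the client.
  uint64_t stat_size() const { return logical_size; }
};
```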

> * I think snapshots are probably busted with this; did you check how
> they interact?
With this implementation I think they really are. Stubs+Snapshots could
be a nice thing for backups: just stub a read-only snapshot.

> -Greg

~marcel

--
Marcel Lauhoff
Mail/XMPP: ml@xxxxxxxx
http://irq0.org