Translating a RadosGW object name into a filename on disk

It's been a while since I worked on this, but let's see what I remember...

On Thu, Aug 14, 2014 at 11:34 AM, Craig Lewis <clewis at centraldesktop.com> wrote:
> In my effort to learn more of the details of Ceph, I'm trying to
> figure out how to get from an object name in RadosGW, through the
> layers, down to the files on disk.
>
> clewis@clewis-mac ~ $ s3cmd ls s3://cpltest/
> 2014-08-13 23:02        14M  28dde9db15fdcb5a342493bc81f91151  s3://cpltest/vmware-freebsd-tools.tar.gz
>
> Looking at the .rgw pool's contents tells me that the cpltest bucket
> is default.73886.55:
> root@dev-ceph0:/var/lib/ceph/osd/ceph-0/current# rados -p .rgw ls | grep cpltest
> cpltest
> .bucket.meta.cpltest:default.73886.55

Okay, what you're seeing here are two different types, whose names I'm
not going to get right:
1) The bucket link "cpltest", which maps from the name "cpltest" to a
"bucket instance". The contents of cpltest, or one of its xattrs, are
pointing at ".bucket.meta.cpltest:default.73886.55"
2) The "bucket instance" .bucket.meta.cpltest:default.73886.55. I
think this contains the bucket index (list of all objects), etc.
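
To make the link/instance relationship concrete, here is a minimal
Python sketch that pulls the bucket name and instance ID out of a
".bucket.meta." object name. It only assumes the naming pattern visible
in the listing above (".bucket.meta.<bucket>:<instance id>"); the
function name parse_bucket_meta is made up for illustration and is not
part of RGW.

```python
def parse_bucket_meta(rados_name):
    """Split '.bucket.meta.<bucket>:<instance id>' into (bucket, instance_id).

    Returns None for names that do not match that pattern (for example
    the plain bucket link object "cpltest").
    """
    prefix = ".bucket.meta."
    if not rados_name.startswith(prefix):
        return None
    # Assumption: the first ":" separates the bucket name from the instance ID.
    bucket, sep, instance_id = rados_name[len(prefix):].partition(":")
    if not sep or not bucket or not instance_id:
        return None
    return bucket, instance_id


# Names as printed by "rados -p .rgw ls | grep cpltest" in the quoted mail.
listing = ["cpltest", ".bucket.meta.cpltest:default.73886.55"]
instances = dict(p for p in map(parse_bucket_meta, listing) if p is not None)
print(instances)
```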

> The rados objects that belong to that bucket are:
> root@dev-ceph0:~# rados -p .rgw.buckets ls | grep default.73886.55
> default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_1
> default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_3
> default.73886.55_vmware-freebsd-tools.tar.gz
> default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_2
> default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_4

Okay, so when you ask RGW for the object vmware-freebsd-tools.tar.gz
from the cpltest bucket, it will look up (or, if we're lucky, have
cached) the cpltest link, and find out that the "bucket prefix" is
default.73886.55. It will then try and access the object
"default.73886.55_vmware-freebsd-tools.tar.gz" (whose construction I
hope is obvious: bucket instance ID as a prefix, _ as a separator,
then the object name). This RADOS object is called the "head" for the
RGW object. In addition to (usually) the beginning bit of data, it
will also contain some xattrs with things like a "tag" for any extra
RADOS objects which include data for this RGW object. In this case,
that tag is "RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ". (This construction is
how we do atomic overwrites of RGW objects which are larger than a
single RADOS object, in addition to a few other things.)
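
The forward direction described above can be sketched in Python. This
is only an illustration built from the names in the listing: the
"<id>__shadow__<tag>_<n>" layout for tail objects is inferred from that
output rather than taken from the RGW source, and both function names
are hypothetical.

```python
def head_object_name(bucket_instance_id, rgw_object_name):
    # Bucket instance ID, "_" as the separator, then the RGW object name.
    return bucket_instance_id + "_" + rgw_object_name


def tail_objects(listing, bucket_instance_id, tag):
    # Assumption: tail objects are named "<id>__shadow__<tag>_<n>",
    # which is the pattern seen in the listing, not a documented format.
    prefix = f"{bucket_instance_id}__shadow__{tag}_"
    tails = [n for n in listing
             if n.startswith(prefix) and n[len(prefix):].isdigit()]
    # Order the pieces by their trailing number.
    return sorted(tails, key=lambda n: int(n[len(prefix):]))


# Names as printed by "rados -p .rgw.buckets ls" in the quoted mail.
listing = [
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_1",
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_3",
    "default.73886.55_vmware-freebsd-tools.tar.gz",
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_2",
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_4",
]
head = head_object_name("default.73886.55", "vmware-freebsd-tools.tar.gz")
tails = tail_objects(listing, "default.73886.55",
                     "RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ")
print(head)
print(tails)
```

In practice the tag would come from the head object's xattrs rather
than being typed in by hand.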

I don't think there's any way of mapping from a shadow (tail) object
name back to its RGW name, but if you look at the RADOS object's xattrs,
there might (or might not) be an attr which contains the parent
object in one form or another. Check that out.
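
As a starting point for that check, here is a rough Python sketch. It
splits a shadow object name into bucket instance ID, tag and part
number (again assuming the "<id>__shadow__<tag>_<n>" pattern inferred
from the listing), and wraps "rados listxattr" so you can see which
attrs an object carries. The helper names are hypothetical, and I'm not
claiming any particular attr exists; the point is just to look.

```python
import subprocess

SHADOW_MARKER = "__shadow__"


def parse_shadow_name(rados_name):
    """Return (bucket_instance_id, tag, part) or None if not a shadow name."""
    instance_id, sep, rest = rados_name.partition(SHADOW_MARKER)
    if not sep:
        return None
    # The part number follows the last "_"; everything before it is the tag.
    tag, sep, part = rest.rpartition("_")
    if not sep or not part.isdigit():
        return None
    return instance_id, tag, int(part)


def list_xattrs(pool, rados_name):
    """List xattr names on a RADOS object via the rados CLI.

    Needs a working cluster and keyring; it is not called in the demo below.
    """
    result = subprocess.run(
        ["rados", "-p", pool, "listxattr", rados_name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


parsed = parse_shadow_name(
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_3")
print(parsed)
```

With the tag in hand, you could run list_xattrs(".rgw.buckets", name)
on the candidate head objects in the same bucket instance and see
whether any attr value mentions that tag.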

(Or, if you want to check out the source, I think all the relevant
bits for this are somewhere in the
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

> I know those shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_ files are the
> rest of vmware-freebsd-tools.tar.gz.  I can infer that because this
> bucket only has a single file (and the sum of the sizes matches).
> With many files, I can't infer the link anymore.
>
> How do I look up that link?
>
> I tried reading the src/rgw/rgw_rados.cc, but I'm getting lost.
>
>
>
> My real goal is the reverse.  I recently repaired an inconsistent PG.
> The primary replica had the bad data, so I want to verify that the
> repaired object is correct.  I have a database that stores the SHA256
> of every object.  If I can get from the filename on disk back to an S3
> object, I can verify the file.  If it's bad, I can restore from the
> replicated zone.
>
>
> Aside from today's task, I think it's really handy to understand these
> low level details.  I know it's been handy in the past, when I had
> disk corruption under my PostgreSQL database.  Knowing (and
> practicing) ahead of time really saved me a lot of downtime then.
>
>
> Thanks for any pointers.

