Translating a RadosGW object name into a filename on disk

Looks like I need to upgrade to Firefly to get ceph-kvstore-tool
before I can proceed.
I am getting some hits just from grepping the LevelDB store, but so
far nothing has panned out.

Thanks for the help!

On Tue, Aug 19, 2014 at 10:27 AM, Gregory Farnum <greg at inktank.com> wrote:
> It's been a while since I worked on this, but let's see what I remember...
>
> On Thu, Aug 14, 2014 at 11:34 AM, Craig Lewis <clewis at centraldesktop.com> wrote:
>> In my effort to learn more of the details of Ceph, I'm trying to
>> figure out how to get from an object name in RadosGW, through the
>> layers, down to the files on disk.
>>
>> clewis@clewis-mac ~ $ s3cmd ls s3://cpltest/
>> 2014-08-13 23:02        14M  28dde9db15fdcb5a342493bc81f91151  s3://cpltest/vmware-freebsd-tools.tar.gz
>>
>> Looking at the .rgw pool's contents tells me that the cpltest bucket
>> is default.73886.55:
>> root@dev-ceph0:/var/lib/ceph/osd/ceph-0/current# rados -p .rgw ls | grep cpltest
>> cpltest
>> .bucket.meta.cpltest:default.73886.55
>
> Okay, what you're seeing here are two different types, whose names I'm
> not going to get right:
> 1) The bucket link "cpltest", which maps from the name "cpltest" to a
> "bucket instance". The contents of cpltest, or one of its xattrs,
> point at ".bucket.meta.cpltest:default.73886.55".
> 2) The "bucket instance" .bucket.meta.cpltest:default.73886.55. I
> think this contains the bucket index (list of all objects), etc.
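
To make the two names above concrete, here is a minimal Python sketch that splits a bucket instance name into the bucket name and the instance ID. The `.bucket.meta.<bucket>:<instance id>` layout is inferred only from the `rados ls` output in this thread, and `parse_bucket_meta_name` is a hypothetical helper, not part of any Ceph tool.

```python
def parse_bucket_meta_name(rados_name):
    """Split '.bucket.meta.<bucket>:<instance id>' into (bucket, instance_id).

    The layout is an assumption taken from the listing in this thread.
    """
    prefix = ".bucket.meta."
    if not rados_name.startswith(prefix):
        raise ValueError("not a bucket instance name: %r" % rados_name)
    # Split on the first ':'; the bucket name comes before it.
    bucket, sep, instance_id = rados_name[len(prefix):].partition(":")
    if not sep or not bucket or not instance_id:
        raise ValueError("unexpected name layout: %r" % rados_name)
    return bucket, instance_id


bucket, instance_id = parse_bucket_meta_name(
    ".bucket.meta.cpltest:default.73886.55")
print(bucket, instance_id)
```

The instance ID returned here is the prefix to grep for in the .rgw.buckets pool.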
>
>> The rados objects that belong to that bucket are:
>> root@dev-ceph0:~# rados -p .rgw.buckets ls | grep default.73886.55
>> default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_1
>> default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_3
>> default.73886.55_vmware-freebsd-tools.tar.gz
>> default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_2
>> default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_4
>
> Okay, so when you ask RGW for the object vmware-freebsd-tools.tar.gz
> from the cpltest bucket, it will look up (or, if we're lucky, have
> cached) the cpltest link, and find out that the "bucket prefix" is
> default.73886.55. It will then try and access the object
> "default.73886.55_vmware-freebsd-tools.tar.gz" (whose construction I
> hope is obvious: bucket instance ID as a prefix, _ as a separator,
> then the object name). This RADOS object is called the "head" for the
> RGW object. In addition to (usually) the beginning bit of data, it
> will also contain some xattrs with things like a "tag" for any extra
> RADOS objects which include data for this RGW object. In this case,
> that tag is "RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ". (This construction is
> how we do atomic overwrites of RGW objects which are larger than a
> single RADOS object, in addition to a few other things.)
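
The naming described above can be sketched in a few lines of Python. The head name follows the construction given in the paragraph. The shadow (tail) name pattern `<bucket id>__shadow__<tag>_<n>` is an assumption read off the listing in this thread and may differ between Ceph releases. Both function names are hypothetical.

```python
def head_object_name(bucket_id, key):
    """RADOS name of the head object: bucket instance ID, '_', object name."""
    return "%s_%s" % (bucket_id, key)


def shadow_object_name(bucket_id, tag, part):
    """RADOS name of one tail object.

    Assumed pattern, taken from the listing in this thread:
    <bucket id>__shadow__<tag>_<part number>
    """
    return "%s__shadow__%s_%d" % (bucket_id, tag, part)


bucket_id = "default.73886.55"
print(head_object_name(bucket_id, "vmware-freebsd-tools.tar.gz"))
print(shadow_object_name(bucket_id, "RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ", 1))
```

Going forward (S3 name to RADOS names) only needs the tag stored on the head object. The sketch takes the tag as an input and does not read it from the xattrs.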
>
> I don't think there's any way of mapping from a shadow (tail) object
> name back to its RGW name, but if you look at the rados object xattrs,
> there might (or might not) be an attr which contains the parent
> object in one form or another. Check that out.
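
As a partial aid, the following Python sketch sorts a `rados ls` listing for one bucket into head objects and groups of tail objects keyed by tag. It does not solve the reverse mapping: linking a tag to its head still requires reading the head's xattrs, which is not shown here. The `__shadow__` pattern is assumed from the listing in this thread, `classify` is a hypothetical helper, and an S3 key that itself begins with `_shadow__` would be misclassified.

```python
from collections import defaultdict

SHADOW_MARKER = "__shadow__"  # assumed from the listing in this thread


def classify(names, bucket_id):
    """Return (head keys, {tag: sorted part numbers}) for one bucket."""
    heads = []
    tails = defaultdict(list)
    for name in names:
        if not name.startswith(bucket_id + "_"):
            continue  # belongs to another bucket
        rest = name[len(bucket_id):]
        if rest.startswith(SHADOW_MARKER):
            # Tail object: everything up to the last '_' is the tag.
            tag, sep, part = rest[len(SHADOW_MARKER):].rpartition("_")
            if sep and part.isdigit():
                tails[tag].append(int(part))
                continue
        # Head object: strip the single '_' separator.
        heads.append(rest[1:])
    return heads, {tag: sorted(parts) for tag, parts in tails.items()}


listing = [
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_1",
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_3",
    "default.73886.55_vmware-freebsd-tools.tar.gz",
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_2",
    "default.73886.55__shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_4",
]
heads, tails = classify(listing, "default.73886.55")
print(heads)
print(tails)
```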
>
> (Or, if you want to check out the source, I think all the relevant
> bits for this are somewhere in the
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>> I know those shadow__RpwwfOt2X-mhwU65Qa1OHDi--4OMGvQ_ files are the
>> rest of vmware-freebsd-tools.tar.gz.  I can infer that because this
>> bucket only has a single file (and the sum of the sizes matches).
>> With many files, I can't infer the link anymore.
>>
>> How do I look up that link?
>>
>> I tried reading the src/rgw/rgw_rados.cc, but I'm getting lost.
>>
>>
>>
>> My real goal is the reverse.  I recently repaired an inconsistent PG.
>> The primary replica had the bad data, so I want to verify that the
>> repaired object is correct.  I have a database that stores the SHA256
>> of every object.  If I can get from the filename on disk back to an S3
>> object, I can verify the file.  If it's bad, I can restore from the
>> replicated zone.
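
The verification step described above could look roughly like this in Python. This is a sketch under stated assumptions: `sha256_of_file` and `matches_expected` are hypothetical helpers, and the expected digest is assumed to be stored as a hex string. Since an object larger than one RADOS object is split into a head and several tails, the file hashed here should be the complete object (for example, one fetched back through S3), not a single file from the OSD's directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 with the hex digest stored in the database."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If `matches_expected` returns False for the downloaded object, that is the signal to restore it from the replicated zone.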
>>
>>
>> Aside from today's task, I think it's really handy to understand these
>> low level details.  I know it's been handy in the past, when I had
>> disk corruption under my PostgreSQL database.  Knowing (and
>> practicing) ahead of time really saved me a lot of downtime then.
>>
>>
>> Thanks for any pointers.