Re: Accessing file layout information

On Mon, Dec 15, 2014 at 1:29 PM, Atchley, Scott <atchleyes@xxxxxxxx> wrote:
> On Dec 15, 2014, at 4:10 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>
>> On Mon, Dec 15, 2014 at 12:06 PM, Atchley, Scott <atchleyes@xxxxxxxx> wrote:
>>> On Dec 15, 2014, at 2:42 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>
>>>> On Mon, Dec 15, 2014 at 10:54 AM, Atchley, Scott <atchleyes@xxxxxxxx> wrote:
>>>>> Hi all,
>>>>>
>>>>> For a given file in cephfs, I would like to determine:
>>>>>
>>>>> 1) the number of PGs
>>>>> 2) the PG IDs
>>>>> 3) the offsets handled by each PG
>>>>> 4) the stripe unit (i.e. bytes per block of data)
>>>>>
>>>>> preferably using a C API.
>>>>>
>>>>> I found the getfattr command line tool which provides (1) and (4).
>>>>>
>>>>> It appears that the cephfs command line tool provides some of the info, but the manpage is not clear about what is provided by show_layout versus show_location. Does show_layout provide (1) and (4)? Does show_location include (2) and (3)?
>>>>>
>>>>> I found a Java CephFileExtent class that provides (1) and a list of OSDs for the stripe unit that contains the provided file extent.
>>>>
>>>> What exactly are you trying to accomplish here? I can imagine some
>>>> limited scenarios where you want to map to PGs, but in general you're
>>>> a lot more interested in either the objects associated with the file,
>>>> or the OSDs that host them. I think you can get objects and OSDs out
>>>> of the latest libcephfs api.
>>>
>>> We implemented a file transfer tool for Lustre that transfers data by object rather than by file. Object transfers are scheduled so as to defer transfers from congested servers, in the hope that they will be less congested in the near future (given our bulk-synchronous workload, congestion on a set of servers is generally temporary).
>>>
>>> I would rather know OSDs than PGs, but I did not assume that the info was available.
>>
>> Yeah, look at ceph_get_file_extent_osds in libcephfs. :)
>
> Ok, thanks for the pointer.
>
> It looks like I can get (1) with ceph_get_file_stripe_count().

Oh, you mean copies of a PG, not the number of PGs used by a
particular file. Yeah, that's just a pool property which is reported
in several different places. :)
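
For anyone looking for a starting point, a minimal (untested) sketch of
stringing those libcephfs calls together might look like the following;
ceph_get_file_layout and ceph_get_file_extent_osds are declared in
cephfs/libcephfs.h, but check your installed header for the exact
prototypes, and note that error handling is almost entirely omitted:

/*
 * Untested sketch: pull a file's striping parameters and the OSDs
 * backing one extent via libcephfs.
 */
#include <stdio.h>
#include <fcntl.h>
#include <cephfs/libcephfs.h>

int main(int argc, char **argv)
{
	struct ceph_mount_info *cmount;
	int fd, i, nosds;
	int stripe_unit, stripe_count, object_size, pool;
	int osds[16];
	int64_t len = 0;

	if (argc < 2)
		return 1;

	ceph_create(&cmount, NULL);           /* default client id */
	ceph_conf_read_file(cmount, NULL);    /* default ceph.conf search path */
	ceph_mount(cmount, "/");

	fd = ceph_open(cmount, argv[1], O_RDONLY, 0);

	/* stripe unit/count, object size, and pool for this file */
	ceph_get_file_layout(cmount, fd, &stripe_unit, &stripe_count,
			     &object_size, &pool);
	printf("stripe_unit=%d stripe_count=%d object_size=%d pool=%d\n",
	       stripe_unit, stripe_count, object_size, pool);

	/* OSDs holding the primary copy of the extent starting at offset 0 */
	nosds = ceph_get_file_extent_osds(cmount, fd, 0, &len, osds, 16);
	printf("extent length=%lld, osds:", (long long)len);
	for (i = 0; i < nosds; i++)
		printf(" %d", osds[i]);
	printf("\n");

	ceph_close(cmount, fd);
	ceph_unmount(cmount);
	ceph_release(cmount);
	return 0;
}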


>>>>> On the architecture page (http://ceph.com/docs/next/architecture/) under DATA STRIPING, does cephfs client use the same set of PGs used for Object Set 1 for Object Set 2 or does it use a different PGs?
>>>>
>>>> Assuming you mean does it use the same PGs for each file's objects,
>>>> the answer is no, definitely not.
>>>> -Greg
>>>
>>> No, I assume that Object Set 1 on that page indicates four objects on four different PGs. Within Object Set 1, Object 0 holds stripe units 0, 4, 8, and 12. Object 1 holds stripe units 1, 5, 9, and 13, and so on. Objects 0-3 are each stored on a different PG. Is this correct so far?
>>>
>>> My question above is about Object Set 2, which holds Objects 4-7. Is Object 4 on the same PG as Object 0? Is Object 5 on the same PG as Object 1? And so on? Or does Object Set 2 (holding Objects 4-7) use different PGs?
>>
>> The sets use completely different names which are (generally) going to
>> be placed completely differently. They're just <inode number>.<object
>> number>, where the inode number is in hex and the object number is
>> zero-padded up to some size.
>> -Greg
>
> I am still not fully understanding object sets. Can a file have more than one object set?

I'm not sure what you mean by "object set" here. If you mean "the
group of objects across which file fragments are striped", then yes:
if you have 8MB of file and a single object set covers 4MB, you'll
have two object sets. That size is determined by the size of an
object and the stripe width, IIRC.
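
A rough sketch of that arithmetic (the set-size formula below is my
reading of the above, so treat it as an assumption rather than gospel):

/* Untested sketch: one object set spans stripe_count objects of
 * object_size bytes each. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t object_size  = 4ULL << 20;   /* 4MB objects (default) */
	uint64_t stripe_count = 1;            /* objects per set (default layout) */
	uint64_t file_size    = 8ULL << 20;   /* the 8MB file from the example */

	uint64_t set_bytes = stripe_count * object_size;
	uint64_t num_sets  = (file_size + set_bytes - 1) / set_bytes;

	/* prints: object set = 4194304 bytes, object sets = 2 */
	printf("object set = %llu bytes, object sets = %llu\n",
	       (unsigned long long)set_bytes, (unsigned long long)num_sets);
	return 0;
}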

But really, the way the objects are generated is that file extents
are mapped to an object based on the file layout (in the default case,
where the file is just chunked, that is simply the file offset divided
by the object size), and that object is written to.
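
To make that concrete, here is a rough, untested sketch of mapping a
file offset to its object under the default layout, plus the
"<inode hex>.<object number>" naming described above; the %llx.%08llx
format and the striping math are my assumptions, so verify against
real object names (e.g. with rados ls) before relying on them:

/* Untested sketch: map a file offset to its object and build the
 * object name (<inode in hex>.<zero-padded object number>). */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t ino          = 0x10000000000ULL; /* example inode number */
	uint64_t stripe_unit  = 4ULL << 20;       /* default layout: 4MB */
	uint64_t stripe_count = 1;
	uint64_t object_size  = 4ULL << 20;
	uint64_t offset       = 6ULL << 20;       /* file offset to map: 6MB */

	uint64_t su_per_object = object_size / stripe_unit;
	uint64_t blockno   = offset / stripe_unit;      /* which stripe unit */
	uint64_t stripeno  = blockno / stripe_count;    /* which stripe row */
	uint64_t stripepos = blockno % stripe_count;    /* object within set */
	uint64_t setno     = stripeno / su_per_object;  /* which object set */
	uint64_t objectno  = setno * stripe_count + stripepos;

	char name[64];
	snprintf(name, sizeof(name), "%llx.%08llx",
		 (unsigned long long)ino, (unsigned long long)objectno);

	/* with the defaults above, offset 6MB lands in 10000000000.00000001 */
	printf("offset %llu -> object %s\n",
	       (unsigned long long)offset, name);
	return 0;
}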
-Greg