Re: Re: The questions of data collection and cache tiering in Ceph

On Mon, Oct 12, 2015 at 6:37 AM, 蔡毅 <cymengxiang@xxxxxxx> wrote:
> Greg,
>     Thank you very much for your timely reply; it was really helpful. I still have some questions.
>     Besides pool, PG, and object statistics, Ceph can also report other statistics such as CPU, IOPS, and bandwidth. To acquire this information,
> does Ceph call other tools, or is it implemented in the source code? I ask because I could only find simple arithmetic (such as division) in
> the source code.

Uh, it's using the standard Linux interfaces for this. Some of that's
via the pseudo-fs stuff in /proc/sys or whatever...
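
As an illustration of what reading those interfaces can look like, here
is a minimal Python sketch that parses the aggregate "cpu" line in the
/proc/stat format and derives a busy fraction with a single division.
It is not taken from the Ceph source; the function names and the sample
text are assumptions made for the example, and Ceph's own collection
code may read different files.

```python
# Sketch only: NOT Ceph code. Names and sample data are illustrative.

# Sample text in the /proc/stat layout:
# cpu user nice system idle iowait irq softirq steal guest guest_nice
SAMPLE_PROC_STAT = """cpu  100 0 50 800 50 0 0 0 0 0
cpu0 100 0 50 800 50 0 0 0 0 0
"""


def cpu_busy_fraction(proc_stat_text):
    """Return the fraction of CPU ticks since boot not spent idle or in iowait."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            ticks = [int(x) for x in fields[1:]]
            # Guest time is already counted in user/nice, so sum only the
            # first eight columns to avoid counting it twice.
            total = sum(ticks[:8])
            idle = ticks[3] + ticks[4]  # idle + iowait columns
            return (total - idle) / total
    raise ValueError("no aggregate cpu line found")


def read_cpu_busy_fraction(path="/proc/stat"):
    """Read the live counters on a Linux host."""
    with open(path) as f:
        return cpu_busy_fraction(f.read())


if __name__ == "__main__":
    print(cpu_busy_fraction(SAMPLE_PROC_STAT))
```

A rate such as IOPS or bandwidth would typically come from sampling a
counter twice and dividing the difference by the elapsed time, which
may be why only simple arithmetic shows up in the code.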

>     An object has two parts: data and attributes. Are they ultimately stored in different places? I ask because I found some attribute information in omap.

Yes, although the specifics depend on your backend configuration. In
the default setup, data is stored in the filesystem, "omap" entries
are stored in leveldb, and the first few object xattrs will get stored
in fs xattrs but more will overflow into leveldb.
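
To make that layout concrete, here is a toy Python model of the three
places described above. It is only a sketch: the class, its method
names, and the limit of two inline xattrs are made up for the example
and do not reflect the real backend code or its actual thresholds.

```python
# Toy model only: NOT Ceph code. MAX_INLINE_XATTRS is a made-up value.
MAX_INLINE_XATTRS = 2


class ToyObjectStore:
    def __init__(self):
        self.file_data = {}  # stands in for object data in filesystem files
        self.fs_xattrs = {}  # stands in for filesystem extended attributes
        self.kv_store = {}   # stands in for leveldb (omap + overflow xattrs)

    def write(self, obj, data):
        self.file_data[obj] = data

    def set_xattr(self, obj, name, value):
        inline = self.fs_xattrs.setdefault(obj, {})
        if name in inline or len(inline) < MAX_INLINE_XATTRS:
            inline[name] = value  # the first few xattrs stay on the file
        else:
            self.kv_store[(obj, "xattr", name)] = value  # the rest overflow

    def get_xattr(self, obj, name):
        inline = self.fs_xattrs.get(obj, {})
        if name in inline:
            return inline[name]
        return self.kv_store[(obj, "xattr", name)]

    def set_omap(self, obj, key, value):
        self.kv_store[(obj, "omap", key)] = value  # omap always goes to the kv store


if __name__ == "__main__":
    store = ToyObjectStore()
    store.write("obj1", b"payload")
    for name in ("a", "b", "c"):
        store.set_xattr("obj1", name, name.encode())
    store.set_omap("obj1", "k", b"v")
    print(sorted(store.fs_xattrs["obj1"]), sorted(store.kv_store))
```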

>     In your reply, you said, "any subsequent operations on that object will wait until that durable op is readable before completing." So if I configure the system
> to flush objects from the journal to disk every 15s, does that mean I cannot read the object for up to 15s, because it has been written only to the journal and not yet
> to the disk? Could this cause problems?

No, that's not quite it. If you set a 15-second flush interval,
that's how often the backing filesystem will get a forced sync to
disk. But the journal will have persisted the data well before that,
and the data will be written into the filesystem right after that, so
it's readable out of page cache! As soon as the write operation
completes in memory, the OSD will continue processing subsequent ops on
the object (because the OSD journal has already persisted the operation).
-Greg
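
The ordering described above can be sketched as a toy Python model for
the journal-first case. This is not Ceph code: the class and attribute
names are invented for illustration, and the real OSD does this work
asynchronously rather than in one function call.

```python
# Toy model only: NOT Ceph code. All names are illustrative.
class ToyOSD:
    def __init__(self):
        self.journal = []     # durable log of operations
        self.page_cache = {}  # filesystem contents as seen by reads
        self.on_disk = {}     # filesystem contents after a forced sync
        self.acks = []        # responses sent back to the client

    def write(self, obj, data):
        # 1. The journal persists the operation: the write is durable.
        self.journal.append((obj, data))
        self.acks.append(("durable", obj))
        # 2. The write is applied to the filesystem in memory right
        #    afterwards: it is readable without waiting for a sync.
        self.page_cache[obj] = data
        self.acks.append(("readable", obj))

    def read(self, obj):
        return self.page_cache[obj]

    def sync(self):
        # Periodic forced sync of the backing filesystem (e.g. every 15 s).
        self.on_disk.update(self.page_cache)


if __name__ == "__main__":
    osd = ToyOSD()
    osd.write("obj1", b"hello")
    print(osd.read("obj1"), "obj1" in osd.on_disk)
    osd.sync()
    print("obj1" in osd.on_disk)
```

In this model a read issued right after the write succeeds even though
sync() has not run yet, which is the point of the reply: the sync
interval bounds how stale the on-disk filesystem may be, not how long a
reader has to wait.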

>     Thank you so much.
>     Yours,
>     Chay
>
> At 2015-10-09 02:34:25, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>>On Thu, Oct 8, 2015 at 9:09 AM, 蔡毅 <cymengxiang@xxxxxxx> wrote:
>>>
>>> Dear developers,
>>>
>>>    Recently I ran into some difficulties while reading Ceph's source code and trying to understand its architecture.
>>> The details are as follows.
>>>
>>>    1. Monitoring tools can collect a lot of data while Ceph runs. I wonder what
>>> kind of data Ceph can provide (object data, PG data, or other data?). Can
>>> Ceph provide per-object data (e.g. the number of times an object is read or written,
>>> or the last time the object was used)? If so, where in the source code
>>> can I find these details? I would like to know what monitoring data Ceph
>>> provides and where it lives in the source code so that I can use it
>>> more efficiently. For example, I know Ceph can report the
>>> number of objects per PG and the read and write bandwidth, but I couldn't find how
>>> this is implemented in the source code.
>>
>>I'm not quite sure what you're asking here, but I think you'll want to
>>look at the MPGStats.h message (in ceph/src/messages), and trace
>>backwards through the OSD code (ceph/src/osd/) which creates them and
>>then forwards through the monitor code (ceph/src/mon/OSDMonitor.cc).
>>
>>>
>>>    2. According to the official documentation, Ceph provides cache tiering to improve
>>> performance. But I couldn't find more details about cache tiering,
>>> such as which algorithm the cache agent uses. Where in the source code
>>> can I find these?
>>
>>The cache tiering is part of the OSD. Look at the TierAgentState.h
>>file and the parts of ReplicatedPG.cc which reference it.
>>
>>>
>>>   3. In the write process, there are two responses to the client: the first comes from the journal and
>>> the second occurs when the object is written to the actual disk. So when I write an object to
>>> Ceph using librbd, is the write not finished until the second response arrives, and
>>> what do the first and second responses mean for clients? When an object has been written to the journal
>>> but not to the filestore (that is, not to disk), can I read this object? If so,
>>> where would I read it from?
>>
>>You get a response from the OSDs:
>>1) when the write operation is durable.
>>2) when the write operation is readable.
>>
>>The order these arrive in will depend on your OSD configuration (btrfs
>>can send readable before durable; xfs always sends durable first;
>>etc). If you get a "durable" response, any subsequent operations on
>>that object will wait until that durable op is readable before
>>completing.
>>-Greg