Re: ceph write path

On Thursday, January 24, 2013 at 6:41 PM, sheng qiu wrote:
> Hi,
> 
> I am trying to understand the Ceph code on the client side.
> On the write path, if it's an aio_write, ceph_write_begin() allocates
> pages in the page cache to buffer the written data, but I did not see
> it allocate any space on the remote OSDs (for a local fs such as ext2,
> get_block() does this).
> I suppose that is done later, when the kernel flushing process writes
> back the dirty pages.
> 
> I checked ceph_writepages_start(); it seems to organize the
> dirty data and prepare the requests to send to the OSDs. For newly
> written data, how is it mapped to the OSDs, and where is that done?
> Is it done in ceph_osdc_new_request()?
> 
> If the transfer unit is not limited to the size of an object, I suppose
> Ceph would need to pack several pieces of data (each smaller than one
> object) together so that there is no internal fragmentation within an
> object. Who does this job, and which parts of the source code/files
> are related to it?
> 
> I really want to get a deep understanding of the code, so I raised
> these questions. If my understanding is not correct, please point that
> out. I would really appreciate it.
> 
There seems to be a bit of a fundamental misunderstanding here. The Ceph storage system is built on top of an object store (RADOS), so when a client does a write it just tells the object storage daemon (OSD) to write the named object. The daemons are responsible for disk allocation and layout themselves (and in fact they handle most of that by sticking the objects in a perfectly ordinary local Linux filesystem).
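To make the file-to-object step concrete, here is a toy sketch of how a byte offset in a file turns into a named object. This is my own illustration, not the kernel code (I believe the real calculation lives in ceph_calc_file_object_mapping() in the kernel client); the 4 MiB object size and the name format below are just the usual defaults, and the inode number is made up.

/*
 * A minimal sketch (not the actual kernel code) of how a file write at
 * a given byte offset is chopped into named RADOS objects. Assumes the
 * simple default layout: no custom striping, one stripe unit per
 * object. The 4 MiB object size and the inode number are illustrative.
 */
#include <stdio.h>
#include <stdint.h>

#define OBJECT_SIZE (4ULL << 20)   /* default 4 MiB objects (assumption) */

int main(void)
{
    uint64_t ino = 0x10000000000ULL;  /* hypothetical inode number */
    uint64_t off = 9 * (1ULL << 20);  /* a write at byte offset 9 MiB */

    uint64_t objno  = off / OBJECT_SIZE;  /* which object holds this byte */
    uint64_t objoff = off % OBJECT_SIZE;  /* offset within that object */

    char name[64];
    /* CephFS file data objects are named <inode>.<object number> in hex */
    snprintf(name, sizeof(name), "%llx.%08llx",
             (unsigned long long)ino, (unsigned long long)objno);

    printf("offset %llu -> object \"%s\" at object offset %llu\n",
           (unsigned long long)off, name, (unsigned long long)objoff);
    return 0;
}

Compile that with any C compiler and you'll see a write at offset 9 MiB lands 1 MiB into the file's third object; there's no allocation step on the client side at all, just naming.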
The client maps the data to the correct OSDs via the CRUSH algorithm. Placement is a pure calculation based on the object name (plus the cluster map) that anybody in the system can perform, so there's no lookup table to consult; a rough sketch follows.
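Here's a deliberately oversimplified illustration of that lookup-free property. This is NOT real CRUSH: the FNV hash and the modulos below stand in for Ceph's rjenkins hash and the CRUSH tree walk over the cluster map, and the pool/cluster sizes are made up. The only point is that placement is a pure function any client can evaluate locally.

/*
 * A conceptual sketch of lookup-free placement. Not real CRUSH -- just
 * an illustration that placement is a pure function of the object name
 * plus cluster map state, so any client can compute it by itself.
 */
#include <stdio.h>
#include <stdint.h>

/* toy stand-in for Ceph's rjenkins object-name hash */
static uint32_t toy_hash(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV-1a, for illustration only */
    while (*s)
        h = (h ^ (uint8_t)*s++) * 16777619u;
    return h;
}

int main(void)
{
    const char *oid = "10000000000.00000002"; /* object from the sketch above */
    uint32_t pg_num = 128;                    /* placement groups in the pool */
    uint32_t osds   = 6;                      /* toy "cluster" size */

    /* Step 1: hash the object name into a placement group (PG). */
    uint32_t pg = toy_hash(oid) % pg_num;

    /* Step 2: CRUSH maps the PG to an ordered set of OSDs using the
     * cluster map; this modulo is a gross simplification of that walk. */
    uint32_t primary = pg % osds;

    printf("object %s -> pg %u -> primary osd.%u\n", oid, pg, primary);
    return 0;
}

Because every client and every OSD computes the same function from the same cluster map, placement needs no central directory, which is one of the key design choices covered in the papers linked below.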
The client doesn't do any packing of different pieces of data into one object or anything like that; a small write just produces a small (or sparse) object.

I'd recommend checking out some of the academic papers available at http://ceph.com/resources/publications/ for more background information about the key algorithms and design choices.
-Greg


