Re: ceph write path

Hi Sheng,

On Thu, 24 Jan 2013, sheng qiu wrote:
> i am trying to understand the ceph codes on client side.
> for write path, if it's aio_write, the ceph_write_begin() allocate
> pages in page cache to buffer the written data, however i did not see
> it allocated any space on the remote OSDs (for local fs such as ext2,
> the get_block() did this),
> i suppose it's done later when invoke kernel flushing process to write
> back the dirty pages.

Right.  Objects are instantiated and written to the OSDs when the write 
operations are sent over the network, normally during writeback (via 
the ->writepages() op in addr.c).
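
As a rough illustration of that split between buffering and writeback, here 
is a toy Python model. It is not the kernel code: ToyFile, write(), 
writeback() and the "remote" dict are hypothetical names, and a dict stands 
in for the OSDs. The point is only that a buffered write touches nothing 
remote, and that writeback groups contiguous dirty pages before sending.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyFile:
    """Toy page cache: writes dirty pages, writeback pushes them out."""

    def __init__(self):
        self.dirty = {}   # page index -> bytearray, buffered locally
        self.remote = {}  # page index -> bytes, stands in for the OSDs

    def write(self, offset, data):
        # Buffered write: only the local page cache is modified here.
        pos = 0
        while pos < len(data):
            idx, off = divmod(offset + pos, PAGE_SIZE)
            n = min(PAGE_SIZE - off, len(data) - pos)
            base = self.remote.get(idx, bytes(PAGE_SIZE))
            page = self.dirty.setdefault(idx, bytearray(base))
            page[off:off + n] = data[pos:pos + n]
            pos += n

    def writeback(self):
        # Group contiguous dirty pages into runs, then "send" each run.
        runs = []
        for idx in sorted(self.dirty):
            if runs and runs[-1][-1] == idx - 1:
                runs[-1].append(idx)
            else:
                runs.append([idx])
        for run in runs:
            for idx in run:
                # Remote data first comes into existence at this point.
                self.remote[idx] = bytes(self.dirty.pop(idx))
        return runs
```

Calling write() twice on disjoint ranges and then writeback() once returns 
one run per contiguous range, and "remote" stays empty until that call.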

> i checked the ceph_writepages_start(), here it seems organize the
> dirty data and prepare the requests to send to the OSDs.  For new
> allocated written data, how it maps to the OSDs and where it is done?
> is it done in ceph_osdc_new_request()?

It happens later, when the actual request is ready to go over the wire. The 
target OSD may change in the meantime, or the request may have to be 
resent to another OSD.  As far as the upper layers are concerned, though, 
they are writing to the object, without caring where the object happens to 
currently live.
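
A minimal sketch of that late binding, in Python rather than the kernel's 
C. Everything here is hypothetical (ToyOsdMap, Request, send, 
handle_new_map), and the placement function is a plain CRC modulo the 
number of up OSDs, which is a stand-in and not Ceph's real placement 
algorithm. It only shows the shape: the request names an object, the target 
is computed at send time, and a new map can force a resend.

```python
import zlib


class ToyOsdMap:
    """Stand-in for a cluster map: an epoch plus the OSDs that are up."""

    def __init__(self, epoch, up_osds):
        self.epoch = epoch
        self.up_osds = sorted(up_osds)

    def locate(self, object_name):
        # Toy placement: deterministic hash of the name, NOT real Ceph.
        h = zlib.crc32(object_name.encode())
        return self.up_osds[h % len(self.up_osds)]


class Request:
    """A write addressed to an object; no OSD is chosen at creation."""

    def __init__(self, object_name, data):
        self.object_name = object_name
        self.data = data
        self.target = None


def send(req, osdmap):
    # The target is resolved only now, against the current map.
    req.target = osdmap.locate(req.object_name)
    return req.target


def handle_new_map(req, osdmap):
    # Recompute the target; resend if the object now lives elsewhere.
    if osdmap.locate(req.object_name) != req.target:
        send(req, osdmap)
        return True
    return False
```

The caller only ever builds a Request with an object name; which OSD it 
lands on is decided, and possibly re-decided, underneath.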

> If the transfer unit is not limited to sizes of obj, i supposed that
> ceph needed to packed several pieces of data (smaller than one obj
> size)  together so that there won't be internal fragmentation for an
> object. who does this job and which part of source codes/files are
> related with this?

Each file is striped over a different sequence of objects.  Small 
files mean small objects.  Large files stripe over (by default) 4 
MB objects.  It's the OSD's job to store these efficiently.  We just use a 
local file system.  btrfs is great about packing small files inline in the 
btree; xfs and ext4 are more conventional fs's and do pretty well.
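
To make the striping concrete, here is a small Python sketch of the 
simplest case, where a file is cut into consecutive 4 MB objects. The 
function names are made up for illustration, the sketch ignores any more 
elaborate striping layout, and the object naming (inode number plus object 
index, both in hex) is an assumption for the example rather than something 
stated in this thread.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # the 4 MB default mentioned above


def file_extent_to_objects(offset, length, object_size=OBJECT_SIZE):
    """Split a file byte range into (object_index, offset_in_object, length)."""
    extents = []
    end = offset + length
    while offset < end:
        objno = offset // object_size
        obj_off = offset % object_size
        # Stop at the object boundary or the end of the range.
        n = min(object_size - obj_off, end - offset)
        extents.append((objno, obj_off, n))
        offset += n
    return extents


def object_name(ino, objno):
    # Assumed naming for illustration: "<inode hex>.<object index hex>".
    return f"{ino:x}.{objno:08x}"
```

A 100-byte file touches only object 0, and that object is 100 bytes long; 
nothing is padded out to 4 MB.  A write that crosses the 4 MB boundary is 
split into one extent per object.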

sage

> I really want to get a deep understanding about the codes, so i raised
> these questions. if my understanding is not correct, please figure
> out. i will be very appreciated.
> 
> Thanks,
> Sheng
> 
> --
> Sheng Qiu
> Texas A & M University
> Room 332B Wisenbaker
> email: herbert1984106@xxxxxxxxx
> College Station, TX 77843-3259
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 