Re: Understanding EC properties for CephFS / small files.

Hello Jesper,

On Sat, Feb 16, 2019 at 11:11 PM <jesper@xxxxxxxx> wrote:
>
> Hi List.
>
> I'm trying to understand the nuts and bolts of EC / CephFS.
> We're running an EC4+2 pool on top of 72 x 7.2K rpm 10TB drives. Pretty
> slow bulk / archive storage.
>
> # getfattr -n ceph.dir.layout /mnt/home/cluster/mysqlbackup
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/home/cluster/mysqlbackup
> ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304
> pool=cephfs_data_ec42"
>
> This configuration is taken directly from the online documentation
> (which may be where it all went wrong from our perspective):

Correction: this is the Ceph default file layout. By default no file
striping is performed (stripe_count=1) and file data is stored in 4MB
objects. You may find this document instructive on how files are
striped (especially the ASCII art):

https://github.com/ceph/ceph/blob/master/doc/dev/file-striping.rst

> http://docs.ceph.com/docs/master/cephfs/file-layouts/
>
> Ok, this means that a 16MB file will be split into 4 chunks of 4MB each
> with 2 erasure coding chunks? I don't really understand the stripe_count
> element?

A 16 MB file would be split into 4 RADOS objects of 4 MB each. Each
object is then erasure-coded and its chunks are distributed across OSDs
according to the EC profile.
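
As a rough illustration of that split (a sketch only, not Ceph code):
the helper names below are made up for this example, and the estimate
assumes stripe_count=1 with stripe_unit equal to object_size, as in the
layout quoted above. The raw-size figure simply assumes k data chunks
plus m coding chunks of about object_size / k each, and ignores EC
stripe padding and the OSD's allocation granularity, so real usage will
be somewhat higher.

```python
import math

MIB = 1024 * 1024


def object_sizes(file_size, object_size=4 * MIB):
    """Sizes of the RADOS objects backing a file.

    Assumes stripe_count=1 and stripe_unit == object_size, so the file
    is simply cut into consecutive object_size pieces.
    """
    full, tail = divmod(file_size, object_size)
    return [object_size] * full + ([tail] if tail else [])


def approx_raw_bytes(obj_size, k=4, m=2):
    """Rough raw footprint of one object in a k+m EC pool.

    Assumption: each of the k+m chunks is about obj_size / k bytes.
    Padding and allocation-unit rounding are deliberately ignored.
    """
    chunk = math.ceil(obj_size / k)
    return chunk * (k + m)


print(object_sizes(16 * MIB))        # four objects of 4194304 bytes
print(object_sizes(512 * 1024))      # one object of 524288 bytes
print(approx_raw_bytes(512 * 1024))  # 786432
```

Under these assumptions a 512KB file occupies one 512KB object and on
the order of 768KB of raw space in a 4+2 pool.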

> And since erasure coding works at the object level, striping individual
> objects across (here) 4 replicas, will it end up filling 16MB? Or
> is there an internal optimization causing this not to be the case?
>
> Additionally, when reading the file, all 4 chunks need to be read to
> assemble the object, causing (at a minimum) 4 IOPS per file.
>
> Now, my typical file size is < 8MB, and 512KB files are common on
> this pool.
>
> Will that cause a 512KB file to be padded to 4MB with 3 empty chunks
> to fill the erasure coded profile and then 2 coding chunks on top?
> In total 24MB for storing 512KB?

No. A file does not always fill a full 4MB object. The final object of
a file is only as large as the data it holds. For example:

pdonnell@senta02 ~/mnt/tmp.ZS9VCMhBWg$ cp /bin/grep .
pdonnell@senta02 ~/mnt/tmp.ZS9VCMhBWg$ stat grep
  File: 'grep'
  Size: 211224          Blocks: 413        IO Block: 4194304 regular file
Device: 2ch/44d Inode: 1099511627836  Links: 1
Access: (0750/-rwxr-x---)  Uid: ( 1163/pdonnell)   Gid: ( 1163/pdonnell)
Access: 2019-02-18 14:02:11.503875296 -0500
Modify: 2019-02-18 14:02:11.523375657 -0500
Change: 2019-02-18 14:02:11.523375657 -0500
 Birth: -
pdonnell@senta02 ~/mnt/tmp.ZS9VCMhBWg$ printf %x 1099511627836
1000000003c
$ bin/rados -p cephfs.a.data stat 1000000003c.00000000
cephfs.a.data/1000000003c.00000000 mtime 2019-02-18 14:02:11.000000, size 211224

So the object holding "grep" still only uses ~200KB and not 4MB.
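
The object name used in the rados command above is the inode number in
hex, followed by a dot and the object index within the file as eight
hex digits. A small sketch of that mapping (the function name is made
up for this example):

```python
def data_object_name(inode, index=0):
    """Name of the RADOS object holding piece `index` of a CephFS file.

    Format: <inode in hex>.<object index as 8 hex digits>.
    """
    return f"{inode:x}.{index:08x}"


# Inode number taken from the stat output above.
print(data_object_name(1099511627836))     # 1000000003c.00000000
print(data_object_name(1099511627836, 1))  # 1000000003c.00000001
```

A file larger than object_size would continue in objects with index 1,
2, and so on.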


-- 
Patrick Donnelly