Hello Jesper,

On Sat, Feb 16, 2019 at 11:11 PM <jesper@xxxxxxxx> wrote:
>
> Hi List.
>
> I'm trying to understand the nuts and bolts of EC / CephFS.
> We're running an EC4+2 pool on top of 72 x 7.2K rpm 10TB drives. Pretty
> slow bulk / archive storage.
>
> # getfattr -n ceph.dir.layout /mnt/home/cluster/mysqlbackup
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/home/cluster/mysqlbackup
> ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304
> pool=cephfs_data_ec42"
>
> This configuration is taken directly out of the online documentation
> (which may have been where it went all wrong from our perspective):

Correction: this is just the Ceph default file layout. By default no file
striping is performed and 4MB objects are used for file blocks. You may
find this document instructive on how files are striped (especially the
ASCII art):

https://github.com/ceph/ceph/blob/master/doc/dev/file-striping.rst

> http://docs.ceph.com/docs/master/cephfs/file-layouts/
>
> Ok, this means that a 16MB file will be split into 4 chunks of 4MB each
> with 2 erasure coding chunks? I don't really understand the stripe_count
> element?

A 16MB file would be split into 4 RADOS objects. Those objects are then
distributed across OSDs according to the EC profile of the data pool.

> And since erasure coding works at the object level, striping individual
> objects across - here 4 replicas - it'll end up filling 16MB? Or
> is there an internal optimization causing this not to be the case?
>
> Additionally, when reading the file, all 4 chunks need to be read to
> assemble the object, causing (at a minimum) 4 IOPS per file.
>
> Now, my common file size is < 8MB and commonly 512KB files are on
> this pool.
>
> Will that cause a 512KB file to be padded to 4MB with 3 empty chunks
> to fill the erasure coded profile and then 2 coding chunks on top?
> In total 24MB for storing 512KB?

No. Files do not always use the full 4MB object size. The final object of
the file is only as large as the remaining data. For example:

pdonnell@senta02 ~/mnt/tmp.ZS9VCMhBWg$ cp /bin/grep .
pdonnell@senta02 ~/mnt/tmp.ZS9VCMhBWg$ stat grep
  File: 'grep'
  Size: 211224      Blocks: 413        IO Block: 4194304 regular file
Device: 2ch/44d     Inode: 1099511627836  Links: 1
Access: (0750/-rwxr-x---)  Uid: ( 1163/pdonnell)   Gid: ( 1163/pdonnell)
Access: 2019-02-18 14:02:11.503875296 -0500
Modify: 2019-02-18 14:02:11.523375657 -0500
Change: 2019-02-18 14:02:11.523375657 -0500
 Birth: -
pdonnell@senta02 ~/mnt/tmp.ZS9VCMhBWg$ printf %x 1099511627836
1000000003c
$ bin/rados -p cephfs.a.data stat 1000000003c.00000000
cephfs.a.data/1000000003c.00000000 mtime 2019-02-18 14:02:11.000000, size 211224

So the object holding "grep" still only uses ~200KB and not 4MB. (There is
a rough sketch at the end of this mail for checking the same thing on your
own files.)

-- 
Patrick Donnelly

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
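
P.S. If you want to check this on your own files, here is a rough sketch
(the file path below is a placeholder, and the pool name is taken from your
layout above, so adjust both for your cluster). It assumes the default
stripe_count=1 layout, where object N of a file is named
<inode-in-hex>.<N as 8 hex digits>:

FILE=/mnt/home/cluster/mysqlbackup/somefile   # placeholder path, substitute your own
POOL=cephfs_data_ec42                         # data pool from your layout

INO=$(stat -c %i "$FILE")                     # inode number (decimal)
SIZE=$(stat -c %s "$FILE")                    # file size in bytes
OBJ_SIZE=4194304                              # object_size from the layout (4MB)
NOBJ=$(( (SIZE + OBJ_SIZE - 1) / OBJ_SIZE ))  # number of backing objects

for i in $(seq 0 $((NOBJ - 1))); do
    OBJ=$(printf '%x.%08x' "$INO" "$i")       # e.g. 1000000003c.00000000
    rados -p "$POOL" stat "$OBJ"              # reports the stored size of each object
done

Every object except the last should normally report the full 4MB; the last
one only reports whatever data is left over, as in the grep example above.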