RE: Ceph space amplification ratio with write is more than 2

Haomai,
I think what James is seeing, even excluding the journal writes, is > 2X write amp. Must be something fishy in his setup.
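
(Back-of-the-envelope check with the numbers from James's "ceph df" output below: 8658 MB raw used, a 1024 MB journal file, 3584 MB of client data written.)

echo "scale=2; 8658 / 3584" | bc             # roughly 2.4x including the journal
echo "scale=2; (8658 - 1024) / 3584" | bc    # roughly 2.1x, still above 2x with the whole 1 GB journal excluded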

Thanks & Regards
Somnath
-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Haomai Wang
Sent: Friday, February 26, 2016 5:27 PM
To: James (Fei) Liu-SSI
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Ceph space amplification ratio with write is more than 2

It should be as expected. The reason is that FileJournal prepends a header and appends a footer to the transaction data, and when using aio+dio these writes have to be aligned. So a 4k io normally becomes a 5-6k filejournal write, IIRC. Add the filestore writeback to the filesystem and you get about 2.5x. But the local filesystem and disk do some merging tricks, so it should end up at less than 2.5x.
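
(Rough arithmetic on the above, taking 5.5k as the midpoint of the 5-6k journal write estimate; this is only an estimate, not a measurement.)

echo "scale=2; (5.5 + 4) / 4" | bc   # padded journal entry plus filestore write per 4k client write, roughly 2.4x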

On Sat, Feb 27, 2016 at 1:46 AM, James (Fei) Liu-SSI <james.liu@xxxxxxxxxxxxxxx> wrote:
> Hi Cepher,
>    We recently have tested the Ceph space amplification with filestore
> through  writing data to ramdisk with rados bench. However, we found
> only
> 3584 MB was written from rados bench  but totally 8658M(1G of total 9G of ramdisk was used for journal) was used in the ramdisk.
>
> The totally space amplification is 7658/3584 = 2.14. (It is surprising
> huge factor) 1. Cluster configuration:
> . One OSD and one MON  are in the same machine with rados bench. Replication factor  was set as 1.
> . ceph version from ceph master
> 5979c8e34fa2f3d7efa28c29fb90758b3f9f818
> (45979c8e34fa2f3d7efa28c29fb90758b3f9f818)
> 2. Rados command we have been used :
> rados bench -p rbd -b 4096 --max-objects 1048576  300 write
> --no-cleanup 3. ceph cluster configuration:
> Please see appendix 1.
> 4. Results investigation.
> Please see appendix 0.
>
> Could anyone help explain why the space amplification with filestore is so huge? Thanks a lot.
>
> Regards,
> James
>
> Appendix 0:
>
> ssd@OptiPlex-9020-1:~/src/bluestore$ ceph df
> GLOBAL:
>     SIZE      AVAIL     RAW USED     %RAW USED
>     9206M      547M        8658M         94.05
> POOLS:
>     NAME     ID     USED      %USED     MAX AVAIL     OBJECTS
>     rbd      0      3584M     38.93          547M      917505
> ssd@OptiPlex-9020-1:~/src/bluestore$ ceph -s
>     cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
>      health HEALTH_WARN
>             1 near full osd(s)
>      monmap e1: 1 mons at {localhost=127.0.0.1:6789/0}
>             election epoch 3, quorum 0 localhost
>      osdmap e7: 1 osds: 1 up, 1 in
>             flags sortbitwise
>       pgmap v31: 64 pgs, 1 pools, 3584 MB data, 896 kobjects
>             8658 MB used, 547 MB / 9206 MB avail
>                   64 active+clean
> ssd@OptiPlex-9020-1:~/src/bluestore$ df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda2       2.7T  1.4T  1.2T  55% /
> none            4.0K     0  4.0K   0% /sys/fs/cgroup
> udev            5.9G  4.0K  5.9G   1% /dev
> tmpfs           1.2G  720K  1.2G   1% /run
> none            5.0M     0  5.0M   0% /run/lock
> none            5.9G   81M  5.8G   2% /run/shm
> none            100M   20K  100M   1% /run/user
> /dev/ram1       9.0G  8.4G  618M  94% /home/ssd/src/bluestore/ceph-deploy/osd/myosddata
>
> Journal size (from the OSD data directory listing):
>
> -rw-r--r--   1 ssd ssd         37 Feb 25 15:53 ceph_fsid
> drwxr-xr-x 132 ssd ssd       4096 Feb 25 15:53 current
> -rw-r--r--   1 ssd ssd         37 Feb 25 15:53 fsid
> -rw-r--r--   1 ssd ssd         21 Feb 25 15:53 magic
> -rw-r--r--   1 ssd ssd 1073741824 Feb 25 15:53 myosdjournal
> -rw-r--r--   1 ssd ssd          6 Feb 25 15:53 ready
> -rw-r--r--   1 ssd ssd          4 Feb 25 15:53 store_version
> -rw-r--r--   1 ssd ssd         53 Feb 25 15:53 superblock
> -rw-r--r--   1 ssd ssd         10 Feb 25 15:53 type
> -rw-r--r--   1 ssd ssd          2 Feb 25 15:53 whoami
>
> du -sh
> It shows the current/ directory uses about 8G.
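>
> (One way to break that 8G down further, using the paths from this setup, would be something like:)
>
> cd /home/ssd/src/bluestore/ceph-deploy/osd/myosddata
> du -sh myosdjournal current          # journal file vs. object data under current/
> find current -type f | wc -l         # rough count of files holding the ~917k 4k objects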
>
> Appendix 1:
> [global]
> fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
> auth_cluster_required = none
> auth_service_required = none
> auth_client_required = none
> filestore_xattr_use_omap = true
> filestore_max_sync_interval=10
> filestore_fd_cache_size = 64
> filestore_fd_cache_shards = 32
> filestore_op_threads = 6
> filestore_queue_max_ops=5000
> filestore_queue_committing_max_ops=5000
> journal_max_write_entries=1000
> journal_queue_max_ops=3000
> filestore_wbthrottle_enable=false
> filestore_queue_max_bytes=1048576000
> filestore_queue_committing_max_bytes=1048576000
> journal_max_write_bytes=1048576000
> journal_queue_max_bytes=1048576000
>
> osd_journal_size = 1024
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcacher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
> debug_newstore = 0/0
> debug_keyvaluestore = 0/0
> osd_tracing = true
> osd_objectstore_tracing = true
> rados_tracing = true
> rbd_tracing = true
> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog
> osd_mkfs_options_xfs = -f -i size=2048
> osd_op_threads = 32
> objecter_inflight_ops=102400
> ms_dispatch_throttle_bytes=1048576000
> objecter_inflight_op_bytes=1048576000
> osd_mkfs_type = xfs
> osd_client_message_size_cap = 0
> osd_client_message_cap = 0
> osd_enable_op_tracker = false
> mon_initial_members = localhost
> mon_host = 127.0.0.1
> mon data = '$CEPH_DEPLOY'/mon/mymondata
> mon cluster log file = '$CEPH_DEPLOY'/mon/mon.log
> keyring='$CEPH_DEPLOY'/ceph.client.admin.keyring
> run dir = '$CEPH_DEPLOY'/run
> [osd.0]
> osd data = '$CEPH_DEPLOY'/osd/myosddata
> osd journal = '$CEPH_DEPLOY'/osd/myosddata/myosdjournal
> #osd journal = '$WORKSPACE'/myosdjournal/myosdjournal
> log file = '$CEPH_DEPLOY'/osd/osd.log



