Re: Ceph CBT simulate down OSDs

Mark,

Thanks for the detailed explanation and example. This is exactly what I was looking for.

Best,
Henry Ngo


On Tue, May 2, 2017 at 9:29 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Hi Henry,

The recovery test mechanism is basically a state machine launched in another thread that runs concurrently with whatever benchmark you want to run.  The basic premise is that it waits for a configurable amount of "pre" time to let the benchmark get started, then marks the OSDs down/out, waits until the cluster is healthy, then marks them up/in, and waits until the cluster is healthy again.  All of this happens while your chosen background load runs.  At the end, there is a "post" phase where you can specify how long the benchmark should continue running after the recovery process has completed.  ceph health is run every second during this process and recorded in a log to keep track of what's happening while the tests are running.

Typically, once the recovery test is complete, a callback into the benchmark module is made to let the benchmark know the recovery test is done.  Usually this will kill the benchmark (i.e. you might choose to run a 4-hour fio test and then let the recovery process tell the fio benchmark module to kill fio).  Alternately, you can tell it to keep repeating the process until the benchmark itself completes with the "repeat" option.
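In pseudocode, the state machine described above boils down to something like the sketch below.  This is just an illustration of the flow, not CBT's actual implementation; the `run` and `healthy` callables are hypothetical stand-ins for shelling out to the ceph CLI and parsing `ceph health`:

```python
import time

def recovery_test(osds, run, healthy, pre_time=60, post_time=60,
                  poll=1.0, on_done=None):
    """Illustrative sketch of the recovery-test state machine.

    osds    -- list of OSD ids to mark down/out
    run     -- callable that executes a ceph CLI command string
    healthy -- callable returning True when 'ceph health' reports HEALTH_OK
    """
    time.sleep(pre_time)                 # "pre" phase: let the benchmark ramp up
    for osd in osds:                     # mark the chosen OSDs down and out
        run("ceph osd down %d" % osd)
        run("ceph osd out %d" % osd)
    while not healthy():                 # poll until the cluster recovers
        time.sleep(poll)
    for osd in osds:                     # bring the OSDs back in
        run("ceph osd in %d" % osd)
    while not healthy():                 # poll until backfill completes
        time.sleep(poll)
    time.sleep(post_time)                # "post" phase: keep the load running
    if on_done:
        on_done()                        # callback into the benchmark module
```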

The actual yaml to do this is quite simple.  Simply put a "recovery_test" section in your cluster section, tell it which OSDs you want to mark down, and optionally give it repeat, pre_time, and post_time options.

Here's an example:

recovery_test:
  osds: [3,6]
  repeat: True
  pre_time: 60
  post_time: 60

Here's a paper where this functionality was used to predict how long our thrashing tests in the ceph QA lab would take on HDDs versus SSDs.  We knew our thrashing tests were using most of the time in the lab, and we were able to use this to determine how much buying SSDs would speed up the QA runs.

https://drive.google.com/open?id=0B2gTBZrkrnpZYVpPb3VpTkw5aFk

See appendix B for the ceph.conf that was used at the time for the tests.  Also, please do not use the "-n size=64k" mkfs.xfs option in that yaml file.  We later found out that it can cause XFS to deadlock and may not be safe to use.

Mark


On 05/02/2017 10:58 AM, Henry Ngo wrote:
Hi all,

CBT documentation states that this can be achieved. If so, how do I set
it up? What do I add in the yaml file? Below is an EC example. Thanks.

cluster:
  head: "ceph@head"
  clients: ["ceph@client"]
  osds: ["ceph@osd"]
  mons: ["ceph@mon"]
  osds_per_node: 1
  fs: xfs
  mkfs_opts: -f -i size=2048
  mount_opts: -o inode64,noatime,logbsize=256k
  conf_file: /home/ceph/ceph-tools/cbt/example/ceph.conf
  ceph.conf: /home/ceph/ceph-tools/cbt/example/ceph.conf
  iterations: 3
  rebuild_every_test: False
  tmp_dir: "/tmp/cbt"
  pool_profiles:
    erasure:
      pg_size: 4096
      pgp_size: 4096
      replication: 'erasure'
      erasure_profile: 'myec'

benchmarks:
  radosbench:
    op_size: [4194304, 524288, 4096]
    write_only: False
    time: 300
    concurrent_ops: [128]
    concurrent_procs: 1
    use_existing: True
    pool_profile: erasure


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
