Re: Ceph CBT simulate down OSDs

Hi Henry,

The recovery test mechanism is basically a state machine launched in a separate thread that runs concurrently with whatever benchmark you want to run. The basic premise is that it waits for a configurable amount of "pre" time to let the benchmark get started, then marks the OSDs down/out, waits until the cluster is healthy, then marks them up/in, and waits until the cluster is healthy again. All of this happens while your chosen background load is running. At the end, there is a post phase where you can specify how long the benchmark should continue running after the recovery process has completed. ceph health is run every second during this process and recorded in a log so you can keep track of what's happening while the tests are running.

Typically, once the recovery test is complete, a callback into the benchmark module is made to let the benchmark know the recovery test is done. Usually this will kill the benchmark (i.e. you might choose to run a 4 hour fio test and then let the recovery process tell the fio benchmark module to kill fio). Alternatively, you can tell it to keep repeating the process until the benchmark itself completes with the "repeat" option.

The actual yaml to do this is quite simple: put a "recovery_test" section in your cluster section, tell it which OSDs you want to mark down, and optionally give it repeat, pre_time, and post_time options.

Here's an example:

recovery_test:
  osds: [3,6]
  repeat: True
  pre_time: 60
  post_time: 60
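
If it helps to see it in context, here's roughly how that would slot into a cluster section like the one in your yaml below (the OSD ids are just placeholders; list whichever OSDs you want to mark down):

cluster:
  head: "ceph@head"
  clients: ["ceph@client"]
  osds: ["ceph@osd"]
  mons: ["ceph@mon"]
  ...
  recovery_test:
    osds: [3,6]        # OSDs to mark down/out and later back up/in
    pre_time: 60       # seconds to let the benchmark warm up before marking OSDs down
    post_time: 60      # seconds to keep the benchmark running after recovery completes
    repeat: True       # keep cycling until the benchmark itself finishes; leave it out
                       # if you want recovery completion to kill the benchmark instead

With repeat set, a long background load like your 300 second radosbench run will see the down/out -> up/in cycle happen repeatedly until the benchmark finishes.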

Here's a paper where this functionality was used to predict how long our thrashing tests in the ceph QA lab would take on HDDs versus SSDs. We knew the thrashing tests were consuming most of the time in the lab, and this let us determine how much buying SSDs would speed up the QA runs.

https://drive.google.com/open?id=0B2gTBZrkrnpZYVpPb3VpTkw5aFk

See appendix B for the ceph.conf that was used at the time for the tests. Also, please do not use the "-n size=64k" mkfs.xfs option in that yaml file. We later found out that it can cause XFS to deadlock and may not be safe to use.
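
If you reuse anything from that appendix, just drop that option from mkfs_opts; for example, the line already in your yaml below doesn't include it:

  mkfs_opts: -f -i size=2048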

Mark

On 05/02/2017 10:58 AM, Henry Ngo wrote:
Hi all,

CBT documentation states that this can be achieved. If so, how do I set
it up? What do I add in the yaml file? Below is an EC example. Thanks.

cluster:
  head: "ceph@head"
  clients: ["ceph@client"]
  osds: ["ceph@osd"]
  mons: ["ceph@mon"]
  osds_per_node: 1
  fs: xfs
  mkfs_opts: -f -i size=2048
  mount_opts: -o inode64,noatime,logbsize=256k
  conf_file: /home/ceph/ceph-tools/cbt/example/ceph.conf
  ceph.conf: /home/ceph/ceph-tools/cbt/example/ceph.conf
  iterations: 3
  rebuild_every_test: False
  tmp_dir: "/tmp/cbt"
  pool_profiles:
    erasure:
      pg_size: 4096
      pgp_size: 4096
      replication: 'erasure'
      erasure_profile: 'myec'

benchmarks:
  radosbench:
    op_size: [4194304, 524288, 4096]
    write_only: False
    time: 300
    concurrent_ops: [128]
    concurrent_procs: 1
    use_existing: True
    pool_profile: erasure

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



