Re: Removing Snapshots Killing Cluster Performance

On Mon, Dec 1, 2014 at 1:51 AM, Daniel Schneller <daniel.schneller@xxxxxxxxxxxxxxxx> wrote:


> I could not find any way to throttle the background deletion activity
> (the command returns almost immediately).


I'm only aware of osd snap trim sleep.  I haven't tried this since my Firefly upgrade though.

I have tested out osd scrub sleep under a heavy deep-scrub load, and found that I needed a value of 1.0, which is much higher than the recommended starting point of 0.005.  I'll revisit this when #9487 gets backported (Thanks Dan Van Der Ster!).

I used ceph tell osd.\* injectargs, and watched my IO graphs.  Start with 0.005, and multiply by 10 until you see a change.  It took 10-60 seconds to see a change after injecting the args.
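
For reference, the injection looks roughly like this (a sketch only; the starting values are the ones mentioned above, and the option names take underscores when passed as arguments):

    # throttle snapshot trimming: sleep between trim operations on every OSD (runtime only)
    ceph tell osd.\* injectargs '--osd_snap_trim_sleep 0.005'

    # same idea for scrubbing; in my case this needed to go all the way up to 1.0
    ceph tell osd.\* injectargs '--osd_scrub_sleep 1.0'

    # if the IO graphs don't move within a minute or so, multiply the value by 10 and re-inject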

> While this is a big issue in itself for us, we would at least try to
> estimate how long the process will take per snapshot / per pool. I
> assume the time needed is a function of the number of objects that were
> modified between two snapshots.


That matches my experience as well.  "Big" snapshots take longer to trim, and are much more likely to cause a cluster outage than "small" snapshots.



> 1) Is there any way to control how much such an operation will
> tax the cluster (we would be happy to have it run longer, if that meant
> not utilizing all disks fully during that time)?


On Firefly, osd snap trim sleep and playing with the CFQ scheduler are your only options.  They're not great options.  If you can upgrade to Giant, the snap trim sleep should solve your problem.
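
If it helps, the persistent equivalent plus the scheduler tweak might look something like the sketch below (the value and the sdb device name are placeholders, not recommendations):

    # ceph.conf on the OSD hosts: make the snap trim throttle survive restarts
    [osd]
        osd snap trim sleep = 0.05

    # switch an OSD data disk to the CFQ scheduler (replace sdb with your device)
    echo cfq > /sys/block/sdb/queue/scheduler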

There is some work being done in Hammer: https://wiki.ceph.com/Planning/Blueprints/Hammer/osd%3A_Scrub%2F%2FSnapTrim_IO_prioritization

For the time being, I'm letting my snapshots accumulate.  I can't recover anything from them without the database backups, and those are deleted on time, so I can say with a straight face that their data is deleted.  I'll collect the garbage later.


> 3) Would SSD journals help here? Or any other hardware configuration
> change for that matter?


Probably, but it's not going to fix it.  I added SSD journals.  It's better, but I still had downtime after trimming.  I'm glad I added them though.  The cluster is overall much healthier and more responsive.  In particular, backfilling doesn't cause massive latency anymore.

 

> 4) Any other recommendations? We definitely need to remove the data,
> not because of a lack of space (at least not at the moment), but because
> when customers delete stuff / cancel accounts, we are obliged to remove
> their data at least after a reasonable amount of time.


I know it's kind of snarky, but perhaps you can redefine "reasonable" until you have a chance to upgrade to Giant or Hammer?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
