Re: removed_snaps in ceph osd dump?

> On 16 Jun 2015, at 12:59, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> 
> On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>> Well, I see mons dropping out when deleting a large amount of snapshots, and it eats a _lot_ of CPU to delete them
> 
> Well, you're getting past my expertise on the subject, but deleting
> snapshots can sometimes be expensive, yes. If monitors are dropping
> out that probably means they're getting swamped by the OSDs sending in
> updates; you might want to adjust your reporting config options
> (osd_mon_report_interval_max, osd_mon_report_interval_min, and
> osd_pg_stat_report_interval_max).
> 
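For reference, the options Greg lists can be injected at runtime without restarting anything; a sketch (the values here are examples only, not recommendations, and injected settings do not survive a restart, so anything that helps should also go into ceph.conf):

```shell
# Slow down OSD->mon reporting so the mons are not swamped during
# mass snapshot deletion (example values; tune for your cluster).
ceph tell osd.\* injectargs '--osd-mon-report-interval-max 120'
ceph tell osd.\* injectargs '--osd-pg-stat-report-interval-max 500'
```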

They are running fine under normal conditions, but when I run a script that deletes lots (say 100) of snapshots and volumes, the mon node I’m running it from drops out occasionally:

ceph-mon.node-14.log:2015-06-15 06:01:28.417302 7fee3945d700 -1 mon.node-14@2(peon).paxos(paxos updating c 38322179..38322802) lease_expire from mon.0 172.20.1.3:6789/0 is 16.491629 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)

And during this time CPU usage on all the OSDs spikes to 200%+


>> , I also had to manually schedule “compact” for the leveldb on mons as it stopped compacting itself. But that doesn’t impact IO as far as I know (or does the mon speed actually impact IO?).
> 
> It won't impact IO generally, unless it's blocking cluster updates.

When I run the compact, the node that’s told to compact drops out of quorum and rejoins when finished - it only runs for 10-30s twice daily, so that’s not a big problem, I just thought I should mention it in case someone knows a better solution.
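For anyone else hitting this, what I run is roughly the following (mon name is from my cluster; compact one mon at a time so quorum survives while that mon is busy):

```shell
# Ask a single monitor to compact its leveldb store; it may briefly
# fall out of quorum while the compaction runs, as described above.
ceph tell mon.node-14 compact
```

There is also a mon_compact_on_start option (available in dumpling, as far as I know) that compacts the store whenever the mon daemon starts, which may be a less disruptive alternative to a cron job.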

> 
> 
>> I can see lots of layers on the snapshotted volumes as well as extremely large overhead with snapshots (2.5TB volume that actually occupies 7TB(*3) of space until even the HEAD is deleted), but that’s a different story I guess…
> 
> It's possible that you've just got enough snapshots the OSDs haven't
> ever caught up with deleting them...not sure. :/

I can see the data disappear after I delete snapshots (it would be nice to know the progress of the snapshot pruning - there’s no indication it’s doing anything in the background, but I can see it deleting TBs of files).
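The closest thing I've found to a progress indicator, assuming nothing better exists in this version, is the field this thread's subject refers to:

```shell
# The removed_snaps intervals in the osdmap list the snap ids queued
# for (or already subject to) pruning on each pool; watching the pool
# lines over time gives a rough idea that trimming is happening.
ceph osd dump | grep removed_snaps
```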

> 
> What version of Ceph are you currently running?

0.67.12 dumpling.

> 
>> 
>> Jan
>> 
>>> On 16 Jun 2015, at 12:32, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> 
>>> On Tue, Jun 16, 2015 at 3:30 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>>> Thanks for the answer.
>>>> So it doesn’t hurt performance if it grows to ridiculous size - e.g. no lookup table overhead, stat()ing additional files etc.?
>>> 
>>> Nope, definitely nothing like that. If it gets sufficiently fragmented
>>> it can expand the size of the OSDMap, which might cause issues. But
>>> they'll be things like increasing memory consumption and slow map
>>> propagation and won't impact your regular object access.
>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



