On Tue, Jun 16, 2015 at 12:03 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>
>> On 16 Jun 2015, at 12:59, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>> Well, I see mons dropping out when deleting a large number of snapshots, and it eats a _lot_ of CPU to delete them.
>>
>> Well, you're getting past my expertise on the subject, but deleting
>> snapshots can sometimes be expensive, yes. If monitors are dropping
>> out that probably means they're getting swamped by the OSDs sending in
>> updates; you might want to adjust your reporting config options
>> (osd_mon_report_interval_max, osd_mon_report_interval_min, and
>> osd_pg_stat_report_interval_max).
>>
>
> They are running fine under normal conditions, but when I run a script that deletes lots (say 100) of snapshots and volumes, the mon node I'm running it from drops out occasionally:
>
> ceph-mon.node-14.log:2015-06-15 06:01:28.417302 7fee3945d700 -1 mon.node-14@2(peon).paxos(paxos updating c 38322179..38322802) lease_expire from mon.0 172.20.1.3:6789/0 is 16.491629 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
>
> And during this time, CPU usage on all the OSDs spikes to 200%+.
>
>>> I also had to manually schedule "compact" for the leveldb on the mons, as it stopped compacting itself. But that doesn't impact IO as far as I know (or does mon speed actually impact IO?).
>>
>> It won't impact IO generally, unless it's blocking cluster updates.
>
> When I run the compact, the node that's told to compact drops out of quorum and rejoins when finished - it only runs for 10-30s twice daily, so that's not a big problem; I just thought I should mention it in case someone knows a better solution.
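[Editor's note: the reporting options Greg mentions are set in the [osd] section of ceph.conf on the OSD hosts. A minimal sketch follows; the values shown are illustrative assumptions, not recommendations, and defaults differ between Ceph releases.]

```
# ceph.conf, [osd] section - example values only (assumed, not defaults).
# Larger intervals space out OSD->mon stat reports, so the monitors see
# fewer updates during a mass snapshot delete.
[osd]
osd mon report interval min = 10        ; min seconds between routine reports
osd mon report interval max = 300       ; upper bound on report spacing
osd pg stat report interval max = 500   ; max seconds between PG stat reports
```

These can also be changed at runtime via injectargs, at the cost of not persisting across restarts.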
>
>>> I can see lots of layers on the snapshotted volumes as well as extremely large overhead with snapshots (a 2.5TB volume that actually occupies 7TB (*3) of space until even the HEAD is deleted), but that's a different story, I guess...
>>
>> It's possible that you've just got enough snapshots the OSDs haven't
>> ever caught up with deleting them... not sure. :/
>
> I can see the data disappear after I delete snapshots (it would be nice to know the progress of the snapshot pruning - there's no indication it's doing anything in the background, but I can see it deleting TBs of files).
>
>>
>> What version of Ceph are you currently running?
>
> 0.67.12 dumpling.

Ah, yeah, I believe this is all much nicer on Firefly and Hammer. :)
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
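[Editor's note: there is no single progress bar for snapshot pruning, but the per-PG snap trim queue is exposed in PG query output, and the manual mon compaction described above has a dedicated command. A hedged sketch; the PG id is a placeholder, and the exact field name varies between releases.]

```
# Query one PG; some releases report a "snap_trimq" field listing
# snapshots still queued for trimming (field name is an assumption
# for older versions such as dumpling).
ceph pg 2.1f query | grep -i snap_trim

# Manually compact a monitor's leveldb store (the operation the thread
# describes); note the mon may drop out of quorum while it runs.
ceph tell mon.node-14 compact
```

Watching snap_trimq drain across PGs is the closest thing to a pruning progress indicator on these versions.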