On Tue, Jun 16, 2015 at 12:03 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>
>> On 16 Jun 2015, at 12:59, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>> Well, I see mons dropping out when deleting a large number of snapshots, and it eats a _lot_ of CPU to delete them.
>>
>> Well, you're getting past my expertise on the subject, but deleting
>> snapshots can sometimes be expensive, yes. If monitors are dropping
>> out that probably means they're getting swamped by the OSDs sending in
>> updates; you might want to adjust your reporting config options
>> (osd_mon_report_interval_max, osd_mon_report_interval_min, and
>> osd_pg_stat_report_interval_max).
>>
>
> They are running fine under normal conditions, but when I run a script that deletes lots (say 100) of snapshots and volumes, the mon node I'm running it from drops out occasionally:
>
> ceph-mon.node-14.log:2015-06-15 06:01:28.417302 7fee3945d700 -1 mon.node-14@2(peon).paxos(paxos updating c 38322179..38322802) lease_expire from mon.0 172.20.1.3:6789/0 is 16.491629 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
>
> And during this time, CPU usage on all the OSDs spikes to 200%+.
>
>>> I also had to manually schedule "compact" for the leveldb on the mons, as it stopped compacting itself. But that doesn't impact IO as far as I know (or does mon speed actually impact IO?).
>>
>> It won't impact IO generally, unless it's blocking cluster updates.
>
> When I run the compact, the node that's told to compact drops out of quorum and rejoins when finished - it only runs for 10-30s twice daily, so that's not a big problem; I just thought I should mention it in case someone knows a better solution.
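[Editor's note: the reporting options Greg mentions are set in the [osd] section of ceph.conf on the OSD hosts. A minimal sketch follows; the values shown are illustrative assumptions, not recommendations, and defaults differ between Ceph releases.]

```
# ceph.conf, [osd] section - example values only (assumed, not defaults).
# Larger intervals space out OSD->mon stat reports, so the monitors see
# fewer updates during a mass snapshot delete.
[osd]
osd mon report interval min = 10        ; min seconds between routine reports
osd mon report interval max = 300       ; upper bound on report spacing
osd pg stat report interval max = 500   ; max seconds between PG stat reports
```

These can also be changed at runtime via injectargs, at the cost of not persisting across restarts.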
>
>>> I can see lots of layers on the snapshotted volumes as well as extremely large overhead with snapshots (a 2.5TB volume that actually occupies 7TB (*3) of space until even the HEAD is deleted), but that's a different story, I guess...
>>
>> It's possible that you've just got enough snapshots the OSDs haven't
>> ever caught up with deleting them... not sure. :/
>
> I can see the data disappear after I delete snapshots (it would be nice to know the progress of the snapshot pruning - there's no indication it's doing anything in the background, but I can see it deleting TBs of files).
>
>>
>> What version of Ceph are you currently running?
>
> 0.67.12 dumpling.

Ah, yeah, I believe this is all much nicer on Firefly and Hammer. :)
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
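[Editor's note: there is no single progress bar for snapshot pruning, but the per-PG snap trim queue is exposed in PG query output, and the manual mon compaction described above has a dedicated command. A hedged sketch; the PG id is a placeholder, and the exact field name varies between releases.]

```
# Query one PG; some releases report a "snap_trimq" field listing
# snapshots still queued for trimming (field name is an assumption
# for older versions such as dumpling).
ceph pg 2.1f query | grep -i snap_trim

# Manually compact a monitor's leveldb store (the operation the thread
# describes); note the mon may drop out of quorum while it runs.
ceph tell mon.node-14 compact
```

Watching snap_trimq drain across PGs is the closest thing to a pruning progress indicator on these versions.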