Re: osdmaps not being cleaned up in 12.2.8

That thread looks like the right one.

 

So far I haven't needed to restart the OSDs for the churn trick to work. I bet you're right that something thinks it still needs one of the old osdmaps on your cluster. Last night our cluster finished another round of expansions and we're seeing up to 49,272 osdmaps hanging around. The churn trick seems to be working again too.
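
For reference, a quick way to see the range of epochs an OSD is still holding (a minimal sketch, assuming the admin socket is at its default path, with osd.1754 from the messages below as the example) is:

# oldest_map / newest_map bound the epochs the OSD still keeps on disk
ceph daemon osd.1754 status | grep -E 'oldest_map|newest_map'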

 

Bryan

 

From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Date: Thursday, January 10, 2019 at 3:13 AM
To: Bryan Stillwell <bstillwell@xxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [ceph-users] osdmaps not being cleaned up in 12.2.8

 

Hi Bryan,

 

I think this is the old hammer thread you refer to:

 

We also have osdmaps accumulating on v12.2.8 -- ~12000 per OSD at the moment.

 

I'm trying to churn the osdmaps like before, but our maps are not being trimmed.
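
A minimal check of the mon side (the field names here assume the Luminous ceph report output) would be:

# if osdmap_first_committed stays pinned while osdmap_last_committed keeps growing,
# the mons are not trimming at all
ceph report 2>/dev/null | grep -E 'osdmap_first_committed|osdmap_last_committed'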

 

Did you need to restart the OSDs before the churn trick would work? If so, it seems that something is holding references to old maps, like that old hammer issue.

 

Cheers, Dan

 

 

On Tue, Jan 8, 2019 at 5:39 PM Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:

 

I was able to get the osdmaps to slowly trim (maybe 50 would trim with each change) by making small changes to the CRUSH map like this:

for i in {1..100}; do
     ceph osd crush reweight osd.1754 4.00001
     sleep 5
     ceph osd crush reweight osd.1754 4
     sleep 5
done

I believe this was the solution Dan came across back in the hammer days. It works, but it's certainly not ideal. Across the cluster it freed up around 50TB of data!
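
If it helps, a minimal way to watch the trimming progress while the loop runs (reusing the find command from the quoted message below, so osd.1754 is just an example) is:

# print the on-disk osdmap count every 30s; it should drop by ~50 per CRUSH change if trimming is working
while true; do
     find /var/lib/ceph/osd/ceph-1754/current/meta -name 'osdmap*' | wc -l
     sleep 30
done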

 

 

 

Bryan

 

 

 

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Bryan Stillwell <bstillwell@xxxxxxxxxxx>
Date: Monday, January 7, 2019 at 2:40 PM
To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: [ceph-users] osdmaps not being cleaned up in 12.2.8

 

 

 

I have a cluster with over 1900 OSDs running Luminous (12.2.8) that isn't cleaning up old osdmaps after doing an expansion.  This is even after the cluster became 100% active+clean:

 

 

 

# find /var/lib/ceph/osd/ceph-1754/current/meta -name 'osdmap*' | wc -l
46181

 

 

 

With the osdmaps being over 600KB in size, this adds up:

 

 

 

# du -sh /var/lib/ceph/osd/ceph-1754/current/meta
31G     /var/lib/ceph/osd/ceph-1754/current/meta
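
(As a rough sanity check, 46,181 maps at over 600KB each is at least ~27GB, which lines up with the 31G shown above.)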

 

 

 

I remember running into this during the hammer days:

Did something change recently that may have broken this fix?

 

 

 

Thanks,

 

Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
