RE: [ceph-users] large memory leak on scrubbing

On Mon, 19 Aug 2013, Mostowiec Dominik wrote:
> Thanks for your response.
> Great.
> 
> Is it also fixed in the latest cuttlefish?
> 
> We have two problems with scrubbing:
> - memory leaks
> - slow requests, and the osd holding the bucket index being wrongly marked down (during scrubbing)

The slow requests can trigger if you have very large objects (including 
a very large rgw bucket index object).  But the message you quote below is 
for a scrub-reserve operation, which should really be excluded from the op 
warnings entirely.  Is that the only slow request message you see?
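
If you want to check, you can filter the scrub-reserve messages out of the
osd log and see what is left. A quick sketch (this assumes the default log
location; adjust the path for your deployment):

  grep 'slow request' /var/log/ceph/ceph-osd.4.log | grep -v scrub-reserve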

> Now we decided to turn off scrubbing and trigger it on maintenance window.
> I noticed that "ceph osd scrub" or "ceph osd deep-scrub" triggers a scrub on an osd, but not for all of its PGs.
> Is it possible to trigger scrubbing of all PGs on one osd?

It should trigger a scrub on all PGs that are clean.  If a PG is 
recovering it will be skipped.
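
If you want to walk every PG on an osd explicitly, you can also scrub them
one by one. A rough sketch (the acting-set column format of "ceph pg dump"
varies between releases, so adjust the grep pattern, and the noscrub flag
may not exist in your version):

  # optionally stop background scrubbing outside the maintenance window
  ceph osd set noscrub

  # scrub every PG whose acting set starts with osd.4 (osd.4 is primary)
  for pg in $(ceph pg dump 2>/dev/null | grep -E '\[4,' | awk '{print $1}'); do
      ceph pg scrub "$pg"
  done

  # re-enable background scrubbing afterwards
  ceph osd unset noscrub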

sage


> 
> --
> Regards 
> Dominik
> 
> 
> -----Original Message-----
> From: Sage Weil [mailto:sage@xxxxxxxxxxx] 
> Sent: Saturday, August 17, 2013 5:11 PM
> To: Mostowiec Dominik
> Cc: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx; Studziński Krzysztof; Sydor Bohdan
> Subject: Re: [ceph-users] large memory leak on scrubbing
> 
> Hi Dominik,
> 
> A bug that caused excessive memory consumption during scrub was fixed a couple of months back.  You can upgrade to the latest 'bobtail' branch.
> See
> 
>  http://ceph.com/docs/master/install/debian/#development-testing-packages
> 
> Installing that package should clear this up.
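> 
> On a Debian-based box that boils down to roughly the following (a sketch;
> take the exact key and repository URLs from the page above, and swap in
> your distro codename for "precise"):
> 
>   echo deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/bobtail precise main | sudo tee /etc/apt/sources.list.d/ceph-bobtail.list
>   sudo apt-get update && sudo apt-get install ceph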
> 
> sage
> 
> 
> On Fri, 16 Aug 2013, Mostowiec Dominik wrote:
> 
> > Hi,
> > We noticed some issues on our Ceph/S3 cluster that I think are related to scrubbing: large memory leaks.
> > 
> > Logs 09.xx: 
> > https://www.dropbox.com/s/4z1fzg239j43igs/ceph-osd.4.log_09xx.tar.gz
> > From 09:30 to 09:44 (14 minutes) the osd.4 process grew to 28 GB. 
> > 
> > I think this is the curious part:
> > 2013-08-16 09:43:48.801331 7f6570d2e700  0 log [WRN] : slow request 
> > 32.794125 seconds old, received at 2013-08-16 09:43:16.007104: 
> > osd_sub_op(unknown.0.0:0 16.113d 0//0//-1 [scrub-reserve] v 0'0 
> > snapset=0=[]:[] snapc=0=[]) v7 currently no flag points reached
> > 
> > We have a large rgw index and a lot of large files on this cluster.
> > ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
> > Setup: 
> > - 12 servers x 12 OSD
> > - 3 mons
> > Default scrubbing settings.
> > Journal and filestore settings:
> >         journal aio = true
> >         filestore flush min = 0
> >         filestore flusher = false
> >         filestore fiemap = false
> >         filestore op threads = 4
> >         filestore queue max ops = 4096
> >         filestore queue max bytes = 10485760
> >         filestore queue committing max bytes = 10485760
> >         journal max write bytes = 10485760
> >         journal queue max bytes = 10485760
> >         ms dispatch throttle bytes = 10485760
> >         objecter inflight op bytes = 10485760
> > 
> > Is this a known bug in this version?
> > (Do you know some workaround to fix this?)
> > 
> > ---
> > Regards
> > Dominik
> > 
> 



