So it looks like the scrub was not actually the root of the problem. It seems I have some failing hardware that I'm now trying to run down.
QH
On Wed, Jul 29, 2015 at 8:22 PM, Christian Balzer <chibi@xxxxxxx> wrote:
Hello,
On Wed, 29 Jul 2015 17:59:10 -0600 Quentin Hartman wrote:
> well, that would certainly do it. I _always_ forget to twiddle the little
> thing on the web page that changes the version of the docs I'm looking
> at.
>
> So I guess then my question becomes, "How do i prevent deep scrubs from
> happening in the middle of the day and ruining everything?"
>
Firstly, a qualification and quantification of "ruining everything" would
be interesting, but I'll assume it's bad.
I have (had) clusters where even simple scrubs would be detrimental, so I
can relate.
That being said, if your cluster goes catatonic when being scrubbed, you
might want to improve it (more, faster OSDs, etc.), because a deep scrub
isn't all that different from the load you'll experience when losing an
OSD or even a node, something your cluster should survive w/o becoming
totally unusable with regard to client I/O.
The most effective way to keep scrubs from starving client
I/O is to set "osd_scrub_sleep = 0.1" (the value recommended in the
documentation seems to be far too small to have any beneficial effect for
most people).
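For example, something along these lines in ceph.conf (just a sketch, the
exact value is something to tune for your hardware):

[osd]
osd scrub sleep = 0.1

or injected into the running OSDs (though depending on the version it may
only take full effect after a restart):

ceph tell osd.\* injectargs '--osd-scrub-sleep 0.1'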
To scrub at a specific time, and given that your cluster can deep-scrub
itself completely during the night, consider issuing a
"ceph osd deep-scrub \*"
late on a weekend evening.
My largest cluster can deep scrub itself in 4 hours, so once I kicked that
off at midnight on a Saturday all scrubs (daily) and deep scrubs
(weekly) happen in that time frame.
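If you want that to happen automatically, a cron entry in root's crontab
along these lines should do it (just a sketch, adjust the schedule to a
window that suits your cluster):

# kick off a cluster-wide deep scrub at 00:00 every Saturday
0 0 * * 6 /usr/bin/ceph osd deep-scrub \*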
Christian
> QH
>
>
> On Wed, Jul 29, 2015 at 5:55 PM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
>
> > Hi Quentin,
> >
> > It may be the specific option you are trying to tweak.
> > osd-scrub-begin-hour was first introduced in development release
> > v0.93, which means it would be in 0.94.x (Hammer), but your cluster is
> > 0.87.1 (Giant).
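> >
> > (To confirm what the daemons are actually running, something like
> > "ceph tell osd.0 version" should report the OSD's version, while
> > "ceph --version" reports the locally installed packages.)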
> >
> > Cheers,
> >
> > - Travis
> >
> > On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman
> > <qhartman@xxxxxxxxxxxxxxxxxxx> wrote:
> > > I'm running a 0.87.1 cluster, and my "ceph tell" seems to not be
> > > working:
> > >
> > > # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
> > > failed to parse arguments: --osd-scrub-begin-hour,1
> > >
> > >
> > > I've also tried the daemon config set variant and it also fails:
> > >
> > > # ceph daemon osd.0 config set osd_scrub_begin_hour 1
> > > { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such
> > file or
> > > directory"}
> > >
> > > I'm guessing I have something goofed in my admin socket client
> > > config:
> > >
> > > [client]
> > > rbd cache = true
> > > rbd cache writethrough until flush = true
> > > admin socket = /var/run/ceph/$cluster-$type.$id.asok
> > >
> > > but that seems to correlate with the structure that exists:
> > >
> > > # ls
> > > ceph-osd.24.asok ceph-osd.25.asok ceph-osd.26.asok
> > > # pwd
> > > /var/run/ceph
> > >
> > > I can show my configs all over the place, but changing them seems to
> > > always fail. It behaves the same if I'm working on a local daemon, or on
> > > my config node trying to make changes globally.
> > >
> > > Thanks in advance for any ideas
> > >
> > > QH
> > >
> > >
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com