So it looks like the scrub was not actually the root of the problem. It seems I have some failing hardware that I'm now trying to run down.
QH
On Wed, Jul 29, 2015 at 8:22 PM, Christian Balzer <chibi@xxxxxxx> wrote:
Hello,
On Wed, 29 Jul 2015 17:59:10 -0600 Quentin Hartman wrote:
> well, that would certainly do it. I _always_ forget to twiddle the little
> thing on the web page that changes the version of the docs I'm looking
> at.
>
> So I guess then my question becomes, "How do i prevent deep scrubs from
> happening in the middle of the day and ruining everything?"
>
Firstly, a qualification and quantification of "ruining everything" would
be interesting, but I'll assume it's bad.
I have (had) clusters where even simple scrubs would be detrimental, so I
can relate.
That being said, if your cluster goes catatonic when being scrubbed, you
might want to improve it (more, faster OSDs, etc.), because a deep scrub
isn't all that different from the load you'll experience when losing an
OSD or even a node, something your cluster should survive w/o becoming
totally unusable with regard to client I/O.
The most effective way to keep scrubs from starving client
I/O is to set "osd_scrub_sleep = 0.1" (the value recommended in the
documentation seems to be far too small to have any beneficial effect for
most people).
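For example, something along these lines in ceph.conf (just a sketch, the
exact value is something to tune for your hardware):

[osd]
osd scrub sleep = 0.1

or injected into the running OSDs (though depending on the version it may
only take full effect after a restart):

ceph tell osd.\* injectargs '--osd-scrub-sleep 0.1'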
To scrub at a specific time, and given that your cluster can deep-scrub
itself completely during the night, consider issuing a
"ceph osd deep-scrub \*"
late on a weekend evening.
My largest cluster can deep scrub itself in 4 hours, so once I kicked that
off at midnight on a Saturday all scrubs (daily) and deep scrubs
(weekly) happen in that time frame.
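If you want that to happen automatically, a cron entry in root's crontab
along these lines should do it (just a sketch, adjust the schedule to a
window that suits your cluster):

# kick off a cluster-wide deep scrub at 00:00 every Saturday
0 0 * * 6 /usr/bin/ceph osd deep-scrub \*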
Christian
> QH
>
>
> On Wed, Jul 29, 2015 at 5:55 PM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
>
> > Hi Quentin,
> >
> > It may be the specific option you are trying to tweak.
> > osd-scrub-begin-hour was first introduced in development release
> > v0.93, which means it would be in 0.94.x (Hammer), but your cluster is
> > 0.87.1 (Giant).
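> >
> > (To confirm what the daemons are actually running, something like
> > "ceph tell osd.0 version" should report the OSD's version, while
> > "ceph --version" reports the locally installed packages.)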
> >
> > Cheers,
> >
> > - Travis
> >
> > On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman
> > <qhartman@xxxxxxxxxxxxxxxxxxx> wrote:
> > > I'm running a 0.87.1 cluster, and my "ceph tell" seems to not be
> > > working:
> > >
> > > # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
> > > failed to parse arguments: --osd-scrub-begin-hour,1
> > >
> > >
> > > I've also tried the daemon config set variant and it also fails:
> > >
> > > # ceph daemon osd.0 config set osd_scrub_begin_hour 1
> > > { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such
> > file or
> > > directory"}
> > >
> > > I'm guessing I have something goofed in my admin socket client
> > > config:
> > >
> > > [client]
> > > rbd cache = true
> > > rbd cache writethrough until flush = true
> > > admin socket = /var/run/ceph/$cluster-$type.$id.asok
> > >
> > > but that seems to correlate with the structure that exists:
> > >
> > > # ls
> > > ceph-osd.24.asok ceph-osd.25.asok ceph-osd.26.asok
> > > # pwd
> > > /var/run/ceph
> > >
> > > I can show my configs all over the place, but changing them seems to
> > > always fail. It behaves the same if I'm working on a local daemon, or on
> > > my config node trying to make changes globally.
> > >
> > > Thanks in advance for any ideas
> > >
> > > QH
> > >
> > >
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com