Re: cosd multi-second stalls cause "wrongly marked me down"

"Jim Schutt" <jaschut@xxxxxxxxxx> · Mon, 11 Apr 2011 15:18:35 -0600

Jim Schutt wrote:
Sage Weil wrote:
On Fri, 8 Apr 2011, Jim Schutt wrote:

So, in the short term I guess I need to run fewer cosd
instances per server.

There is one other thing to look at, and that's the number of threads 
used by each cosd process.  Have you tried setting

    osd op threads = 1

(or even 0)?  That will limit the number of concurrent IOs in flight
to the fs.  Setting it to 0 avoids the thread pool entirely and
processes the IO in the message dispatch thread, though I haven't
tested that recently, so there may be issues.
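
For reference, a minimal sketch of where such a setting might sit in
ceph.conf.  Placing it in the [osd] section so that it applies to every
cosd instance is an assumption; the thread does not show the actual
config file.

    [osd]
        ; assumed placement: applies to all cosd daemons
        ; 1 = a single op thread; 0 = no thread pool, IO is handled
        ; in the message dispatch thread (not recently tested)
        osd op threads = 1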

I'll try this 2nd, since it's easy.

     osd op threads = 0

didn't work for me at all - 20 of 96 OSDs aborted almost
immediately after startup.

     osd op threads = 1

didn't work very well either - one of my servers went OOM,
which hasn't happened since I started using my restricted
buffering parameters.

It really does seem like I'm just trying to do too much
work on each server.  If I back off to 4 OSDs/server on
my hardware, there are a few percent idle cycles, which makes
interacting with it much more pleasant.

-- Jim
