Hello,

On Wed, 19 Aug 2015 15:27:29 +0200 Jacek Jarosiewicz wrote:

> Hi,
>
> On 08/19/2015 11:01 AM, Christian Balzer wrote:
> >
> > Hello,
> >
> > That's a pretty small cluster all things considered, so your rather
> > intensive test setup is likely to run into any or all of the following
> > issues:
> >
> > 1) The amount of data you're moving around is going to cause a lot of
> > promotions from and to the cache tier. This is expensive and slow.
> > 2) EC coded pools are slow. You may actually get better results with a
> > "Ceph classic" approach, 2-4 HDDs per journal SSD. Also, 6TB HDDs
> > combined with EC may look nice to you from a cost/density perspective,
> > but more HDDs means more IOPS and thus speed.
> > 3) Scrubbing (unless tuned down very aggressively) will impact your
> > performance on top of the items above.
> > 4) You already noted the kernel versus userland bit.
> > 5) Having all your storage in a single JBOD chassis strikes me as ill
> > advised, though I don't think it's an actual bottleneck at 4x12Gb/s.
> >
>
> We use two of these (I forgot to mention that).
> Each chassis has two internal controllers - both exposing all the disks
> to the connected hosts. There are two OSD nodes connected to each chassis.
>

Ah, so you have the dual controller version.

> > When you ran the fio tests I assume nothing else was going on and the
> > dataset size would have fit easily into the cache pool, right?
> >
> > Look at your nodes with atop or iostat, I venture all your HDDs are at
> > 100%.
> >
> > Christian
> >
>
> Yes, the problem was a full cache pool. I'm currently wondering how to
> tune the cache pool parameters so that the whole cluster doesn't slow
> down that much when the cache is full...

Nick already gave you some advice on this; however, with the current
versions of Ceph, cache tiering is simply expensive and slow.

> I'm thinking of doing some tests on a pool w/o the cache tier so I can
> compare the results. Any suggestions would be greatly appreciated.
>

For a realistic comparison with your current setup, a total rebuild would
be in order, provided your cluster is for testing only at this point.

Given your current HW, that means the same 2-3 HDDs per storage node and
1 SSD as journal.
What exact maker/model are your SSDs?

Again, more HDDs means more (sustainable) IOPS, so unless your space
requirements (data and physical) are very demanding, twice the number of
3TB HDDs would be noticeably better.

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
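
To make the cache-tier tuning discussed above a bit more concrete, here is
a rough sketch of the knobs involved in getting flushing/eviction to start
before the cache pool fills up. The pool name "cachepool" is a placeholder
and all values are illustrative only; check the semantics and defaults
against the documentation for the Ceph release you are actually running.

  # Cap the cache pool well below the raw SSD capacity, otherwise Ceph
  # only reacts when the pool is already effectively full.
  ceph osd pool set cachepool target_max_bytes 500000000000    # ~500 GB, example value
  ceph osd pool set cachepool target_max_objects 1000000       # example value

  # Flush dirty objects and evict clean ones earlier, so client I/O
  # doesn't stall behind a full cache tier.
  ceph osd pool set cachepool cache_target_dirty_ratio 0.4
  ceph osd pool set cachepool cache_target_full_ratio 0.7

  # Optionally keep very recently written/read objects in the cache
  # for a minimum time before they become flush/evict candidates.
  ceph osd pool set cachepool cache_min_flush_age 600     # seconds
  ceph osd pool set cachepool cache_min_evict_age 1800    # seconds

Lowering the full/dirty ratios buys headroom by flushing earlier, but the
flushes still land on the EC base pool, so expect the HDDs to remain the
limiting factor either way.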
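
Similarly, for point 3) above (scrubbing), a hedged example of throttling
it at runtime. The parameter names are taken from the Hammer-era
documentation, so verify they exist in your version (the begin/end hour
settings in particular are relatively new), and put the same values under
[osd] in ceph.conf to make them persistent across restarts.

  # Slow scrubbing down so it competes less with client I/O.
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'      # pause between scrub chunks
  ceph tell osd.* injectargs '--osd_scrub_chunk_max 5'    # smaller chunks per scrub step

  # Restrict scrubs to an off-peak window, if your release supports it.
  ceph tell osd.* injectargs '--osd_scrub_begin_hour 1 --osd_scrub_end_hour 6'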