On Sat, Apr 27, 2019, 3:49 PM Nikhil R <nikh.ravindra@xxxxxxxxx> wrote:
We have baremetal nodes with 256GB RAM and 36-core CPUs. We are on Ceph Jewel 10.2.9 with leveldb. The OSDs and journals are on the same HDD.

We have backfill_max_active = 1, recovery_max_active = 1 and recovery_op_priority = 1. The OSD crashes and restarts once one PG has backfilled and the next PG tries to backfill. This is when iostat shows the disk utilised up to 100%.
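For reference, those throttles correspond to the stock Jewel recovery/backfill options. A minimal sketch of how they are usually expressed, assuming the standard option names (osd_max_backfills, osd_recovery_max_active, osd_recovery_op_priority) and the values quoted above:

    # ceph.conf
    [osd]
    osd_max_backfills = 1
    osd_recovery_max_active = 1
    osd_recovery_op_priority = 1

    # or applied at runtime without restarting the OSDs:
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'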
I would set noout to prevent excess movement in the event of OSD flapping, and disable scrubbing and deep scrubbing until your backfilling has completed. I would also bring the new OSDs online a few at a time rather than all 25 at once if you add more servers.
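For anyone following along, those flags are set with the standard cluster flag commands (nothing cluster-specific assumed here):

    # pause automatic out-marking and scrubbing while backfill runs
    ceph osd set noout
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # and clear them once backfilling has completed
    ceph osd unset noout
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub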
Appreciate your help David

--

On Sun, 28 Apr 2019 at 00:46, David C <dcsysengineer@xxxxxxxxx> wrote:

On Sat, 27 Apr 2019, 18:50 Nikhil R, <nikh.ravindra@xxxxxxxxx> wrote:

> Guys,
> We now have a total of 105 OSDs on 5 baremetal nodes, each hosting 21 OSDs on 7TB HDDs, with journals on HDD too. Each journal is about 5GB.

This would imply you've got a separate HDD partition for journals; I don't think there's any value in that and it would probably be detrimental to performance.

> We expanded our cluster last week and added 1 more node with 21 HDDs and journals on the same disks.
> Our client I/O is too heavy and we are not able to backfill even 1 thread during peak hours - if we backfill during peak hours, OSDs crash, causing undersized PGs, and if we have another OSD crash we won't be able to use our cluster due to undersized and recovering PGs. During non-peak hours we can only backfill 8-10 PGs. Due to this, our MAX AVAIL is draining out very fast.

How much RAM have you got in your nodes? In my experience that's a common reason for crashing OSDs during recovery ops.

What does your recovery and backfill tuning look like?

> We are thinking of adding 2 more baremetal nodes with 21 x 7TB OSDs on HDD and adding 50GB SSD journals for these. We aim to backfill from the 105 OSDs a bit faster and expect the writes from backfills to land on these OSDs faster.

SSD journals would certainly help, just be sure it's a model that performs well with Ceph.

> Is this a good, viable idea? Thoughts please?

I'd recommend sharing more detail, e.g. full spec of the nodes, Ceph version etc.

-Nikhil
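On the "a model that performs well with Ceph" point: the usual check is to benchmark the candidate SSD with synchronous, direct 4k writes, since that is the write pattern a filestore journal generates. A rough sketch, where /dev/sdX is a placeholder for the candidate device (the test writes to the raw device, so only run it on a blank disk):

    # sync write latency/IOPS test - the workload a Ceph journal cares about
    # WARNING: destructive, overwrites data on /dev/sdX
    fio --name=journal-test --filename=/dev/sdX \
        --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based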
Sent from my iPhone
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com