Re: dense storage nodes

> On 18 May 2016 at 7:54, Blair Bethwaite <blair.bethwaite@xxxxxxxxx> wrote:
> 
> 
> Hi all,
> 
> What are the densest node configs out there, and what are your
> experiences with them and tuning required to make them work? If we can
> gather enough info here then I'll volunteer to propose some upstream
> docs covering this.
> 
> At Monash we currently have some 32-OSD nodes (running RHEL7), though
> 8 of those OSDs are not storing or doing much yet (in a quiet EC'd RGW
> pool), the other 24 OSDs are serving RBD and at perhaps 65% full on
> average - these are 4TB drives.
> 

I worked on a cluster with 256 OSDs per node (~2500 OSDs in total), and that didn't work out as hoped.

I got into this project when the hardware had already been ordered; it wouldn't have been my choice.

> Aside from the already documented pid_max increases that are typically
> necessary just to start all OSDs, we've also had to up
> nf_conntrack_max. We've hit issues (twice now) that seem (have not

Why enable connection tracking at all? It only slows down Ceph traffic.
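
If you can't unload conntrack entirely, one way to at least bypass it for Ceph traffic is a NOTRACK rule in the raw table. The port ranges below are the Ceph defaults (6789 for the mon, 6800-7300 for OSDs) and are purely illustrative:

    # skip connection tracking for Ceph traffic
    iptables -t raw -A PREROUTING -p tcp --dport 6789 -j NOTRACK
    iptables -t raw -A PREROUTING -p tcp --dport 6800:7300 -j NOTRACK
    iptables -t raw -A OUTPUT -p tcp --sport 6789 -j NOTRACK
    iptables -t raw -A OUTPUT -p tcp --sport 6800:7300 -j NOTRACK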

> figured out exactly how to confirm this yet) to be related to kernel
> dentry slab cache exhaustion - symptoms were a major slow down in
> performance and slow requests all over the place on writes, watching
> OSD iostat would show a single drive hitting 90+% util for ~15s with a
> bunch of small reads and no writes. These issues were worked around by
> tuning up filestore split and merge thresholds, though if we'd known
> about this earlier we'd probably have just bumped up the default
> object size so that we simply had fewer objects (and/or rounded up the
> PG count to the next power of 2). We also set vfs_cache_pressure to 1,
> though this didn't really seem to do much at the time. I've also seen
> recommendations about setting min_free_kbytes to something higher
> (currently 90112 on our hardware) but have not verified this.
> 
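For reference, the tunings described above map roughly onto the following sysctls and ceph.conf options. The values are placeholders to show where each knob lives, not recommendations; size them for your own hardware:

    # /etc/sysctl.d/90-ceph-dense.conf (example values only)
    kernel.pid_max = 4194303
    net.netfilter.nf_conntrack_max = 1048576    # only relevant if conntrack stays loaded
    vm.vfs_cache_pressure = 1                   # keep dentries/inodes cached longer
    vm.min_free_kbytes = 262144                 # unverified, per the above

    # ceph.conf, [osd] section: delay filestore directory splitting
    filestore merge threshold = 40
    filestore split multiple = 8
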

I eventually ended up doing NUMA pinning of OSDs and increasing pid_max, but those were most of the changes I made.
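
A minimal sketch of what such pinning can look like (the OSD id and NUMA node below are just examples, not the actual setup): wrap the daemon with numactl, or put the equivalent into a systemd drop-in for ceph-osd@.service:

    # pin OSD 12's CPU scheduling and memory allocation to NUMA node 0
    numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -f --cluster ceph --id 12 --setuser ceph --setgroup ceph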

The network didn't really need that much attention to make this work.

Wido

> -- 
> Cheers,
> ~Blairo
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


