Re: Ceph flash deployment

I see in this thread that someone says bluestore only works well with
the cfq scheduler:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031063.html

For readahead, do you have any suggestions on how I can measure my
workload to decide whether I should increase it or not?
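
Would something like the following be a reasonable way to test it? (Just
a sketch; it assumes fio is installed, nvme0n1 is a device I can safely
hammer with reads, and the values are placeholders.)

#!/usr/bin/env python3
# Sketch: compare buffered sequential read throughput at a few
# read_ahead_kb settings. Buffered I/O (direct=0) is used on purpose,
# since readahead does not apply to O_DIRECT reads. Run as root.
import json
import subprocess

DEVICE = "nvme0n1"  # placeholder device name
SYSFS = f"/sys/block/{DEVICE}/queue/read_ahead_kb"

for ra_kb in (0, 128, 512, 4096):
    with open(SYSFS, "w") as f:
        f.write(str(ra_kb))
    out = subprocess.run(
        ["fio", "--name=ra-test", f"--filename=/dev/{DEVICE}",
         "--rw=read", "--bs=128k", "--direct=0",
         "--runtime=30", "--time_based", "--output-format=json"],
        capture_output=True, text=True, check=True)
    bw_kib = json.loads(out.stdout)["jobs"][0]["read"]["bw"]
    print(f"read_ahead_kb={ra_kb}: {bw_kib / 1024:.0f} MiB/s")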

Thanks.

On Wed, Nov 4, 2020 at 8:00 AM Alexander E. Patrakov <patrakov@xxxxxxxxx>
wrote:

> With recent kernels this is not valid for all-flash clusters, simply
> because cfq is not an option there at all. Readahead usefulness
> depends on your workload (in other words, it can help or hurt) and
> therefore cannot be included in a universally applicable set of
> tuning recommendations. Also, look again: the title talks about
> all-flash deployments, while the benchmark in that thread was run on
> 7200RPM HDDs!
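>
> (To see what a drive is actually running with, something along these
> lines is enough; just a sketch, assuming the device is nvme0n1:)
>
> #!/usr/bin/env python3
> # Sketch: print the scheduler choices and readahead of one device.
> # On a blk-mq kernel an NVMe drive typically shows
> # "[none] mq-deadline kyber bfq" -- cfq is simply not offered anymore.
> from pathlib import Path
>
> dev = "nvme0n1"  # adjust to your device
> q = Path(f"/sys/block/{dev}/queue")
> for attr in ("scheduler", "read_ahead_kb", "rotational"):
>     print(f"{attr}: {(q / attr).read_text().strip()}")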
>
> On Wed, Nov 4, 2020 at 12:37 AM Seena Fallah <seenafallah@xxxxxxxxx>
> wrote:
> >
> > Thanks for your useful information.
> >
> > Can you please also point out whether the kernel and disk configuration
> > recommendations are still valid for bluestore? I mean read_ahead_kb and
> > the disk scheduler.
> >
> > Thanks.
> >
> > On Tue, Nov 3, 2020 at 10:55 PM Alexander E. Patrakov <
> patrakov@xxxxxxxxx> wrote:
> >>
> >> On Tue, Nov 3, 2020 at 6:30 AM Seena Fallah <seenafallah@xxxxxxxxx>
> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > Is this guide still valid for a bluestore deployment with nautilus
> >> > or octopus?
> >> >
> >> > https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments
> >>
> >> Some of the guidance is of course outdated.
> >>
> >> E.g., at the time of that writing, 1x 40GbE was indeed state of the
> >> art in the networking world, but now 100GbE network cards are
> >> affordable, and with 6 NVMe drives per server, even that might be a
> >> bottleneck if the clients use a large block size (>64KB) and do an
> >> fsync() only at the end.
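> >>
> >> (Back-of-envelope, assuming roughly 3 GB/s of sequential read per
> >> NVMe drive -- the exact figure depends on the model:
> >>
> >>     6 drives x 3 GB/s = 18 GB/s = 144 Gbit/s
> >>
> >> so a single 100GbE link becomes the bottleneck before the drives do,
> >> and that is before replication traffic is counted.)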
> >>
> >> Regarding NUMA tuning, Ceph has made some progress. If it finds that your
> >> NVMe and your network card are on the same NUMA node, then, with
> >> Nautilus or later, the OSD will pin itself to that NUMA node
> >> automatically. I.e.: choose strategically which PCIe slots to use,
> >> maybe use two network cards, and you will not have to do any tuning or
> >> manual pinning.
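> >>
> >> (A quick way to check whether a card and a drive share a node; just
> >> a sketch, and the names eth0 and nvme0 are examples. Since Nautilus
> >> "ceph osd numa-status" also shows what the OSDs detected.)
> >>
> >> #!/usr/bin/env python3
> >> # Sketch: print the NUMA node of a NIC and an NVMe controller.
> >> # A value of -1 means the kernel reports no NUMA affinity for it.
> >> from pathlib import Path
> >>
> >> for name, path in (
> >>     ("eth0", "/sys/class/net/eth0/device/numa_node"),
> >>     ("nvme0", "/sys/class/nvme/nvme0/device/numa_node"),
> >> ):
> >>     print(name, "-> NUMA node", Path(path).read_text().strip())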
> >>
> >> Partitioning the NVMe was also popular advice in the past, but now
> >> that there are "osd op num shards" and "osd op num threads per shard"
> >> parameters, with sensible default values, this is something that tends
> >> not to help.
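> >>
> >> (The values actually in use are easy to check per OSD; a sketch that
> >> assumes the ceph CLI and a local OSD's admin socket -- "osd.0" is a
> >> placeholder:)
> >>
> >> #!/usr/bin/env python3
> >> # Sketch: query the sharding options of a local OSD via its admin
> >> # socket; requires the ceph CLI and access to the socket.
> >> import json
> >> import subprocess
> >>
> >> OSD = "osd.0"  # use an OSD id that runs on this host
> >> for opt in ("osd_op_num_shards_ssd",
> >>             "osd_op_num_threads_per_shard_ssd"):
> >>     out = subprocess.run(["ceph", "daemon", OSD, "config", "get", opt],
> >>                          capture_output=True, text=True, check=True)
> >>     print(opt, "=", json.loads(out.stdout)[opt])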
> >>
> >> Filesystem considerations in that document obviously apply only to
> >> Filestore, which is something you should not use.
> >>
> >> A large PG count per OSD gives a more uniform data distribution, but
> >> actually hurts performance a little bit.
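> >>
> >> (The commonly quoted heuristic -- not from that document -- is to aim
> >> for roughly 100 PGs per OSD across all pools:
> >>
> >>     total PGs ~= (number of OSDs x 100) / replica count,
> >>     rounded to the nearest power of two
> >>
> >> e.g. 60 OSDs with 3x replication gives about 2000, so 2048 PGs in
> >> total, rather than pushing it higher just to flatten the
> >> distribution.)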
> >>
> >> The advice regarding the "performance" cpufreq governor is valid, but
> >> you might also look at (i.e. benchmark for your workload specifically)
> >> disabling the deepest idle states.
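> >>
> >> (A minimal sketch of both knobs, assuming an Intel-style
> >> cpufreq/cpuidle sysfs layout and root access; measure before and
> >> after, since disabling idle states costs power:)
> >>
> >> #!/usr/bin/env python3
> >> # Sketch: set the "performance" governor and disable the two deepest
> >> # C-states on every CPU. Run as root.
> >> from pathlib import Path
> >>
> >> for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
> >>     gov = cpu / "cpufreq" / "scaling_governor"
> >>     if gov.exists():
> >>         gov.write_text("performance\n")
> >>     states = sorted(cpu.glob("cpuidle/state[0-9]*"),
> >>                     key=lambda p: int(p.name[5:]))
> >>     for state in states[-2:]:  # the deepest two idle states, if any
> >>         (state / "disable").write_text("1\n")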
> >>
> >> --
> >> Alexander E. Patrakov
> >> CV: http://pc.cd/PLz7
>
>
>
> --
> Alexander E. Patrakov
> CV: http://pc.cd/PLz7
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


