Re: Question: CephFS + Bluestore

On Wed, May 9, 2018 at 3:32 PM, Webert de Souza Lima
<webert.boss@xxxxxxxxx> wrote:
> Hello,
>
> Currently, I run Jewel + Filestore for cephfs, with SSD-only pools used for
> cephfs-metadata, and HDD-only pools for cephfs-data. The current
> metadata/data ratio is something like 0.25% (50GB metadata for 20TB data).
>
> Regarding bluestore architecture, assuming I have:
>
>  - SSDs for WAL+DB
>  - Spinning Disks for bluestore data.
>
> would you still recommend storing metadata on SSD-only OSD nodes?

It depends on how metadata-intensive your workload is.  It would be
worth gathering some stats on how many IOPS are currently hitting
your metadata pool over a week of normal activity.
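
Something like this would do for sampling it (untested sketch in
Python; it assumes your metadata pool is called "cephfs_metadata" and
that "ceph osd pool stats -f json" is available on the host -- the
JSON field names can vary a little between releases):

import json
import subprocess
import time

POOL = "cephfs_metadata"   # adjust to your metadata pool name

while True:
    # Ask the cluster for the current client op rate on the pool.
    out = subprocess.check_output(
        ["ceph", "osd", "pool", "stats", POOL, "-f", "json"])
    stats = json.loads(out.decode())[0].get("client_io_rate", {})
    rd = stats.get("read_op_per_sec", 0)
    wr = stats.get("write_op_per_sec", 0)
    print("%s read_ops/s=%s write_ops/s=%s" % (time.ctime(), rd, wr))
    time.sleep(60)

Look at the peaks over the week rather than just the average.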

The primary reason for using SSDs for metadata is the cost per IOPS.
SSDs are generally cheaper per operation than HDDs, so if you've got
enough metadata IOPS to keep an SSD busy then it's a no-brainer cost
saving to use SSDs (the performance benefit is just a bonus).
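
As a back-of-the-envelope illustration (the prices and IOPS figures
below are made-up assumptions, not vendor numbers -- plug in your own):

# Cost-per-IOPS comparison with illustrative numbers.
hdd_price, hdd_iops = 250.0, 150       # e.g. one 8TB 7200rpm spindle
ssd_price, ssd_iops = 400.0, 20000     # e.g. one enterprise SATA SSD

print("HDD: $%.2f per IOPS" % (hdd_price / hdd_iops))   # ~$1.67
print("SSD: $%.2f per IOPS" % (ssd_price / ssd_iops))   # ~$0.02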

If you are doing large file workloads, and the metadata mostly fits in
RAM, then the number of IOPS from the MDS can be very, very low.  On
the other hand, if you're doing random metadata reads from a small
file workload where the metadata does not fit in RAM, almost every
client read could generate a read operation, and each MDS could easily
generate thousands of ops per second.
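
A very rough way to sanity-check which side of that line you're on
(the per-inode figure is an assumed rule of thumb of a few KB of MDS
memory per cached inode/dentry -- measure on your own cluster before
trusting it):

# Will the hot metadata fit in the MDS cache?
n_hot_inodes = 20000000        # files/dirs touched in normal activity (assumption)
bytes_per_inode = 3 * 1024     # assumed average MDS memory per cached inode
cache_limit_gib = 16           # e.g. mds_cache_memory_limit of 16 GiB (Luminous+)

needed_gib = n_hot_inodes * bytes_per_inode / 1024.0 ** 3
print("estimated cache needed: %.1f GiB (limit %d GiB)" % (needed_gib, cache_limit_gib))
print("fits in cache: %s" % (needed_gib <= cache_limit_gib))

If the hot set doesn't fit, expect most client metadata reads to turn
into reads on the metadata pool, as described above.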

> If not, is it recommended to dedicate some OSDs (Spindle+SSD for WAL/DB) for
> cephfs-metadata?

Isolating metadata OSDs is useful if the data OSDs are going to be
completely saturated: metadata performance will be protected even if
clients are hitting the data OSDs hard.

However, if your OSDs outnumber clients such that the clients couldn't
possibly saturate the OSDs, then you don't have this issue.
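
For reference, if you do isolate metadata onto SSD-backed OSDs, on
Luminous and later you can pin the metadata pool to the "ssd" device
class with a CRUSH rule.  Rough sketch (pool and rule names are
assumptions, and I haven't run this against your cluster):

import subprocess

def ceph(*args):
    # Thin wrapper so the calls below read like the CLI commands they run.
    subprocess.run(["ceph", *args], check=True)

# Replicated CRUSH rule that only selects OSDs with device class "ssd",
# using host as the failure domain, rooted at the "default" bucket.
ceph("osd", "crush", "rule", "create-replicated",
     "replicated_ssd", "default", "host", "ssd")

# Point the metadata pool (name assumed) at that rule.
ceph("osd", "pool", "set", "cephfs_metadata", "crush_rule", "replicated_ssd")

Note that changing the pool's rule will move its PGs onto the SSD
OSDs, so do it at a quiet time.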

> If I just have 2 pools (metadata and data) all sharing the same OSDs in the
> cluster, would it be enough for heavy-write cases?

If "heavy write" means completely saturating the cluster, then sharing
the OSDs is risky.  If "heavy write" just means that there are more
writes than reads, then it may be fine if the metadata workload is not
heavy enough to make good use of SSDs.

The way I'd summarise this is: in the general case, dedicated SSDs are
the safe way to go -- they're intrinsically better suited to metadata.
However, in some quite common special cases, the overall number of
metadata ops is so low that the device doesn't matter.

John

> Assuming min_size=2, size=3.
>
> Thanks for your thoughts.
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> IRC NICK - WebertRLZ
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


