Re: Suggestion to build ceph storage

Thanks Jake,

On Mon, Jun 20, 2022 at 10:47 AM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx>
wrote:

> Hi Stefan
>
> We use cephfs for our 7200CPU/224GPU HPC cluster, for our use-case
> (large-ish image files) it works well.
>
> We have 36 ceph nodes, each with 12 x 12TB HDD, 2 x 1.92TB NVMe, plus a
> 240GB System disk. Four dedicated nodes have NVMe for metadata pool, and
> provide mon,mgr and MDS service.
>

This is great info. I'm assuming we don't need redundancy for the NVMe,
because if one fails it only impacts 6 OSDs, which is acceptable. At
present, because of limited HW supply, I am planning to host the MDS
daemons on the same OSD nodes (no dedicated HW for MDS). Agreed that
this is not best practice, but right now I am dealing with many unknowns
and don't want to throw money at something we can't yet size. Once we
start using the cluster I should have more data and can adjust the
requirements accordingly.
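
Something like the sketch below is what I have in mind for the
colocation, assuming cephadm and placeholder hostnames; capping the MDS
cache should keep it from competing with the co-located OSDs for RAM:

  # place two MDS daemons for filesystem "cephfs" on existing OSD hosts
  # (hostnames are placeholders)
  ceph orch apply mds cephfs --placement="2 osd-node-01 osd-node-02"

  # cap the MDS cache so it doesn't fight the OSDs for memory
  # (8 GiB is just a starting point, not a recommendation)
  ceph config set mds mds_cache_memory_limit 8589934592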


>
> I'm not sure you need 4% of OSD for wal/db, search this mailing list
> archive for a definitive answer, but my personal notes are as follows:
>
> "If you expect lots of small files: go for a DB that's > ~300 GB
> For mostly large files you are probably fine with a 60 GB DB.
> 266 GB is the same as 60 GB, due to the way the cache multiplies at each
> level, spills over during compaction."
>

We don't know what kind of workload we are going to run; currently all
they ask for is large storage with as many drives as possible. In the
future, if they ask for more IOPS, we may replace some boxes with NVMe
or SSD and adjust the requirements.
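
If that happens, my understanding is that CRUSH device classes would let
us steer the latency-sensitive pools onto the new drives without
rebuilding anything. A rough sketch, assuming the usual cephfs_metadata
pool name:

  # replicated rule restricted to OSDs with device class "ssd"
  ceph osd crush rule create-replicated replicated-ssd default host ssd

  # move the CephFS metadata pool onto that rule
  ceph osd pool set cephfs_metadata crush_rule replicated-ssd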


>
> We use a single enterprise quality 1.9TB NVMe for each 6 OSDs to good
> effect, you probably need 1DWPD to be safe. I suspect you might be able
> to increase the ratio of HDD per NVMe with PCIe gen4 NVMe drives.
>
>
Can you share which brand of NVMe drives you are using?
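
For anyone else sizing this, I think the HDD-data / NVMe-DB split you
describe would look roughly like the cephadm OSD spec below; the host
pattern and the 300G block_db_size are illustrative values taken from
the sizing notes above, not something we have tested:

  cat > osd_spec.yml <<'EOF'
  service_type: osd
  service_id: hdd-with-nvme-db
  placement:
    host_pattern: 'osd-node-*'
  spec:
    data_devices:
      rotational: 1      # the spinning data disks
    db_devices:
      rotational: 0      # the shared NVMe
    block_db_size: '300G'
  EOF
  ceph orch apply -i osd_spec.yml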


> best regards,
>
> Jake
>
> On 20/06/2022 08:22, Stefan Kooman wrote:
> > On 6/19/22 23:23, Christian Wuerdig wrote:
> >> On Sun, 19 Jun 2022 at 02:29, Satish Patel <satish.txt@xxxxxxxxx>
> wrote:
> >>
> >>> Greeting folks,
> >>>
> >>> We are planning to build Ceph storage for mostly cephFS for HPC
> workload
> >>> and in future we are planning to expand to S3 style but that is yet
> >>> to be
> >>> decided. Because we need mass storage, we bought the following HW.
> >>>
> >>> 15 Total servers and each server has a 12x18TB HDD (spinning disk) . We
> >>> understand SSD/NvME would be best fit but it's way out of budget.
> >>>
> >>> I hope you have extra HW on hand for Monitor and MDS  servers
> >
> > ^^ this. It also depends on the uptime guarantees you have to provide
> > (if any). Are the HPC users going to write large files? Or loads of
> > small files? The more metadata operations the busier the MDSes will be,
> > but if it's mainly large files the load on them will be much lower.
> >>
> >>> Ceph recommends using a faster disk for wal/db if the data disk is
> >>> slow and
> >>> in my case I do have a slower disk for data.
> >>>
> >>> Question:
> >>> 1. Let's say if i want to put a NvME disk for wal/db then what size i
> >>> should buy.
> >>>
> >>
> >> The official recommendation is to budget 4% of OSD size for WAL/DB -
> >> so in
> >> your case that would be 720GB per OSD. Especially if you want to go to
> S3
> >> later you should stick closer to that limit since RGW is a heavy meta
> >> data
> >> user.
> >
> > CephFS can be metadata heavy also, depending on work load. You can
> > co-locate the S3 service on this cluster later on, but from an
> > operational perspective this might not be preferred: you can tune the
> > hardware / configuration for each use case. Easier to troubleshoot,
> > independent upgrade cycle, etc.
> >
> > Gr. Stefan
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> For help, read https://www.mrc-lmb.cam.ac.uk/scicomp/
> then contact unixadmin@xxxxxxxxxxxxxxxxx
> --
> Dr Jake Grimmett
> Head Of Scientific Computing
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue,
> Cambridge CB2 0QH, UK.
> Phone 01223 267019
> Mobile 0776 9886539
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


