Re: How can I use not-replicated pool (replication 1 or raid-0)

mhnx <morphinwithyou@xxxxxxxxx> · Tue, 2 May 2023 18:25:45 +0300

Thank you for the explanation, Frank.

I agree with you: Ceph is not designed for this kind of use case,
but I tried to stick with what I know.
My idea was exactly what you described: I was trying to automate
the cleanup or recreation after any failure.

As you can see below, the rep1 pool is much faster:
- Create: time for i in {00001..99999}; do head -c 1K </dev/urandom >randfile$i; done
replication 2 : 31m59.917s
replication 1 : 7m6.046s
--------------------------------
- Delete: time rm -rf testdir/
replication 2 : 11m56.994s
replication 1 : 0m40.756s
--------------------------------
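
For reference, the speedup implied by these timings can be worked out
with a small shell sketch. The helper names to_seconds and speedup are
only illustrative (they are not part of the test above), and the sketch
assumes awk is available:

```shell
# Convert a bash `time` value such as 31m59.917s into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")        # p[1] = minutes, p[2] = seconds plus "s"
        sub(/s$/, "", p[2])     # drop the trailing "s"
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# Print how many times shorter the second timing is than the first.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

echo "create: $(speedup 31m59.917s 7m6.046s)x faster with replication 1"
echo "delete: $(speedup 11m56.994s 0m40.756s)x faster with replication 1"
```

With the numbers above this gives roughly 4.5x for create and 17.6x
for delete.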

I have started learning DRBD, and I will also check out BeeGFS. Thanks for the advice.

Regards.

Frank Schilder <frans@xxxxxx> wrote on Mon, 1 May 2023 at 10:27:
>
> I think you misunderstood Janne's reply. The main statement is at the end: Ceph is not designed for an "I don't care about data" use case. If you need speed for temporary data where you can tolerate data loss, go for something simpler. For example, we use BeeGFS with great success as a burst buffer for an HPC cluster. It is very lightweight and will get all the performance your drives can offer. In case of disaster it is easy to clean up. BeeGFS does not care about lost data; such data simply becomes inaccessible while everything else just moves on. It will not try to self-heal either. It doesn't even scrub data, so users do not compete with admin I/O.
>
> It's pretty much your use case. We clean it up every 6-8 weeks, and if something breaks we just redeploy the whole thing from scratch. Performance is great and it's a very simple and economical system to administer. No need for the whole Ceph daemon engine with large RAM requirements and extra admin daemons.
>
> Use Ceph for data you want to survive a nuclear blast. Don't use it for things it's not made for and then complain.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: mhnx <morphinwithyou@xxxxxxxxx>
> Sent: Saturday, April 29, 2023 5:48 AM
> To: Janne Johansson
> Cc: Ceph Users
> Subject:  Re: How can I use not-replicated pool (replication 1 or raid-0)
>
> Hello Janne, thank you for your response.
>
> I understand your advice, and rest assured that I have designed many EC
> pools and I know the mess. This is not an option because I need SPEED.
>
> Let me first describe my hardware so that we are on the same page.
> Server: R620
> Cpu: 2 x Xeon E5-2630 v2 @ 2.60GHz
> Ram: 128GB - DDR3
> Disk1: 20x Samsung SSD 860 2TB
> Disk2: 10x Samsung SSD 870 2TB
>
> My SSDs do not have PLP. Because of that, every Ceph write also
> waits for TRIM. I want to know how much latency we are talking about,
> because I'm thinking of adding a PLP NVMe as a WAL+DB cache to gain
> some speed.
> As you can see, I am even trying to gain from every TRIM command.
> Currently I'm testing a replication 2 pool, and even this speed is
> not enough for my use case.
> Now I'm trying to boost the deletion speed, because I'm writing and
> deleting files all the time and this never ends.
> I am writing this mail because replication 1 will decrease the
> deletion time, but I am still trying to tune some MDS+OSD parameters
> to increase the delete speed.
>
> Any help or ideas would be great. Thanks.
> Regards.
>
>
>
> Janne Johansson <icepic.dz@xxxxxxxxx> wrote on Wed, 12 Apr 2023 at 10:10:
> >
> > On Mon, 10 Apr 2023 at 22:31, mhnx <morphinwithyou@xxxxxxxxx> wrote:
> > > Hello.
> > > I have a 10 node cluster. I want to create a non-replicated pool
> > > (replication 1) and I want to ask some questions about it:
> > >
> > > Let me tell you my use case:
> > > - I don't care about losing data,
> > > - All of my data is JUNK and these junk files are usually between 1KB and 32MB.
> > > - These files will be deleted in 5 days.
> > > - Writable space and I/O speed are more important.
> > > - I have high Write/Read/Delete operations, minimum 200GB a day.
> >
> > That is "only" about 2.3 MB/s (200 GB over 86,400 s, or roughly
> > 18 Mbit/s), which should easily be doable even with repl=2,3,4 or EC.
> > This of course depends on the speed of the drives, network, CPUs and
> > all that, but in itself it doesn't seem too hard to achieve in terms
> > of average speed. We have EC 8+3 RGW backed by some 12-14 OSD hosts
> > with HDD and NVMe (for WAL+DB) that can ingest over 1 GB/s if you
> > parallelize the RGW streams, so this rate seems totally doable with
> > 10 decent machines. Even with replication.
> >
> > > I'm afraid that, in any failure, I won't be able to access the whole
> > > cluster. Losing data is okay but I have to ignore missing files,
> >
> > Even with repl=1, in case of a failure, the cluster will still aim at
> > fixing itself rather than ignoring currently lost data and moving on,
> > so any solution that involves "forgetting" about lost data would need
> > a ceph operator telling the cluster to ignore all the missing parts
> > and to recreate the broken PGs. This would not be automatic.
> >
> >
> > --
> > May the most significant bit of your life be positive.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx