Re: Questions on Ceph cluster without OS disks

If you want to have a swap, why not create a ramdisk and then format/use 
it as swap?
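
A minimal sketch of that idea, assuming the zram module is available on
the target kernel (zram gives you a compressed, RAM-backed block
device); the size and swap priority below are only examples:

#!/usr/bin/env python3
"""Sketch: back swap with a RAM-based (zram) device instead of a disk."""
import subprocess

def enable_zram_swap(size: str = "2G", priority: int = 100) -> None:
    # Load the zram module with a single device (run as root).
    subprocess.run(["modprobe", "zram", "num_devices=1"], check=True)
    # Set the uncompressed size of the RAM disk.
    with open("/sys/block/zram0/disksize", "w") as f:
        f.write(size)
    # Format it as swap and enable it with a higher priority than any
    # disk-backed swap that might still be configured.
    subprocess.run(["mkswap", "/dev/zram0"], check=True)
    subprocess.run(["swapon", "-p", str(priority), "/dev/zram0"], check=True)

if __name__ == "__main__":
    enable_zram_swap()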


-----Original Message-----
From: Brent Kennedy [mailto:bkennedy@xxxxxxxxxx] 
Sent: 05 April 2020 20:13
To: 'Martin Verges'
Cc: 'ceph-users'
Subject:  Re: Questions on Ceph cluster without OS disks

I agree with the sentiment regarding swap; however, the OS devs still
seem to suggest having swap, even if it's small.  We monitor swap usage
and there is none in the ceph clusters, so I am mainly looking at
eliminating it (assuming it's "safe" to do so), but I don't want to risk
production machines just to save some OS space on disk.  However, the
idea of loading the OS into memory is very interesting to me, at least
for a production environment.  Not that it's a new thing, it's just new
to me in the context of ceph clusters.  We already run all the command
and control on VMs, so running the OSD host server OSes in memory seems
like a nifty idea that would let us fully use every disk bay.  We have
some older 620s that boot from a mirrored SD card (which is not super
reliable in practice); they might be good candidates for this.  I am
just wondering how we would drop in the correct ceph configuration
files during boot without needing to do tons of scripting (the clusters
are 15-20 machines).
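
For context, the level of scripting I had in mind is something along the
lines of the sketch below: a first-boot script that pulls the per-host
Ceph config from a central server.  The URL and file layout are
hypothetical.

#!/usr/bin/env python3
"""Sketch: fetch per-host Ceph config at boot from a central server."""
import pathlib
import socket
import urllib.request

CONFIG_SERVER = "http://deploy.example.internal/ceph"  # hypothetical

def fetch(name: str, dest: pathlib.Path) -> None:
    # Pull a file for this host, e.g. http://.../<hostname>/ceph.conf
    url = f"{CONFIG_SERVER}/{socket.gethostname()}/{name}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=10) as resp:
        dest.write_bytes(resp.read())

if __name__ == "__main__":
    fetch("ceph.conf", pathlib.Path("/etc/ceph/ceph.conf"))
    fetch("ceph.keyring", pathlib.Path("/etc/ceph/ceph.keyring"))

Hooked into a systemd unit or the boot image itself, that would keep the
per-node scripting close to zero.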

 

-Brent

 

From: Martin Verges <martin.verges@xxxxxxxx>
Sent: Sunday, April 5, 2020 3:04 AM
To: Brent Kennedy <bkennedy@xxxxxxxxxx>
Cc: huxiaoyu@xxxxxxxxxxxx; ceph-users <ceph-users@xxxxxxx>
Subject: Re:  Re: Questions on Ceph cluster without OS disks

 

Hello Brent,

 

No, swap is definitely not needed if you configure your systems correctly.

Swap on Ceph nodes kills your performance and does a lot of harm to
clusters. It increases downtime, decreases performance, and can result
in much longer recovery times, which endangers your data.

 

In the old days, swap was required because you could not fit enough
memory into your systems. Today's servers do not require a swap
partition, and I have personally disabled it on all my systems for the
past >10 years. As my last company was a datacenter provider with
several thousand systems, I believe I have a good sense of whether that
is stable.

 

What happens if you run out of memory, you might ask? Simple: the OOM
killer kills one process, systemd restarts it, and the service is back
up in a few seconds.

Can you choose which process is most likely to be killed? Yes, you can.
Take a look at /proc/*/oom_adj.
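
As a side note, /proc/<pid>/oom_adj is the legacy name; current kernels
expose /proc/<pid>/oom_score_adj with a range of -1000 (never kill) to
+1000 (kill first). A minimal sketch of adjusting it, with example
values only:

#!/usr/bin/env python3
"""Sketch: adjust how likely the OOM killer is to pick a given process."""
import subprocess

def set_oom_score_adj(pid: int, score: int) -> None:
    # Raising the score (more likely victim) is allowed unprivileged for
    # your own processes; lowering it requires CAP_SYS_RESOURCE / root.
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(score))

if __name__ == "__main__":
    # Example: make ceph-osd daemons less likely OOM victims (needs root).
    pids = subprocess.run(["pgrep", "ceph-osd"], capture_output=True,
                          text=True).stdout.split()
    for pid in pids:
        set_oom_score_adj(int(pid), -500)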

What happens if swap fills up? Total destruction ;). The OOM killer
still kills one process, but freeing up swap takes much longer, system
load skyrockets, services become unresponsive, and Ceph client IO can
drop to near zero... just save yourself the trouble.

 

So yes, we strongly believe that we have a far superior system by
design, simply by avoiding swap entirely.
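
For reference, turning swap off on a running node is a one-time step; a
sketch that disables every active swap device follows. Remember to also
remove or comment out any swap entries in /etc/fstab so it stays off
after a reboot.

#!/usr/bin/env python3
"""Sketch: disable all currently active swap devices (run as root)."""
import subprocess

def disable_all_swap() -> None:
    # /proc/swaps lists active swap devices/files, one per line after
    # the header line.
    with open("/proc/swaps") as f:
        devices = [line.split()[0] for line in f.readlines()[1:]]
    if not devices:
        print("no active swap found")
        return
    for dev in devices:
        print(f"disabling swap on {dev}")
        subprocess.run(["swapoff", dev], check=True)

if __name__ == "__main__":
    disable_all_swap()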




--

Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.verges@xxxxxxxx
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht 
Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

 

 

On Sun, Apr 5, 2020 at 01:59, Brent Kennedy <bkennedy@xxxxxxxxxx> wrote:

Forgive me for asking, but it seems most OSes expect a swap file, and
when I look into doing something similar (meaning not having one at
all), the advice is that the OS could become unstable without it.  It
seems that anyone doing this needs to be 100% certain that memory will
never hit 100% usage, or the OS would crash with no swap there.  How are
you getting around this, and has it ever been a problem?

Also, for the Ceph OSDs, where are you storing the OSD and host
configurations (central storage?)?

Regards,
-Brent

Existing Clusters:
Test: Nautilus 14.2.2 with 3 osd servers, 1 mon/man, 1 gateway, 2 iscsi
gateways (all virtual on NVMe)
US Production (HDD): Nautilus 14.2.2 with 11 osd servers, 3 mons, 4
gateways, 2 iscsi gateways
UK Production (HDD): Nautilus 14.2.2 with 12 osd servers, 3 mons, 4
gateways
US Production (SSD): Nautilus 14.2.2 with 6 osd servers, 3 mons, 3
gateways, 2 iscsi gateways




-----Original Message-----
From: Martin Verges <martin.verges@xxxxxxxx>
Sent: Sunday, March 22, 2020 3:50 PM
To: huxiaoyu@xxxxxxxxxxxx
Cc: ceph-users <ceph-users@xxxxxxx>
Subject:  Re: Questions on Ceph cluster without OS disks

Hello Samuel,

we at croit.io don't use NFS to boot up servers. We copy the OS
directly into RAM (approximately 0.5-1 GB). Think of it like a
container: you start it and throw it away when you no longer need it.
This way we free up the slots otherwise taken by OS hard disks to add
more storage per node and reduce overall costs, as 1 GB of RAM is
cheaper than an OS disk and consumes less power.

If our management node is down, nothing will happen to the cluster. No 
impact, no downtime. However, you do need the mgmt node to boot up the 
cluster. So after a very rare total power outage, the first system you
bring up would be the mgmt node, and then the cluster itself. But again,
if you configure your systems correctly, no manual work is required to
recover from that. For everything else, it is possible (but definitely
not needed) to deploy our mgmt node in active/passive HA.

We have several hundred installations worldwide in production
environments. Our strong PXE knowledge comes from more than 20 years of
datacenter hosting experience, and it has never failed us in the last
>10 years.

The main benefits of that:
 - Immutable OS, freshly booted: every host has exactly the same
version, same libraries, kernel, Ceph versions, ...
 - OS is heavily tested by us: every croit deployment runs exactly the
same image, so we can find errors much faster and hit far fewer errors.
 - Easy updates: updating the OS, Ceph, or anything else is just a node
reboot. No cluster downtime, no service impact, fully automatic handling
by our mgmt software.
 - No need to install an OS: no maintenance costs, no labor required, no
other OS management required.
 - Centralized logs/stats: as everything is booted in memory, all logs
and statistics are collected in a central place for easy access.
 - Easy to scale: it doesn't matter if you boot 3 or 300 nodes, they all
boot the exact same image in a few seconds.
 ... and lots more

Please do not hesitate to contact us directly. We always try to offer an 
excellent service and are strongly customer oriented.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.verges@xxxxxxxx
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht 
Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Sat, Mar 21, 2020 at 13:53, huxiaoyu@xxxxxxxxxxxx
<huxiaoyu@xxxxxxxxxxxx> wrote:

> Hello, Martin,
>
> I notice that croit advocates the use of Ceph clusters without OS
> disks, but with PXE boot.
>
> Do you use an NFS server to serve the root file system for each node,
> e.g. hosting configuration files, users and passwords, log files, etc.?
> My question is, will the NFS server be a single point of failure?
> If the NFS server goes down or the network experiences any outage, Ceph
> nodes may not be able to write to their local file systems, possibly
> leading to a service outage.
>
> How do you deal with the above potential issues in production? I am a 
> bit worried...
>
> best regards,
>
> samuel
>
>
>
>
> ------------------------------
> huxiaoyu@xxxxxxxxxxxx
>
>
>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



