Re: Can we deprecate FileStore in Quincy?

On Sat, 26 Jun 2021 10:06:10 -0700
Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> A handful of years back WD Labs did their “microserver” project, a
> cluster of 504 drives with an onboard ARM CPU and 1GB of RAM, 8TB
> HDDs I think.  But yeah that most likely was Filestore.
> 
> At a Ceph Day in Hillsboro someone, forgive me for not remembering
> who, spoke of running production on servers with 2GB RAM per OSD.  He
> said that it was painful, required a lot of work, and would not
> recommend it.  ymmv.

Yeah, I wouldn't want to go below 4GB RAM.

> >> - In my experience, it performs poorly on HDD-based clusters with a
> >>   small number of disks  
> 
> Don’t HDD clusters with a small number of disks *always* perform poorly?

Originally when I deployed my 3-node cluster, I was getting performance
comparable to Microsoft Azure's cheaper offerings.  (Not a glowing
endorsement of their cloud, I might add, but it was quite acceptable.)

> >>  Also, only one Ethernet port.  
> 
> Worse yet they have *zero* HIPPI ports! Can you imagine!?

Never used HIPPI.

A 48-port gigabit managed switch is reasonably accessible to the home
gamer, both in terms of availability and cost.

Second-hand 10GbE switches can be found for reasonable prices, but a
new one is too pricey for my liking.

> >>   - Intel NUCs and similar machines can do Ceph work, but only one
> >>     Ethernet port is a limitation.  
> 
> Why the fixation on multiple network interfaces?

… because Ceph needs one interface for the "public" network and one for
the "private" network?  Plus, 802.3ad link aggregation helps.

> >>  (Plus the need to use a console
> >>     to manage them instead of using a BMC with a server board or
> >>     a multiplexed serial console is a nuisance.)  
> 
> Not all of us using Ceph are big corporates with deep pockets.  BMCs have an incremental cost.

Truth be told, I'd like to ditch the BMCs, but most BIOSes have a
fixation on needing a monitor and keyboard to configure them.  coreboot
has the right idea, but isn't widely available on kit accessible to the
home experimenter.

> >> Not all of us using Ceph are big corporates with deep pockets.  
> 
> I’ve heard that!
> 
> In all seriousness, these aren’t limitations for a PoC cluster, but
> then functional PoCs don’t need BMCs and are easy to deploy on VMs.
> For production I wouldn’t think that there would be a lot of good
> use-cases for a small number of SBC nodes — and that some sort of
> RAID solution is often a better fit.  There’s also lots of used gear
> available.  For small scale clusters with modest performance needs,
> this should be a viable alternative.  I’ve seen any number of folks
> in that situation.  Donated / abandoned / repurposed hardware.

Well, my use case is a small-scale cluster in a SOHO-type environment.
It started out as a project at my workplace to investigate how to set
up a small private cloud; I then replicated the set-up at home to
better explore the options, with a view to applying what I learned to
the cluster at work.

So, not production in the sense that a business runs on it, but my mail
server and numerous other workloads do run from this cluster.

A nice advantage over a RAID set-up is that I can bring one node down
for maintenance and still be "online", albeit with degraded performance.
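
(For anyone wondering, the dance for planned maintenance is roughly
the following -- a sketch, with the OSD ID as a placeholder:

  ceph osd set noout            # don't rebalance while the node is away
  systemctl stop ceph-osd@2     # or just power the node off
  # ...do the maintenance, bring the node back up...
  systemctl start ceph-osd@2
  ceph osd unset noout

and the cluster catches the OSDs up once they return.)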

> > FWIW, you can lower both the osd_memory_target and tweak a couple
> > of other settings that will lower bluestore memory usage.  A 2GB
> > target is about the lowest you can reasonably set it to (and you'll
> > likely hurt performance due to cache misses),   
> 
> Indeed, though assuming that we’re talking small clusters with small
> drives, one can set the OSD max low to reduce map size, various
> related tunings, provision a small number of PGs, etc, which I would
> think would help.  Blacklist unneeded kernel modules?  Disable
> nf_conntrack with extreme prejudice?
> 
> > but saying you need a host with 8+GB of RAM is probably a little
> > excessive.  
> 
> Especially for a single OSD.
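
For anyone wanting to try that, the knobs being discussed boil down to
something like this in ceph.conf -- a sketch only, and the values are
assumptions for a small cluster rather than anything I've benchmarked:

  [osd]
  osd_memory_target = 2147483648     # 2 GiB, about the practical floor

  [global]
  osd_pool_default_pg_num  = 32      # keep PG counts modest
  osd_pool_default_pgp_num = 32

Expect more cache misses at the 2 GiB end, as noted above.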

In this case, 3 of my nodes are running two OSDs each: a Samsung SSD 860
2TB (Bluestore) and a WDC WD20SPZX-00U (Filestore).  They're built on
these boards:
https://www.supermicro.com/products/motherboard/atom/X10/A1SAi-2750F.cfm
and housed in DIN-rail mounted cases.  (Presently with the OS running
off a USB 3.0 external drive.)

I added two more to give me breathing room when re-deploying nodes (in
particular, going from Filestore on BTRFS to Bluestore, then back to
Filestore on XFS); these just have one WDC WD20SPZX-00U each (also
Filestore).  They were built on Intel NUCs because I needed them in a
hurry and I had some DDR4 SO-DIMMs that I had bought by mistake.

So that's 5 WDC WD20SPZX-00U OSDs and 3 Samsung SSD 860 OSDs.
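
(For the record, each of those backend conversions boils down to
roughly the following per OSD -- a sketch, with the ID and device
paths as placeholders, not a copy-paste recipe:

  ceph osd out 5
  while ! ceph osd safe-to-destroy osd.5; do sleep 60; done  # wait for backfill
  systemctl stop ceph-osd@5
  ceph osd purge 5 --yes-i-really-mean-it
  ceph-volume lvm zap /dev/sdX --destroy
  ceph-volume lvm create --filestore --data /dev/sdX --journal /dev/sdY1

swapping --filestore and --journal for --bluestore when going the
other way.  Having the two extra nodes gave the cluster somewhere to
backfill to while each node was rebuilt.)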

I'm looking to move out of the DIN-rail cases as it looks like I'm
out-growing them, so maybe in the future I might replace these with
3.5" drives, but right now this is what I have.

Filestore may not set the world on fire, and may fare worse in bigger
deployments, but from what I've seen it works really well in the
smaller ones.
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



