On 22/7/19 7:13 pm, Marc Roos wrote:
>>> Reverting back to filestore is quite a lot of work and time again.
>>> Maybe see first if with some tuning of the vms you can get better
>>> results?
>>
>> None of the VMs are particularly disk-intensive.  There's two users
>> accessing the system over a WiFi network for email, and some
>> HTTP/SMTP traffic coming in via an ADSL2 Internet connection.
>>
>> If Bluestore can't manage this, then I'd consider it totally
>> worthless in any enterprise installation -- so clearly something is
>> wrong.
>
> I have a cluster mainly intended for backups to cephfs: 4 nodes, sata
> disks and mostly 5400rpm.  Because the cluster is doing nothing, I
> decided to put vm's on them.  I am running 15 vm's without problems
> on the hdd pool, and am going to move more to them.  One of them is a
> macOS machine; I once ran a fio test in it and it gave me 917 iops at
> 4k random reads (technically not possible I would say; I have mostly
> default configurations in libvirt).

Well, that is promising.  I did some measurements of the raw disk
performance: I get about 30MB/sec according to `hdparm`.  Whilst that
isn't going to set the world on fire, it's "decent" for my needs.

The only thing I can think of is that `hdparm` does a sequential read,
whereas BlueStore's workload would be more "random", so seek times
come into play.  (I'll confirm that with a fio run; see the sketch
below.)

I've now migrated two of my nodes to FileStore/XFS with the journal
on-disk (oddly enough, it won't let me move the journal to the SSD
like I did last time; more on that below), and I'm seeing fewer I/O
issues now, although things are still slow (3 nodes are still on
BlueStore).  I think the fact that my nodes have plenty of RAM between
them (>8GB, one with 32GB) helps here.

The BlueStore settings are at their defaults, which means it should be
tuning the cache size automatically... maybe this isn't working as it
should on a cluster as small as this.  (See the config note below.)

>>> What you also can try is for io intensive vm's add an ssd pool?
>>
>> How well does that work in a cluster with 0 SSD-based OSDs?
>>
>> For 3 of the nodes, the cases I'm using for the servers can fit two
>> 2.5" drives.  I have one 120GB SSD for the OS; that leaves one space
>> spare for the OSD.
>
> I think this could be your bottle neck.  I have 31 drives, so the
> load is spread across 31 (hopefully).  If you have only 3 drives you
> have 3x60 iops to share amongst your vms.
> I am getting the impression that ceph development is not really
> interested in setups quite different from the advised standards.  I
> once made an attempt to get things better working for 1Gb
> adapters[0].

Yeah, unfortunately I'll never be able to cram 31 drives into this
cluster.  I am considering how I might add more, and right now the
immediate thought is to use M.2 SATA SSDs in USB 3 cases.  That gives
me something a little bigger than a thumb-drive which is bus-powered
and external to the case, so I don't have the thermal and space issues
of mounting a HDD in there: they're small and light-weight, so they
can just dangle from the supplied USB 3 cable.

I'll have to do some research though on how mixing SSDs and HDDs would
work.  I need more space than SSDs alone can provide cost-effectively,
so going SSD-only just isn't an option here; but if I can put them
into the same pool with the HDDs and have them act as a "cache" for
the more commonly read/written objects, that could help (rough
commands below).
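Before I go much further I'll re-do the disk measurement with fio to
get a random-read baseline.  Something along these lines should do it
(untested as yet; /dev/sdb stands in for one of the OSD disks, and
--readonly stops fio writing to it):

  $ sudo fio --name=randread --filename=/dev/sdb --readonly \
        --direct=1 --rw=randread --bs=4k --ioengine=libaio \
        --iodepth=16 --runtime=60 --time_based

If that reports the 60-odd IOPS a 5400rpm spindle is actually capable
of, then the `hdparm` figure was simply hiding the seek latency.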
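For reference, the `ceph-volume` invocation that I understand ought to
put a FileStore journal on a separate SSD partition is roughly this
(device names are placeholders for my setup):

  $ sudo ceph-volume lvm create --filestore \
        --data /dev/sdb --journal /dev/sda5

That's the form that's refusing to cooperate here; I haven't got to
the bottom of why yet.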
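On the cache question, my understanding (which I still need to verify)
is that on recent releases the main knob is `osd_memory_target`, with
BlueStore's cache autotuned to fit under it; the default is 4GiB per
OSD.  Pinning it explicitly in ceph.conf would at least take the
autotuning out of the equation:

  [osd]
  # Assumes Mimic/Nautilus-style cache autotuning; 4GiB is the default
  osd_memory_target = 2147483648    # 2GiB, for the smaller-RAM nodes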
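As for the mixed pool, the documented route seems to be a cache tier
in front of the existing HDD-backed pool, with the SSDs split out by
device class.  Roughly (the pool and rule names are mine, "rbd" stands
in for the existing pool, and I haven't run any of this yet):

  # Replicated rule confined to OSDs with device class "ssd"
  $ ceph osd crush rule create-replicated ssd-only default host ssd
  $ ceph osd pool create rbd-cache 64 64 replicated ssd-only

  # Stack it in front of the existing pool as a writeback tier
  $ ceph osd tier add rbd rbd-cache
  $ ceph osd tier cache-mode rbd-cache writeback
  $ ceph osd tier set-overlay rbd rbd-cache

  # The tier needs a hit set and a size bound before it behaves
  $ ceph osd pool set rbd-cache hit_set_type bloom
  $ ceph osd pool set rbd-cache target_max_bytes 200000000000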
In this topology though, I may only be using 256GB or 512GB SSDs, so
there'd be much less storage on the SSDs than on the HDDs, which
likely won't work that well for tiering
(https://ceph.com/planet/ceph-hybrid-storage-tiers/).  So it'll need
some planning and home-work. :-)

FileStore/XFS looks to be improving the situation just a little, so if
I have to hold off on that plan for a bit, that's fine.  It'll give me
time to work on the next step.
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.