Re: CephFS in the wild

On Wed, Jun 1, 2016 at 1:50 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
> Question:
> I'm curious if there is anybody else out there running CephFS at the scale
> I'm planning for. I'd like to know some of the issues you didn't expect that
> I should be looking out for. I'd also like to simply see when CephFS hasn't
> worked out and why. Basically, give me your war stories.
>
>
> Problem Details:
> Now that I'm out of my design phase and finished testing on VMs, I'm ready
> to drop $100k on a pilot. I'd like to get some sense of confidence from the
> community that this is going to work before I pull the trigger.
>
> I'm planning to replace my 110 disk 300TB (usable) Oracle ZFS 7320 with
> CephFS by this time next year (hopefully by December). My workload is a mix
> of small and very large files (100GB+ in size). We do fMRI analysis on DICOM
> image sets as well as other physio data collected from subjects. We also
> have plenty of spreadsheets, scripts, etc. Currently 90% of our analysis is
> I/O bound and generally sequential.
>
> In deploying Ceph, I am hoping to see more throughput than the 7320 can
> currently provide. I'm also looking to get away from traditional
> file-systems that require forklift upgrades. That's where Ceph really shines
> for us.
>
> I don't have a total file count, but I do know that we have about 500k
> directories.
>
>
> Planned Architecture:
>
> Storage Interconnect:
> Brocade VDX 6940 (40 gig)
>
> Access Switches for clients (servers):
> Brocade VDX 6740 (10 gig)
>
> Access Switches for clients (workstations):
> Brocade ICX 7450
>
> 3x MON:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for LevelDB
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet
>
> 2x MDS:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for LevelDB (is this necessary?)
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet

The MDS doesn't use any local storage, other than for storing its
ceph.conf and keyring.
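All of the filesystem metadata lives in RADOS pools rather than on the
MDS's local disks. As a rough sketch of what the initial setup looks
like (the pool names and PG counts here are placeholders, not a sizing
recommendation):

    ceph osd pool create cephfs_data 4096        # data pool; PG count is a placeholder
    ceph osd pool create cephfs_metadata 512     # metadata pool; PG count is a placeholder
    ceph fs new cephfs cephfs_metadata cephfs_data

So the P3700s in the MDS boxes won't buy you anything there.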

>
> 8x OSD:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for Journals
> 24x 6TB Enterprise SATA
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet

I don't know what kind of throughput you're currently seeing on your
ZFS system. Unfortunately most of the big CephFS users are pretty
quiet on the lists :( although they sometimes come out to play at
events like https://www.msi.umn.edu/sc15Ceph. :)

You'll definitely want to do some tuning. Right now we default to 100k
inodes in the metadata cache for instance, which fits in <1GB of RAM.
You'll want to bump that way, way up.
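Something along these lines in ceph.conf on the MDS nodes is a
reasonable starting point (the 4 million figure is purely an
illustration; size it against the RAM you actually have and your
working set):

    [mds]
        mds cache size = 4000000    # default is 100000 inodes; example value only

You can also change it on a running MDS over the admin socket with
"ceph daemon mds.<id> config set mds_cache_size <value>".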
Also keep in mind that CephFS' performance characteristics are just
weirdly different from NAS boxes or ZFS in ways you might not be ready
for. Large streaming writes will do great, but if you have shared
read/write files or directories, access might be much faster in some
places and much slower in ones you didn't think about. Large streaming
reads and writes will go as quickly as RADOS can drive them
(80-100MB/s per OSD for reads is generally a good estimate, I think?
And divide that by replication factor for writes); with smaller ops
you start running into latency issues and the fact that CephFS (since
it's sending RADOS writes to separate objects) can't coalesce writes
as much as local filesystems, or boxes built on them, can.
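To put rough numbers on that for the 8x24 OSD layout above (strictly
back-of-the-envelope; the 90MB/s per OSD and the 3x replication below
are assumptions on my part, not measurements):

    # Back-of-the-envelope ceiling for large sequential I/O on the planned cluster.
    # Assumptions: ~90 MB/s per spinning OSD for streaming reads, 3x replication,
    # and no network or client bottlenecks.
    osd_nodes = 8
    osds_per_node = 24
    per_osd_read_mb = 90           # midpoint of the 80-100 MB/s estimate above
    replication = 3                # assumed pool size

    total_osds = osd_nodes * osds_per_node             # 192 OSDs
    read_ceiling_gb = total_osds * per_osd_read_mb / 1024.0
    write_ceiling_gb = read_ceiling_gb / replication   # every write lands on each replica

    print("aggregate streaming-read ceiling:  ~%.1f GB/s" % read_ceiling_gb)
    print("aggregate streaming-write ceiling: ~%.1f GB/s" % write_ceiling_gb)

Small or metadata-heavy I/O won't get anywhere near those ceilings,
for the latency and write-coalescing reasons above.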
-Greg


