File I/O with mixed read/write and high streaming performance

Hi all,

I'm trying to understand whether CephFS is a good fit for the following scenario.  Some old benchmarks showed GlusterFS significantly beating CephFS when many file I/Os were required, but those benchmarks are dated, so I'd like your thoughts on the matter.

I need to perform the following two steps:

Re-organize files (Step one)

I need to take a large directory structure (assumed to reside on CephFS) and "re-arrange" it via a copy or link mechanism.  I want to make a full copy of the directory structure, but with simple disk-span chunking, so that all the files in the original copy end up in a set of folders where each folder is no larger than a fixed size.  This is like what we did back in the days when we needed to write data in CD-ROM sized chunks.  The genisoimage package has tools that do this (dssplit and dirsplit); Folder Axe was the MS Windows equivalent.

Presumably, this would put a large random read and random write load on the cluster.  Since the data can be large (hundreds of GB, maybe up to 1 TB, with tens to hundreds of thousands of small files), I would need this to be well optimized.  One mechanism that might be available is hard or soft links, so that no actual copying is done (I don't know whether CephFS/POSIX supports this).  The linking approach would probably put a large strain on the MDS servers but not so much on the storage.
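For what it's worth, CephFS does support POSIX hard links, so the link-based variant of the chunking step could look something like the sketch below.  This is only a minimal illustration, not a tested tool: the chunk-size cap, folder naming, and `chunk_tree` helper are all my own inventions, and it assumes source and destination live on the same filesystem (a hard-link requirement).

```python
import os
from pathlib import Path

CHUNK_LIMIT = 650 * 1024 * 1024  # hypothetical per-folder cap (~CD size)

def chunk_tree(src, dst, limit=CHUNK_LIMIT):
    """Walk src and hard-link every file into dst/chunk_NNNN folders,
    starting a new chunk folder whenever the size cap would be exceeded.
    Hard links copy no data, so the load is almost entirely metadata
    (MDS) work rather than OSD traffic."""
    src, dst = Path(src), Path(dst)
    chunk_no, used = 0, 0
    for root, _dirs, files in os.walk(src):
        for name in files:
            f = Path(root) / name
            size = f.stat().st_size
            if used + size > limit and used > 0:
                chunk_no, used = chunk_no + 1, 0
            rel = f.relative_to(src)
            target = dst / f"chunk_{chunk_no:04d}" / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            os.link(f, target)  # hard link: no data copied
            used += size
    return chunk_no + 1  # number of chunk folders produced
```

A file larger than the cap still gets its own chunk here (the greedy check only rolls over when the current chunk is non-empty); a real tool would also want to handle that case explicitly.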

Write to media (Step two)

I need to stream the chunked folders to a set of media devices (think tape drives) that can ingest at high speed (about 200 megabytes per second... yes, bytes).  I'd like to make sure we can feed the ingest at the maximum rate, if possible.  Whether we write the folder chunks one at a time or in parallel (to multiple tape drives) remains to be seen.  Presumably, this would put a large random read load on the cluster.  Once the media has been successfully written, the chunked copy can be deleted.
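One way to keep the ingest side purely sequential, regardless of how random the reads on the CephFS side are, is to pack each chunk folder into a single tar stream written with a large buffer.  A minimal sketch, assuming the device path (e.g. a tape device) accepts plain sequential writes and with a buffer size I picked arbitrarily:

```python
import tarfile
from pathlib import Path

BUF = 4 * 1024 * 1024  # large write buffer to keep the device fed

def stream_chunk(chunk_dir, device_path):
    """Pack chunk_dir into one sequential tar stream written to
    device_path.  tarfile's "w|" stream mode never seeks, so the
    output is one long sequential write, which is what a tape-style
    ingest wants; the many small random reads all stay on the
    CephFS side."""
    with open(device_path, "wb", buffering=BUF) as dev:
        with tarfile.open(fileobj=dev, mode="w|", bufsize=BUF) as tar:
            tar.add(chunk_dir, arcname=Path(chunk_dir).name)
```

To drive multiple drives in parallel, one process (or thread) per drive, each calling `stream_chunk` on its own chunk folder, would be the obvious split; whether the cluster can sustain 2-3 of these at 200 MB/s each is exactly the open question.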

Notes:

Currently, I'm planning for all access to be done via Linux servers.  I'm eagerly watching the Windows native CephFS beta.
The server performing the chunking job will be the only reader/writer of the data.
The server performing the streaming job will also be the only reader/writer of the data.
If we can run in parallel, there may be 2-3 chunking servers and 2-3 streaming servers operating concurrently.
There are only a few systems in play... NOT hundreds of concurrent clients accessing the data.
One might assume we could keep the raw data on cheaper disk and then "reconstruct" the copy on flash.  In that scenario, we can stream from flash.

I'd definitely appreciate your feedback on whether CephFS would be a good fit.

Thanks in advance for your thoughts!

- Steve
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


