In addition, I'd like to point out that:

- GlusterFS can leave your files in one piece. Call me old-fashioned, but for "normal" workloads I like that.
- Ceph also needs an underlying filesystem, so you will be left with fsck questions etc. too.
- CephFS is, as of now, still not production ready, and following the news, it isn't the main focus right now. So you either have to use RBDs (and what filesystem inside?) or make your applications use RADOSGW.

So for my actual scenario (a small number of servers, n-way replication as a VM image storage backend), GlusterFS seems the better approach to me (less administration effort, and it keeps my files in one piece).

hth
Bernhard
On 12/28/2013 08:06 PM, Knut Moe wrote:
Are there benefits in configuration, scaling, management etc?
Hi, One year ago I was also in the situation where I had to compare these 3 cluster file systems. I'm not the greatest expert on this topic, but here's a rough summary of what I know and why we chose GlusterFS over Hadoop or Ceph.
There are great differences between Hadoop/Ceph and GlusterFS. The first is that GlusterFS distributes whole files and works on top of existing filesystems. It is completely transparent to the system and applications. This adds overhead (at the cost of speed), but increases your options in case of errors and for long-term preservation.
Most other storage cluster systems, on the other hand, split each file into a certain number of blocks and distribute those.
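As a rough illustration of that difference, here is a toy sketch in Python. It is not how GlusterFS or any other system really computes placement: the brick names, the hash choice and the round-robin rule are all assumptions made up for the example. It only shows that in one model a single brick holds the complete file, while in the other no node does.

```python
import hashlib


def place_whole_file(name, bricks):
    """Pick one brick for the entire file, based on a hash of its name.

    Illustrative only: md5 and the modulo rule are assumptions, not the
    placement algorithm of any real storage system.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return bricks[int.from_bytes(digest[:4], "big") % len(bricks)]


def place_blocks(data, nodes, block_size):
    """Split the file into fixed-size blocks and spread them round-robin."""
    layout = {node: [] for node in nodes}
    for offset in range(0, len(data), block_size):
        node = nodes[(offset // block_size) % len(nodes)]
        layout[node].append(data[offset:offset + block_size])
    return layout


# Hypothetical brick/node names and a 100-byte stand-in for a media file.
bricks = ["brick-a", "brick-b", "brick-c"]
data = b"0123456789" * 10

brick = place_whole_file("video.mkv", bricks)
layout = place_blocks(data, bricks, block_size=16)

print("whole file lives on:", brick)
print("blocks per node:", {node: len(blocks) for node, blocks in layout.items()})
```

In the first model you could pull the disk out of that one brick and read the file with ordinary tools. In the second you need the cluster software (and its metadata) to put the blocks back together.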
Hadoop is not just a filesystem you can mount (as I understood from reading Apache's docs): you have to talk to it using the Hadoop API. As I understood it, you'd have to write your applications to use Hadoop, instead of just having it as a transparent filesystem beneath them. Please correct me if I'm wrong.
This makes a great difference regarding effort and use-cases.
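To make "transparent filesystem" a bit more concrete, here is a minimal Python sketch. It assumes the volume is mounted at some directory; a temporary directory stands in for that mount point here, and the function and file names are hypothetical. The point is only that the application uses plain file calls and needs no storage-specific client library.

```python
import os
import tempfile


def archive_file(mount_point, name, payload):
    """Write a file and read it back using only ordinary file I/O."""
    path = os.path.join(mount_point, name)
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        return f.read()


# Assumption: in a real setup mount_point would be the directory where the
# cluster volume is mounted. A temporary directory stands in for it here.
with tempfile.TemporaryDirectory() as mount_point:
    result = archive_file(mount_point, "clip.bin", b"frame data")

print(result)
```

An existing application that already reads and writes local files would work the same way once pointed at the mounted path, which is where the difference in effort comes from.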
What's also special about GlusterFS is that it's really (really) easy and quick to set up a basic configuration, and it is available out of the box in most distro repositories. There are things you can tweak, but you don't have to. Any regular Linux admin can understand it very quickly. Just yesterday, I set up Gluster on my Raspberry Pi home fileserver :)
There are many more details about how Hadoop and Ceph "tick" compared to GlusterFS (e.g. the "NameNode"), but I think there are others here who can explain that way better than I can.
I know Hadoop is used by the likes of Yahoo and Facebook. I would be interested in information on any large (known) users of GlusterFS.
When I was looking for systems for large storage clusters for long-term media archiving, I initially thought I'd go for Hadoop, because "the big ones are using it". I also listened to a presentation about Ceph, so I could compare them.
In the end, we chose GlusterFS because, for digital archiving, we needed a scalable storage cluster with the highest chance of being maintainable over the years, rather than one that minimizes our downtime to seconds. We are currently building storage for the national A/V archive with 2x >300 TiB.
I've already done initial tests with GlusterFS on one node, but it's too soon to really speak of "experience" on our side. We've also only used gluster in "distribute" mode, so I have no experience with gluster-replication at all.
Regards, Peter B.