On 28 October 2013 23:13, Tim van Elteren <timvanelteren at gmail.com> wrote:

Background: VFX and Post Production studio that relies on Gluster for all of its production data storage. Data is mostly in the shape of image sequences (raw 2K and 4K footage stored in individual frames, clocking in at 5-100MB per frame, to be played back at 24 or 48 FPS). We create content from a render farm consisting of many Linux boxes spewing out these images as fast as they can.

> 1) transparency to the processing layer / application with respect to data
> locality, e.g. know where data is physically located on a server level,
> mainly for resource allocation and fast processing, high performance, how
> can this be accomplished using GlusterFS?

We mount local GlusterFS clusters (we have them at each site) under the same mount point. We simply rsync chunks of the tree between sites as required on a job-by-job basis (we have "pipeline tools" we write to make this easier for end users, so they call a wrapper script, and not rsync directly). There are remote mount points for users to mount a remote site over our WAN and browse it without needing to rsync big chunks of data if they prefer. This isn't GlusterFS's problem, per se, but just a regular remote file system problem. With that said, GlusterFS's geo-replication may assist you if you want to automate it more.
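For what it's worth, a minimal sketch of the kind of wrapper we put in front of rsync is below. The site names, mount points and rsync flags here are invented purely for illustration; the real pipeline tools do far more validation, locking and logging than this.

#!/usr/bin/env python
# Rough sketch of a site-to-site sync wrapper around rsync.
# Site names and paths are invented for illustration only.
import subprocess
import sys

# Each site has its own local GlusterFS cluster mounted under the same
# path (/prod in this sketch), plus an ssh-reachable file server.
SITES = {
    "siteA": "sitea-fileserver:/prod",
    "siteB": "siteb-fileserver:/prod",
}

def pull_job(job_path, src_site):
    """Pull one job's tree from a remote site onto the local /prod mount."""
    src = "%s/%s/" % (SITES[src_site], job_path)
    dst = "/prod/%s/" % job_path
    # -a preserves the tree, --partial lets big frames resume after a drop.
    return subprocess.call(["rsync", "-a", "--partial", "--progress", src, dst])

if __name__ == "__main__":
    # Usage (hypothetical): pull_job.py jobs/show01/shot010 siteA
    sys.exit(pull_job(sys.argv[1], sys.argv[2]))

End users only ever see the wrapper, so we can change how the transfer actually happens without retraining anyone.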
> 2) posix compliance, or conformance: hadoop for example isn't posix
> compliant by design, what are the pros and cons? What is GlusterFS's
> approach with respect to support for posix operations?

We run a setup of 70% Linux, 20% MacOSX and 10% Windows, all of which require regular file semantics (not object storage). The 90% of our client base that requires a POSIX-compliant file system has no problems with it. We re-export all shares via Samba3 for Windows clients, but we are in the process of totally removing Windows from our setup for a variety of reasons (mostly because it isn't POSIX compliant, we need to support a whole new set of tools to deal with it, and it sucks for our industry).

Anything we need to do on a regular EXT3/EXT4 file system, we can do on GlusterFS (symlinks, hard links, tools like "rsnapshot", etc). Users don't see any difference between it and any regular NFS-mounted NAS type device.

> 3) mainly with respect to evaluating the production readiness of GlusterFS,
> where is it currently used in production environments and for what specific
> use cases it seems most suitable? Are there any known issues / common
> pitfalls and workarounds available?

We've been running 3.3.1 since March this year, and I'm upgrading to 3.4.1 over the coming weeks. I've had outages, but it's never been Gluster's fault. I initially put AFP for our Macs on our Gluster nodes, which was a mistake, and it has since been removed (MacOSX Finder is too slow over SMB due to constant resource fork negative lookups, and we've finally figured out how to get Macs to NFS mount from Gluster without locking up Finder anyway, so we're migrating away from AFP). Likewise, we've had some hardware and network switch failures that have taken out multiple nodes. Recovery was quick enough, though (I think 30 minutes was the worst outage, but again not Gluster's fault).

Because we use a distribute+replicate setup, our single-threaded write speeds aren't amazing. But Gluster's power is in the cluster itself, not any one write thread, so over the course of millions of rendered frames, the performance is better than any single NAS device could give us under the same client load.

Some applications do silly things - we have a handful of apps where the "Open File" dialog insists on doing background reads of whatever tree it's browsing, making file browsing excruciatingly slow. But again, these are rare and typically only on Windows, so we'll be leaving that behind soon.

The other drama we had was that a cpio write operation from one of our production apps was very slow on GlusterFS (GlusterFS doesn't seem to like anything that requests only a portion of a file), so we wrote a wrapper script to save to a local tmpfs, and then copy that back to GlusterFS (a rough sketch of that wrapper is at the end of this mail). That was only for one operation out of thousands, though, and it was easy enough to solve (and it gave us the ability to extend version control into the wrapper script, which we'd wanted to do anyway).

> 4) It seems Gluster has the advantage in Geo replication versus for example
> Ceph. What are the main advantages here?

We don't use this, so I can't comment. If I required my remote and local trees to be in sync, I'd definitely use it. But end users here are happy to transfer data only when required (because even with 800TB online, space is still precious).

> 5) Finally what would be the most compelling reason to go for Gluster and
> not for the alternatives?

For us, we needed simple file semantics (we specifically don't need object storage for OpenStack or Hadoop type operations). That gave us 5 options:

1) Continue with our legacy setup of many NAS units.
   Pros: cheap.
   Cons: inflexible to share storage space between departments, single point of failure per share.

2) Buy a NAS or SAN from a vendor.
   Pros: simple, easily expandable.
   Cons: proprietary, expensive, vendor/product lock-in for future upgrades.

3) Proprietary clustered file system (IBM GPFS, etc). Same pros/cons as a SAN, quite frankly.

4) Ceph or Lustre.
   Pros: open source, usual clustered storage benefits, central name space, etc.
   Cons: in-kernel, requires many nodes for a good high-availability setup to start with, needs a few smart people around to keep it running.

5) GlusterFS.
   Pros: open source, low node count for basic rollout (cheaper to start), usual clustered storage benefits, MUCH simpler than Ceph / Lustre.
   Cons: usual clustered storage cons (single-threaded write speed, etc), young-ish technology.

GlusterFS won the day due to simplicity and the ability to start small. It keeps up with our business needs, and lets us pick and choose our hardware (including mixing and matching different specs/vendors of hardware within the one cluster into the future). Additionally, Red Hat's backing of Gluster finally cemented for us that it was worthwhile, and that it wasn't going to be a project that just vanished overnight or suffered from production-breaking changes between releases.

As suggested, you should test for yourself, and not take anyone's word on anything. But for us, those were the decisions we made, and why.

HTH.

--
Dan Mons
R&D SysAdmin
Unbreaker of broken things
Cutting Edge
http://cuttingedge.com.au
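PS: For anyone who hits the same slow-cpio-save issue, the staging wrapper I mentioned boils down to something like the sketch below. The save command and paths are invented for illustration; the real script also ties into our version control.

#!/usr/bin/env python
# Rough sketch of the tmpfs staging trick for the slow cpio-style saves.
# The save command and paths are invented for illustration only.
import os
import shutil
import subprocess
import sys
import tempfile

def staged_save(final_path, save_command):
    """Run the application's save into local tmpfs, then copy to GlusterFS."""
    # /dev/shm is a tmpfs on our Linux boxes, so the app writes to RAM first.
    staging_dir = tempfile.mkdtemp(dir="/dev/shm")
    staged_file = os.path.join(staging_dir, os.path.basename(final_path))
    try:
        # Assumes the save tool takes the output file as its last argument.
        subprocess.check_call(save_command + [staged_file])
        # One big sequential copy back onto the GlusterFS mount.
        shutil.copy2(staged_file, final_path)
    finally:
        shutil.rmtree(staging_dir)

if __name__ == "__main__":
    # Usage (hypothetical): staged_save.py /prod/show01/scene.archive app_save_tool
    staged_save(sys.argv[1], sys.argv[2:])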