On 09/25/2011 03:56 AM, Di Pe wrote:

> So far the discussion has been focusing on XFS vs ZFS. I admit that I
> am a fan of ZFS and I have only used XFS for performance reasons on
> mysql servers where it did well. When I read something like this
> http://oss.sgi.com/archives/xfs/2011-08/msg00320.html that makes me
> not want to use XFS for big data. You can assume that this is a real

This is a corner case bug, and one for which we hope to get more data
to the XFS team. They asked for specific information that we couldn't
provide (as we had to fix the problem). Note: other file systems which
allow for sparse files *may* have similar issues. We haven't tried yet.

The issues with ZFS on Linux have to do with legal hazards. Neither
Oracle, nor those who claim ZFS violates their patents, would be happy
to see license violations, or further deployment of ZFS on Linux. I
know the national labs in the US are happily doing the integration from
source, but I don't think Oracle and the patent holders would sit idly
by while others do this. So you'd need to use a ZFS-based system such
as Solaris 11 Express to be able to use it without hassle. BSD and
Illumos may work without issue as well, and should be somewhat better
on the legal front than Linux + ZFS. I am obviously not a lawyer, and
you should consult one before you proceed down this route.

> recent bug because Joe is a smart guy who knows exactly what he is
> doing. Joe and the Gluster guys are vendors who can work around these
> issues and provide support. If XFS is the choice, maybe you should
> hire them for this gig.
>
> ZFS typically does not have these FS repair issues in the first place.
> The motivation of Lawrence Livermore for porting ZFS to Linux was
> quite clear:
>
> http://zfsonlinux.org/docs/SC10_BoF_ZFS_on_Linux_for_Lustre.pdf
>
> OK, they have 50PB and we are talking about much smaller deployments.
> However, some of the limitations they report I can confirm. Also,
> recovering from a drive failure with this whole LVM/Linux RAID stuff
> is unpredictable. Hot swapping does not always work, and if you
> prioritize the re-sync of data to the new drive you can strangle the
> entire box (by default the priority of the re-sync process is low on
> Linux). If you are a Linux expert you can handle this kind of stuff
> (or hire someone), but if you ever want to give this setup to a
> storage administrator, you had better give them something they can
> use with confidence (maybe less of an issue in the cloud).
> Compare this to ZFS: re-silvering works with a very predictable
> result and timing. There is a ton of info out there on this topic. I
> think that Gluster users may be getting around many of the Linux RAID
> issues by simply taking the entire node down (which is OK in
> mirrored-node settings) or by using hardware RAID controllers (which
> are often not available in the cloud).

There are definite advantages to better technology (on the re-sync
priority point specifically, see the sketch below). But the issue in
this case is the legal baggage that goes along with it. BTRFS may,
eventually, be a better choice.

The national labs can do this with something of an immunity to
prosecution for license violation, by claiming the work is part of a
research project and won't actively be used in a way that would harm
Oracle's interests. And it would be ... bad ... for Oracle (and
others) to sue the government over a relatively trivial violation.

Until Oracle comes out with an absolute declaration that it's OK to
use ZFS with Linux in a commercial setting ... yeah ... most vendors
are gonna stay away from that scenario.
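Coming back to the re-sync priority point above: on Linux md, the
rebuild bandwidth floor and ceiling are exposed as tunables under
/proc/sys/dev/raid/. A minimal sketch of bumping the floor so a
rebuild isn't starved (run as root; the 50000 KB/s figure is purely
illustrative, and the right value depends on your hardware):

    #!/usr/bin/env python3
    # Minimal sketch: raise the md re-sync bandwidth floor so a
    # rebuild gets priority over foreground I/O.  Values are KB/s per
    # device; typical kernel defaults are min=1000, max=200000.

    SPEED_MIN = "/proc/sys/dev/raid/speed_limit_min"  # rebuild floor
    SPEED_MAX = "/proc/sys/dev/raid/speed_limit_max"  # rebuild ceiling

    def read_limit(path):
        with open(path) as f:
            return int(f.read().strip())

    def write_limit(path, kb_per_sec):
        with open(path, "w") as f:
            f.write(str(kb_per_sec))

    if __name__ == "__main__":
        print("floor  :", read_limit(SPEED_MIN))
        print("ceiling:", read_limit(SPEED_MAX))
        # Illustrative value only: raise the floor from the usual
        # 1000 KB/s so the rebuild finishes in predictable time.
        write_limit(SPEED_MIN, 50000)

You can watch the effect in /proc/mdstat while the array rebuilds.
Which is rather the point: you have to know these knobs exist, whereas
ZFS just resilvers.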
> Some in the Linux community seem to be slightly opposed to ZFS (I
> assume because of the licensing issue) and make sometimes odd
> suggestions ("You should use BTRFS").

Licensing, mainly. BTRFS has a better design, but it's not ready yet,
and won't be for a while.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615