At 04:00 AM 12/3/2008, Stas Oskin wrote:
>Hi.
>
>Thanks for your detailed answers. I'd like to clarify several points:
>
>2008/12/3 Keith Freedman <freedman at freeformit.com>
>I'm not sure there's an official recommendation.
>I use XFS with much success.
>
>Is XFS suitable for massive writing / occasional reading?

XFS is better suited than EXT3 or ReiserFS for write-heavy environments.
Some useful information is here:
http://www.ibm.com/developerworks/library/l-fs9.html
I'd pay close attention to the "Delayed allocation" section.

>I think the choice of underlying filesystem depends highly on the
>types of data you'll be storing and how you'll be storing the info.
>If it's primarily read data, then a filesystem with journaling
>capabilities may not provide much benefit. If you'll have lots of
>files in few directories, then a filesystem with better large-directory
>metrics would be ideal, etc. Gluster depends on the underlying
>filesystem, and will work no matter what that filesystem is, provided
>it supports extended attributes.
>
>I'm going to store mostly large files (100+ MB), with massive
>writing and only occasional read operations.
>
>I've found XFS works great for most purposes. If you're on Solaris,
>I'd recommend ZFS. It seems people are fond of ReiserFS, but
>you could certainly use EXT3 with extended attributes enabled and
>most likely be just fine.
>
>I'd actually prefer to stay on Linux. How well does XFS compare to
>EXT3 in the environment I described?

They're all Linux filesystems, so that's not the issue.

>As for LVM: again, this really depends what you want to do with the data.
>If you need to use multiple physical devices/partitions to present
>just one to gluster, you can do that and use LVM to manage your
>resizing of the single logical volume.
>
>This was the first idea I thought of, as I'm going to use 4 disks
>per server.
>
>Alternatively, you could use gluster's Unify translator to present
>one effective large/consolidated volume which can be made up of
>multiple devices/partitions.
>
>I think I read somewhere on this mailing list that there is a
>migration from Unify to DHT in GlusterFS (whatever that means) in the
>coming 1.4. If Unify is the legacy approach, what is the relevant
>solution for 1.4 (DHT)?

The approach is the same. I believe the concept is that there's a
translator that groups multiple smaller filesystem pieces into a
single representation. Gluster lets you do this through the
filesystem, whereas LVM lets you do it through the block devices.

Personally, I'd go with LVM, since it's likely easier to manage in the
long run and gives you more flexibility. You can grow your LVM volume
and, if you go with XFS, dynamically resize your filesystem, and you
won't have to make any changes to your gluster config.
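To make that concrete, here's a rough sketch of what I mean by LVM + XFS
(the device names, sizes, and mount point below are just made-up
examples; adjust them for your hardware):

  # one volume group across the 4 disks, one logical volume, XFS on top
  pvcreate /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
  vgcreate gluster_vg /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
  lvcreate -n gluster_lv -L 500G gluster_vg
  mkfs.xfs /dev/gluster_vg/gluster_lv
  mount /dev/gluster_vg/gluster_lv /export/gluster

  # later, if you add a disk, everything grows underneath gluster
  # without touching the gluster config; XFS grows while mounted
  pvcreate /dev/sdf1
  vgextend gluster_vg /dev/sdf1
  lvextend -L +250G /dev/gluster_vg/gluster_lv
  xfs_growfs /export/gluster

Your gluster server volume just keeps exporting /export/gluster and
never notices the resize.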
>In this scenario, you could potentially have multiple underlying
>configurations. You could Unify xfs, reiser, and ext3 filesystems
>into one gluster filesystem.
>
>As for RAID: again, the faster and more appropriately configured the
>underlying system is for your data requirements, the better off you
>will be. If you're going to use gluster's AFR translator, then I'd
>not bother with hardware RAID/mirroring and just use RAID0 stripes.
>However, if you have the money and can afford to do RAID0+1, that's
>always a huge benefit to read performance. Of course, if you're in
>a high-write environment, there's no real added value, so it's not
>worth doing.
>
>Couple of points here:
>1) Thanks to AFR, I actually don't need any fault-tolerant RAID
>(like mirroring), so it's only recommended in high-volume read
>environments, which is not the case here. Is this correct?

You can use AFR as your fault tolerance/mirror. However, be aware that
this means your "mirroring" will be going at network speed. If you have
no need for multiple servers with live replicated data, you'll be much
better off, performance-wise, using hardware mirroring. However, if you
want or need multiple servers serving identical data, then just use AFR
and you can live without hardware mirroring.

I'm not sure how gluster/AFR will perform in a very-large-file,
high-write environment. We'll have to see what the gluster devs say
about it, but what I can say is this: if your AFR servers lose contact
and later have to auto-heal, gluster will have to move the entire large
file. As far as I know, it doesn't have rsync-like capabilities wherein
it would only move the modified bits of the file over the network; I
believe it just copies over the whole thing, so if this happens a lot,
it will bog things down significantly.

>2) Isn't LVM (or GlusterFS's own solution) much better than RAID 0 in
>the sense that if one of the disks goes, the volume still continues to
>work? This is contrary to RAID, where the whole volume goes down?

You're confused about what RAID means. Yes, with RAID0 (striping) there
is no redundancy. RAID1 (mirroring) provides redundancy, and if one
drive fails the volume still functions; you can do this with hardware
or, I believe, with LVM. Then there's RAID0+1 (striping & mirroring),
which provides the performance benefit of striping with the high
availability of mirroring. So whether you use LVM for your RAID or a
hardware RAID controller doesn't change anything: with RAID0 a single
drive failure takes the volume down, with RAID1 you can withstand a
drive failure.

>3) Continuing 2, I think I actually meant JBOD, where you just
>connect all the drives and make them look like a single device, rather
>than striping.

Right, but this presents the same issues as striping, without the
performance benefit of striping. Let's say you have AFR set up and you
have a 4-disk striped or concatenated (JBOD) volume on each of 2
servers. If you have a single drive failure on one server, that entire
filesystem becomes unavailable. When you repair the drive, you
effectively have a blank, empty filesystem; gluster/AFR will notice this
and start auto-healing the entire filesystem (as each directory and file
is accessed), so over time you'll have copied the entire filesystem over
the network. However, if you have a single server and you mirror your
devices in a RAID1/0+1 config, then when you lose a drive your
filesystem is still running: replace the drive and the RAID software
fixes everything.

AFR is most valuable in high-read environments, since you can distribute
the load across multiple servers and specify a local read volume to
ensure a particular client always uses the fastest server (which could
be its own local brick, or a server on the LAN when you're using AFR
across a WAN).
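For reference, a minimal client-side AFR setup over two servers looks
roughly like this in the 1.3.x volfile syntax (hostnames and volume
names below are made up, and the exact options, transport-type in
particular, shift a bit between releases, so check the docs for the
version you deploy):

  volume server1
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.0.1      # first storage server
    option remote-subvolume brick       # volume name exported on that server
  end-volume

  volume server2
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.0.2      # second storage server
    option remote-subvolume brick
  end-volume

  volume mirror
    type cluster/afr
    subvolumes server1 server2          # writes go to both; reads can be steered
  end-volume

I believe the AFR option for pinning reads to a particular subvolume is
called read-subvolume, but double-check the option list for your
version.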
>If you could clarify the recommended approach, it would be great.

So here's a summary: if you do NOT need more than one server serving
the data (i.e., you're not going to replicate the data for DR purposes),
I'd recommend you avoid AFR in gluster and instead configure RAID0+1 on
your server. You'd be better off using a hardware RAID controller with
a large battery-backed cache, but you could use software RAID (like
LVM). If you had said you had a high-read environment, I'd have
suggested 2 servers using AFR over a private high-speed network, since
that reduces your points of failure; but given the high-write,
large-file environment, AFR may become a bottleneck. Again: if you NEED
server redundancy, then AFR is your best option, but if you don't need
it, it will just slow things down.

>This doesn't really answer your question, but hopefully it helps.
>
>Thanks again for your help.
>
>Regards.