On 12/28/2012 08:54 AM, William Muriithi wrote:
> Joe,
>> I have 3 servers with replica 3 volumes, 4 bricks per server on lvm
>> partitions that are placed on each of 4 hard drives, 15 volumes
>> resulting in 60 bricks per server. One of my servers is also a kvm host
>> running (only) 24 vms.
>>
> Mind explaining your setup again. I kind of could not follow,
> probably because of terminology issues. For example
>
> 4 bricks per server - Don't understand this part, I assumed a brick
> == 1 physical server (Okay, could also be one vm, but don't see how
> that would help unless it's a test environment). The way you put it
> though, means I have issues with my terminology.
>
> Isn't there a 1:1 relationship between brick and server?

In my configuration, 1 server has 4 drives (well, 5, but one's the OS).
Each drive has one gpt partition. I create an lvm volume group that
holds all four huge partitions. For any one GlusterFS volume I create 4
lvm logical volumes:

    lvcreate -n a_vmimages clustervg /dev/sda1
    lvcreate -n b_vmimages clustervg /dev/sdb1
    lvcreate -n c_vmimages clustervg /dev/sdc1
    lvcreate -n d_vmimages clustervg /dev/sdd1

then format them xfs and (I) mount them under
/data/glusterfs/vmimages/{a,b,c,d}. These four lvm partitions are bricks
for the new GlusterFS volume.

As glusterbot would say if asked for the glossary:
> A "server" hosts "bricks" (ie. server1:/foo) which belong to a
> "volume" which is accessed from a "client".

My volume would then look like

    gluster volume create vmimages replica 3 \
        server{1,2,3}:/data/glusterfs/vmimages/a/brick \
        server{1,2,3}:/data/glusterfs/vmimages/b/brick \
        server{1,2,3}:/data/glusterfs/vmimages/c/brick \
        server{1,2,3}:/data/glusterfs/vmimages/d/brick

>> Each vm image is only 6 gig, enough for the operating system and
>> applications and is hosted on one volume. The data for each application
>> is hosted on its own GlusterFS volume.
> Hmm, pretty good idea, especially security wise. Means one VM can not
> mess with another vm files.
> Is it possible to extend gluster volume
> without destroying and recreating it with bigger peer storage setting

I can do that two ways. I can add servers with storage and then
add-brick to expand, or I can resize the lvm partitions and grow xfs
(which I have done live several times).

>> For mysql, I set up my innodb store to use 4 files (I don't do 1 file
>> per table), each file distributes to each of the 4 replica subvolumes.
>> This balances the load pretty nicely.
> I thought lots of small files would be better than 4 huge files? I
> mean, why does this work out better performance wise? Not saying it's
> wrong, I am just trying to learn from you as I am looking for a
> similar setup. However, I could not think why using 4 files would be
> better but this may be because I don't understand how glusterfs works
> maybe

It's not so much a "how glusterfs works" question as it is a "how
innodb works" question. By configuring innodb_data_file_path to start
with a number of files that's a multiple of your bricks (and carefully
choosing the filenames to ensure they're distributed evenly), records
seem to be accessed evenly over the distribute set. I have only tested
this through actual use and have no idea if this is how it's supposed
to work.

With a one file per table model, all records read from any specific
table will be read from only one distribute subvolume. At least with my
data set, that would hit one distribute subvolume really heavily while
leaving the rest fairly idle.

>> I don't really do anything special for anything else, other than the php
>> app recommendations I make on my blog (http://joejulian.name) which all
>> have nothing to do with the actual filesystem.
>>
> Thanks for the link
>> The thing that I think some people (even John Mark) misapply is that
>> this is just a tool. You have to engineer a solution using the tools you
>> have available.
>> If you feel the positives that GlusterFS provides
>> outweigh the negatives, then you will simply have to engineer a solution
>> that suits your end goal using this tool. It's not a question of whether
>> it works, it's whether you can make it work for your use case.
>>
>> On 12/27/2012 03:00 PM, Miles Fidelman wrote:
>>> Ok... now that's diametrically the opposite response from Dan Cyr's of
>>> a few minutes ago.
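
Going back to the brick vs. server question, a dry-run sketch of how the brick list in the volume create command groups into replica sets may help. It relies on the fact that with replica 3, each consecutive group of three bricks on the command line forms one replica set. The loops reproduce the order that the server{1,2,3} braces expand to. The function names brick_list and replica_sets are made up for this illustration, and nothing here calls gluster; it only prints strings.

```shell
#!/bin/sh
# Dry run only: builds and prints the brick list, never calls gluster.
replica=3   # replica count used in the volume create command

# Print bricks in command-line order: for each drive letter (a-d),
# one brick on each of the three servers.
brick_list() {
  for drive in a b c d; do
    for n in 1 2 3; do
      echo "server${n}:/data/glusterfs/vmimages/${drive}/brick"
    done
  done
}

# Group every $replica consecutive bricks into one replica set.
replica_sets() {
  brick_list | awk -v r="$replica" '
    { line = line " " $0 }
    NR % r == 0 { printf "set %d:%s\n", NR / r, line; line = "" }'
}

replica_sets
```

Each printed set holds the same drive letter on all three servers, so every replica lives on a different machine, and the four sets are the four subvolumes that files get distributed across. Writing the bricks in a different order (all of server1's bricks first, for example) would put replicas of the same data on one server.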