Tommi Virtanen <tommi.virtanen <at> dreamhost.com> writes:
>
> On Wed, Feb 22, 2012 at 23:12, madhusudhana
> <madhusudhana.u.acharya <at> gmail.com> wrote:
> > 1. can you please let me know how I can make only 1 MDS active ?
>
> You can see that in "ceph -s" output; the "mds" line should have just
> one entry like "0=a=up:active" with the word active.
>
> You can control that with the "max mds" config option, and at runtime
> with "ceph mds set_max_mds NUM" and "ceph mds stop ID".
>
> Note, decreasing the number of active MDSes is not currently well
> tested. You might be better off with a fresh cluster that only ever
> ran one ceph-mds process.
>
> > 2. BTRFS for all OSDs
>
> There is currently one known case where btrfs's internal structures
> get fragmented and its performance starts degrading. You might want
> to make sure you start your test with freshly-mkfs'ed btrfses.
>
> > 3. All hosts (including OSDs) in my ceph cluster are running the 3.0.9 kernel
> > [root <at> ceph-node-8 ~]# uname -r
> > 3.0.9
>
> Well, that's at least in the 3.x series. Btrfs has had a steady
> stream of fixes, so in general we recommend running the latest stable
> kernel. You might want to try that.
>
> > 4. All 9 machines are replicas of each other. I have imaged them using
> > systemimager. The only difference is that the 9th node is not part of
> > the Ceph cluster. I mounted the Ceph file system on this node using
> > the mount -t ceph command.
>
> That's good.
>
> > 5. All 9 clients are running the same version of CentOS and the same
> > kernel, with a 1GigE interface.
>
> > You mean to say I can have ceph mon/OSDs running on the same machine?
> > But in the Ceph wiki I have read that it is better to have different
> > machines for each mds/mon/osd.
>
> Yes, I just wanted to make sure you have it set up like that.
>
> > I assume that Ceph uses whatever ethernet interface I have (1GigE)
> > in my system to rebalance the cluster in case of node failure or
> > node addition. Won't this use the entire bandwidth during
> > rebalancing? Won't this cause bandwidth saturation for clients?
>
> Yes. That's why you can set up a separate network for cluster-internal
> communication. See "cluster network" or "cluster addr" vs "public
> network" or "public addr".
>
> > I would like to know what benchmark I should use to test Ceph.
> > I want to present data to my management on how Ceph performs
> > compared with other file systems (like GlusterFS/NetApp/Lustre).
>
> You should use the benchmark that matches your actual workload best.
>
> Please stay active on the mailing list until your results start
> looking good. The more information you can provide, the better we can
> help you.
>
> We're looking forward to getting one of our new hires going; he'll be
> benchmarking Ceph on pretty decent hardware and a 10gig network with
> whatever loads we can come up with. That should give you a better idea
> of what to expect, and us what to keep working on.

Thank you, Tommi, for your response.

1. In my cluster, all OSDs are mkfs'ed with btrfs.

2. Below is what I see in the "ceph -s" output. Does that mean only one MDS
   is active and the other one is standby? (I have put my understanding of
   the MDS commands below; please correct me if I got them wrong.)

   mds e5: 1/1/1 up {0=ceph-node-1=up:active}, 1 up:standby

3. I will not be able to move to a newer stable kernel because of company
   policy :-(
4. If you don't mind, can you please give me a bit more insight into the
   cluster network: what it is, and how I can configure one for my Ceph
   cluster? Will there be a significant performance improvement with this?
   (I have sketched below what I think the ceph.conf would look like.)

5. I have done some testing with dd on Ceph. Below are the results.

   CASE 1:
   [root@ceph-node-9 ~]# dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4k count=1000000
   1000000+0 records in
   1000000+0 records out
   4096000000 bytes (4.1 GB) copied, 4.04089 seconds, 1.0 GB/s

   CASE 2:
   [root@ceph-node-9 ~]# dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4k count=10000000
   10000000+0 records in
   10000000+0 records out
   40960000000 bytes (41 GB) copied, 445.786 seconds, 91.9 MB/s

   CASE 3:
   [root@ceph-node-9 ~]# dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4k count=100000000
   71414032+0 records in
   71414032+0 records out
   292511875072 bytes (293 GB) copied, 4116.59 seconds, 71.1 MB/s

   As you can see from the output above, for the 4.1 GB file written in 4k
   blocks the speed clocked in at 1.0 GB/s, and it gradually decreased as I
   increased the file size beyond 10 GB. Also, if I run back-to-back dd runs
   with the CASE 1 options, the write slows down from 1 GB/s to about
   90 MB/s. Can you please explain whether this behaviour is expected? If
   yes, why? If not, how can I achieve 1 GB/s for all file sizes? (I have
   also put below how I would re-run the test; please tell me if that is the
   right way.)
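Regarding point 2 above: just to confirm I understood the MDS part, is this
roughly how I would verify and enforce a single active MDS? This is only my
reading of the commands you mentioned (the rank number in the stop command is
just a guess), not something I have tried yet:

   # check how many MDS ranks are up and which daemon is active
   ceph -s
   ceph mds stat

   # limit the cluster to a single active MDS rank ("max mds")
   ceph mds set_max_mds 1

   # if a second MDS rank were active, stop it (rank 1 here is only an example)
   ceph mds stop 1

And the standby MDS from my "ceph -s" output would simply stay standby,
correct?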
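Also, regarding the cluster network (point 4): from your description, is the
idea that OSD-to-OSD replication and recovery traffic goes over a second NIC
while clients keep using the first one? Here is a rough sketch of what I
imagine in ceph.conf, using the option names you mentioned; the subnets and
addresses are made up for illustration:

   [global]
           ; client-facing traffic
           public network = 192.168.1.0/24
           ; OSD replication / recovery traffic on a separate NIC
           cluster network = 10.0.0.0/24

   [osd.0]
           host = ceph-node-2
           public addr = 192.168.1.102
           cluster addr = 10.0.0.102

If that is roughly right, then recovery after a node failure should not
saturate the 1GigE link the clients are using.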
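Finally, on the dd numbers (point 5): my own suspicion is that the 1.0 GB/s
for the 4.1 GB file is mostly the client's page cache absorbing the writes,
and that the longer runs (and the back-to-back runs) show the real sustained
throughput once the data actually has to reach the OSDs. Is that the correct
explanation? If so, I was planning to redo the test so the timing includes
flushing to the cluster; please tell me if these invocations make sense
(dropping the caches and the dd flags below are just my assumption of the
right way to do it):

   # drop the client page cache before each run
   sync; echo 3 > /proc/sys/vm/drop_caches

   # include the final flush to storage in the timing
   dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4k count=1000000 conv=fdatasync

   # or bypass the page cache entirely (with a larger block size)
   dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4M count=1000 oflag=direct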