> Eventually. Right now Ceph is pretty config-heavy, unfortunately. So
> like I said, you can change the weights of the slow nodes — this will
> map less of the data to them, so they have fewer writes and reads. But
> even then you're going to be stuck pretty low due to your disk write
> bandwidth. On a modern disk that doesn't matter so much, since it can
> push two simultaneous 50MB/s+ streams (i.e., one 50MB/s journal and one
> 50MB/s data store), but even so we generally recommend separate
> spindles for the journal.

Yeah. I have tried putting the journal and the actual data writes on
separate disks, and I also used btrfs over multiple disks. That gave
good performance.
Greg, thanks a lot for your help. Enjoy Thanksgiving!!

Best,
Xiaofei

On Thu, Nov 24, 2011 at 1:03 PM, Gregory Farnum
<gregory.farnum@xxxxxxxxxxxxx> wrote:
> On Thu, Nov 24, 2011 at 12:31 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx> wrote:
>> This means no matter how many clients I have I should always get
>> around 79MB/s, right? This sounds reasonable. Thanks for the
>> explanation. So do you guys have plans to solve this "unbalanced
>> cluster" problem?
>
> Eventually. Right now Ceph is pretty config-heavy, unfortunately. So
> like I said, you can change the weights of the slow nodes — this will
> map less of the data to them, so they have fewer writes and reads. But
> even then you're going to be stuck pretty low due to your disk write
> bandwidth. On a modern disk that doesn't matter so much, since it can
> push two simultaneous 50MB/s+ streams (i.e., one 50MB/s journal and one
> 50MB/s data store), but even so we generally recommend separate
> spindles for the journal.
>
>> I guess several other distributed file systems have the same issue;
>> HDFS has it too. I guess the solution is to use hardware with stable
>> disk I/O bandwidth. If that can't be guaranteed, you need to detect
>> slow nodes and kick them out of the cluster.
>
> I'm not sure how other systems handle it — many don't, and HDFS might
> or might not. But as I look at the description of how HDFS replicates
> data, it looks to me like it matters less there.
> http://hadoop.apache.org/common/docs/current/hdfs_design.html#Robustness
> indicates that all data is written to a client-local file, then
> (asynchronously from the client's write) it is written out to the
> first DataNode, which copies the data to the second, which copies to
> the third, etc. But in this scheme the file doesn't need to be fully
> replicated to each DataNode for the client to close the file and
> consider it data-safe.
> These are appropriate data-consistency choices for HDFS, but Ceph is
> designed to satisfy a much more stringent set of consistency
> requirements. :)
>
>>> Are you using ceph-fuse or the kernel client? And if it's the kernel
>>> client, what version?
>
>> I am using ceph-fuse.
>
> Bah humbug. :( I've created http://tracker.newdream.net/issues/1752 to
> keep track of this issue; it'll be properly prioritized next week.
> -Greg

--
Xiaofei (Gregory) Du
Department of Computer Science
University of California, Santa Barbara
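
For readers who want to try the reweighting Greg describes, here is a
minimal sketch, assuming a reasonably recent Ceph CLI; osd.3 and the
weight 0.5 are made-up placeholders, and releases from this era may
instead require editing the CRUSH map by hand with crushtool:

    # Lower the CRUSH weight of a slow OSD so less data maps to it
    # (osd.3 and 0.5 are placeholder values):
    ceph osd crush reweight osd.3 0.5

    # On older releases, edit the CRUSH map directly instead:
    ceph osd getcrushmap -o crush.map
    crushtool -d crush.map -o crush.txt   # decompile to editable text
    # ... lower the weight of the slow OSD in crush.txt ...
    crushtool -c crush.txt -o crush.new   # recompile
    ceph osd setcrushmap -i crush.new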
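
Similarly, a minimal ceph.conf sketch for the separate journal spindle
Greg recommends (and Xiaofei tried); the device path and size below are
placeholders:

    [osd]
        ; journal on a partition of a different disk than the data store
        ; (placeholder path):
        osd journal = /dev/sdb1
        ; journal size in MB (relevant when the journal is a plain file
        ; rather than a block device):
        osd journal size = 1000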