I doubt your data is correct, even the ext4 data; have you used O_DIRECT when doing the test? It's unusual to get 2x the random-write IOPS of random read. The CephFS kernel client does not seem stable enough yet; think twice before you use it.

From your previous mail I guess you would like to do some caching or dynamic tiering, introducing SSDs into the DFS for better performance. There are a lot of layers at which you can do that kind of caching or migration: you can cache on the client side, or, as Sage said, have a disk pool and an SSD pool and migrate data between them, or you can cache inside the OSD. We are also interested in similar research, but it is still WIP.

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of sheng qiu
Sent: February 4, 2013 23:37
To: Mark Nelson
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: some performance issue

Hi Mark,

thanks a lot for your reply.

On Fri, Feb 1, 2013 at 3:10 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
> On 02/01/2013 02:20 PM, sheng qiu wrote:
>>
>> Hi,
>>
>> i did one experiment which gives some interesting results.
>>
>> i created two OSDs (ext4), each on an SSD attached to the same PC. i
>> also configured one monitor and one mds on that PC.
>> so my OSDs, monitor and mds are all located on the same node.
>>
>> i set up the ceph service and mounted ceph on a local directory on
>> that PC, so the client, OSDs, monitor and mds are all on the same node.
>> i suppose this will exclude the network communication cost.
>>
>> i ran the fio benchmark, which creates one 10GB file (larger than main
>> memory) on the ceph mount point. it performs sequential read/write and
>> random read/write on the file and reports the throughput.
>>
>> next i unmounted ceph and stopped the ceph service. i created ext4 on
>> the same SSD that was used as an OSD before, then ran the same
>> workloads and got the throughput results.
>>
>> here are the results:
>>
>> (throughput, KB/s)    Seq-read   Rand-read   Seq-write   Rand-write
>> ceph                      7378        4740         790         1211
>> ext4                     58260       17334       54697        34257
>>
>> as you see, ceph shows a huge performance drop, even though the
>> monitor, mds, client and osds are on the same physical machine.
>> another interesting thing is that seq-write has lower throughput than
>> random-write under ceph. not quite clear....
>>
>> does anyone have an idea why ceph has such a performance drop?
>
>
> Hi Sheng,
>
> Are you using RBD or CephFS (and kernel or userland clients)? How
> much replication? Also, what FIO settings?
>

i am using CephFS and the kernel client. the replication is the default (3?). FIO is using the ssd-test script; the IO request size is 4kb.

> In general, it is difficult to make distributed storage systems
> perform as well as local storage for small read/write workloads. You
> need a lot of concurrency to hide the latencies, and if the local
> storage is incredibly fast (like an SSD!) you have a huge uphill battle.
>
> Regarding the network, even though you ran everything on localhost,
> ceph is still using TCP sockets to do all of the communication.
>

i guess when it finds that the remote ip is actually the local address, it will directly pass the sent packets to the receive buffer. right?

> Having said that, I think we can do better than 790 IOPs for seq
> writes, even if it's 2x replication. The trick is to find where in
> the stack things are getting held up. You might want to look at tools
> like iostat and collectl, and look at some of the op latency data in
> the ceph admin socket.
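For reference, the checks being suggested here would look roughly like the following (a sketch, assuming the default admin socket path and an OSD named osd.0; paths and daemon names will differ per deployment):

    # list the commands this daemon's admin socket supports
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok help
    # dump the internal performance counters, including op latency data
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
    # block-device utilization and latency while fio is running
    iostat -x 1
    collectl -sD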
> A basic introduction is described in Sebastien's article here:
>
> http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/
>
>>
>> Thanks,
>> Sheng
>>
>>

i will try your suggestion to find where the bottleneck is. the reason i did this experiment is just to find some potential issues with ceph. i am a Ph.D. student trying to do some research work on it. i would be happy to hear your suggestions.

Thanks,
Sheng

--
Sheng Qiu
Texas A & M University
Room 332B Wisenbaker
email: herbert1984106@xxxxxxxxx
College Station, TX 77843-3259
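For reference, the workload discussed in this thread (an ssd-test style fio job on a 10GB file with 4kb requests), rerun with O_DIRECT enabled as asked at the top of the reply, would look roughly like this (a sketch: the mount point, iodepth and runtime are assumptions, and the real ssd-test script runs seq-read, rand-read, seq-write and rand-write as separate jobs):

    # 4kb random writes with direct I/O against the CephFS mount point
    fio --name=rand-write --filename=/mnt/ceph/fio-test --size=10g \
        --bs=4k --ioengine=libaio --iodepth=4 --direct=1 \
        --rw=randwrite --runtime=60
    # repeat with --rw=read, --rw=randread and --rw=write for the other columns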