Hi Mark, thanks a lot for your reply.

On Fri, Feb 1, 2013 at 3:10 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
> On 02/01/2013 02:20 PM, sheng qiu wrote:
>>
>> Hi,
>>
>> I did an experiment which gives some interesting results.
>>
>> I created two OSDs (ext4), each on an SSD attached to the same PC. I
>> also configured one monitor and one mds on that PC, so my OSDs,
>> monitor and mds are all on the same node.
>>
>> I set up the ceph service and mounted ceph on a local directory on
>> that PC, so the client, OSDs, monitor and mds are all on the same
>> node. I assumed this would exclude the network communication cost.
>>
>> I ran the fio benchmark, which creates one 10 GB file (larger than
>> main memory) on the ceph mount point. It performs sequential and
>> random reads/writes on the file and reports the throughput.
>>
>> Next I unmounted ceph and stopped the ceph service, created ext4 on
>> the same SSD that was used as an OSD before, and ran the same
>> workloads to get the throughput.
>>
>> Here are the results:
>>
>> (throughput KB/s)   Seq-read   Rand-read   Seq-write   Rand-write
>> ceph                    7378        4740         790         1211
>> ext4                   58260       17334       54697        34257
>>
>> As you can see, ceph shows a huge performance drop even though the
>> monitor, mds, client and OSDs are all on the same physical machine.
>> Another interesting thing is that sequential writes have lower
>> throughput than random writes under ceph, which is not clear to me.
>>
>> Does anyone have an idea why ceph performs so much worse here?
>
> Hi Sheng,
>
> Are you using RBD or CephFS (and kernel or userland clients?)  How much
> replication?  Also, what FIO settings?

I am using CephFS with the kernel client. The replication factor is the
default (3?) -- I will double-check it with the pool-dump script in the
P.S. below. fio is running the ssd-test example job with a 4 KB I/O
request size (a rough sketch of what I think that amounts to is also in
the P.S. below).

> In general, it is difficult to make distributed storage systems perform as
> well as local storage for small read/write workloads.  You need a lot of
> concurrency to hide the latencies, and if the local storage is incredibly
> fast (like an SSD!) you have a huge uphill battle.
>
> Regarding the network, even though you ran everything on localhost, ceph is
> still using TCP sockets to do all of the communication.

I guess that once the kernel sees the remote IP is actually the local
address, it delivers the packets over the loopback path straight into
the receive buffer rather than putting them on the wire. Right?

> Having said that, I think we can do better than 790 IOPs for seq writes,
> even if it's 2x replication.  The trick is to find where in the stack things
> are getting held up.  You might want to look at tools like iostat and
> collectl, and look at some of the op latency data in the ceph admin socket.
> A basic introduction is described in sebastian's article here:
>
> http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/
>
>> Thanks,
>> Sheng
>>

I will try your suggestion to find where the bottleneck is (the
perf-dump script at the end of this mail is how I plan to read the admin
socket). The reason I did this experiment is to look for potential
issues in ceph: I am a Ph.D. student trying to do some research work on
it, and I would be happy to hear your suggestions.

Thanks,
Sheng

--
Sheng Qiu
Texas A & M University
Room 332B Wisenbaker
email: herbert1984106@xxxxxxxxx
College Station, TX 77843-3259
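
P.S. Since you asked about the fio settings: I don't have the exact
ssd-test job file in front of me, so the little script below is only a
sketch of the kind of job I am running. The 10 GB file size and 4 KB
request size match the test above; the mount point, ioengine, direct and
iodepth values are my assumptions, not necessarily what ssd-test really
uses.

#!/usr/bin/env python
# Sketch only: write an fio job file approximating the test described
# above and run it.  The directory, ioengine, direct and iodepth values
# are assumptions, not the exact contents of fio's ssd-test script.
import subprocess

JOB = """\
[global]
# placeholder: ceph mount point (or the local ext4 mount for the baseline)
directory=/mnt/ceph
# one 10 GB file, larger than main memory, as in the test above
size=10g
# 4 KB request size
bs=4k
# the three settings below are assumptions, not the exact ssd-test values
ioengine=libaio
direct=1
iodepth=4

[seq-read]
rw=read

[rand-read]
stonewall
rw=randread

[seq-write]
stonewall
rw=write

[rand-write]
stonewall
rw=randwrite
"""

with open("cephfs-test.fio", "w") as f:
    f.write(JOB)

# stonewall makes each job wait for the previous one, so the four phases
# run one after another like the original test.
subprocess.check_call(["fio", "cephfs-test.fio"])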
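
Also, rather than guessing at the replication factor, I plan to check it
with something like this (the format of the pool lines in "ceph osd dump"
varies between versions, so the script only echoes them):

#!/usr/bin/env python
# Sketch: show the replication factor of each pool by echoing the pool
# lines from "ceph osd dump".  The exact wording of those lines ("rep
# size" vs "size") differs between ceph versions, so I just print them
# instead of parsing a specific field.
import subprocess

out = subprocess.check_output(["ceph", "osd", "dump"]).decode()
for line in out.splitlines():
    if line.startswith("pool "):
        print(line)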
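
And this is roughly how I intend to pull the op latency data out of the
admin socket, following the article you linked. The socket path is a
placeholder for whatever my ceph.conf actually uses, and I just look for
"latency" in the counter names rather than assuming specific counters,
since those differ between versions:

#!/usr/bin/env python
# Sketch: read the op latency counters from one OSD's admin socket via
# "ceph --admin-daemon <socket> perf dump", as in the article Mark linked.
# The socket path is an assumption -- use whatever ceph.conf puts there.
import json
import subprocess

SOCK = "/var/run/ceph/ceph-osd.0.asok"   # placeholder path

raw = subprocess.check_output(["ceph", "--admin-daemon", SOCK,
                               "perf", "dump"])
perf = json.loads(raw.decode())

# Counter names differ between versions, so report anything whose name
# mentions latency instead of hard-coding particular counters.
for section, counters in perf.items():
    for name, value in counters.items():
        if "latency" in name:
            print("%s.%s = %s" % (section, name, value))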