I doubt your data is correct, even the ext4 data; have you used O_DIRECT when doing the test? It's unusual to get 2x the random-write IOPS of random read. The CephFS kernel client does not seem stable enough yet; think twice before you use it.

From your previous mail I guess you would like to do some caching or dynamic tiering, introducing SSDs into the DFS for better performance. There are a lot of layers at which you can do that kind of caching or migration: you can cache on the client side, or, as Sage said, have a disk pool and an SSD pool and migrate data between them, or you can cache inside the OSD. We are also interested in similar research, but it is still WIP.

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of sheng qiu
Sent: February 4, 2013 23:37
To: Mark Nelson
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: some performance issue

Hi Mark,

thanks a lot for your reply.

On Fri, Feb 1, 2013 at 3:10 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
> On 02/01/2013 02:20 PM, sheng qiu wrote:
>>
>> Hi,
>>
>> i did one experiment which gives some interesting results.
>>
>> i created two OSDs (ext4), each on an SSD attached to the same PC. i
>> also configured one monitor and one mds on that PC.
>> so my OSDs, monitor and mds are all located on the same node.
>>
>> i set up the ceph service and mounted ceph on a local directory on
>> that PC, so the client, OSDs, monitor and mds are all on the same node.
>> i suppose this will exclude the network communication cost.
>>
>> i ran the fio benchmark, which creates one 10GB file (larger than main
>> memory) on the ceph mount point. it performs sequential read/write and
>> random read/write on the file and reports the throughput.
>>
>> next i unmounted ceph and stopped the ceph service. i created ext4 on
>> the same SSD that was used as an OSD before, then ran the same
>> workloads and got the throughput results.
>>
>> here are the results:
>>
>> (throughput, KB/s)    Seq-read   Rand-read   Seq-write   Rand-write
>> ceph                      7378        4740         790         1211
>> ext4                     58260       17334       54697        34257
>>
>> as you see, ceph shows a huge performance drop, even though the
>> monitor, mds, client and osds are on the same physical machine.
>> another interesting thing is that seq-write has lower throughput than
>> random-write under ceph. not quite clear....
>>
>> does anyone have an idea why ceph has such a performance drop?
>
>
> Hi Sheng,
>
> Are you using RBD or CephFS (and kernel or userland clients)? How
> much replication? Also, what FIO settings?
>

i am using CephFS and the kernel client. the replication is the default (3?). FIO is using the ssd-test script; the IO request size is 4kb.

> In general, it is difficult to make distributed storage systems
> perform as well as local storage for small read/write workloads. You
> need a lot of concurrency to hide the latencies, and if the local
> storage is incredibly fast (like an SSD!) you have a huge uphill battle.
>
> Regarding the network, even though you ran everything on localhost,
> ceph is still using TCP sockets to do all of the communication.
>

i guess when it finds that the remote ip is actually the local address, it will directly pass the sent packets to the receive buffer. right?

> Having said that, I think we can do better than 790 IOPs for seq
> writes, even if it's 2x replication. The trick is to find where in
> the stack things are getting held up. You might want to look at tools
> like iostat and collectl, and look at some of the op latency data in
> the ceph admin socket.
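For reference, the checks being suggested here would look roughly like the following (a sketch, assuming the default admin socket path and an OSD named osd.0; paths and daemon names will differ per deployment):

    # list the commands this daemon's admin socket supports
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok help
    # dump the internal performance counters, including op latency data
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
    # block-device utilization and latency while fio is running
    iostat -x 1
    collectl -sD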
> A basic introduction is described in Sebastien's article here:
>
> http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/
>
>>
>> Thanks,
>> Sheng
>>
>>

i will try your suggestion to find where the bottleneck is. the reason i did this experiment is just to find some potential issues with ceph. i am a Ph.D. student trying to do some research work on it. i would be happy to hear your suggestions.

Thanks,
Sheng

--
Sheng Qiu
Texas A & M University
Room 332B Wisenbaker
email: herbert1984106@xxxxxxxxx
College Station, TX 77843-3259
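For reference, the workload discussed in this thread (an ssd-test style fio job on a 10GB file with 4kb requests), rerun with O_DIRECT enabled as asked at the top of the reply, would look roughly like this (a sketch: the mount point, iodepth and runtime are assumptions, and the real ssd-test script runs seq-read, rand-read, seq-write and rand-write as separate jobs):

    # 4kb random writes with direct I/O against the CephFS mount point
    fio --name=rand-write --filename=/mnt/ceph/fio-test --size=10g \
        --bs=4k --ioengine=libaio --iodepth=4 --direct=1 \
        --rw=randwrite --runtime=60
    # repeat with --rw=read, --rw=randread and --rw=write for the other columns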