Hi Mark, thanks a lot for your reply.

On Fri, Feb 1, 2013 at 3:10 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
> On 02/01/2013 02:20 PM, sheng qiu wrote:
>>
>> Hi,
>>
>> I did an experiment which gives some interesting results.
>>
>> I created two OSDs (ext4), each on an SSD attached to the same PC. I
>> also configured one monitor and one mds on that PC, so my OSDs,
>> monitor and mds are all on the same node.
>>
>> I set up the ceph service and mounted ceph on a local directory on
>> that PC, so the client, OSDs, monitor and mds are all on the same
>> node. I assumed this would exclude the network communication cost.
>>
>> I ran the fio benchmark, which creates one 10 GB file (larger than
>> main memory) on the ceph mount point. It performs sequential and
>> random reads/writes on the file and reports the throughput.
>>
>> Next I unmounted ceph and stopped the ceph service, created ext4 on
>> the same SSD that was used as an OSD before, and ran the same
>> workloads to get the throughput.
>>
>> Here are the results:
>>
>> (throughput KB/s)   Seq-read   Rand-read   Seq-write   Rand-write
>> ceph                    7378        4740         790         1211
>> ext4                   58260       17334       54697        34257
>>
>> As you can see, ceph shows a huge performance drop even though the
>> monitor, mds, client and OSDs are all on the same physical machine.
>> Another interesting thing is that sequential writes have lower
>> throughput than random writes under ceph, which is not clear to me.
>>
>> Does anyone have an idea why ceph performs so much worse here?
>
> Hi Sheng,
>
> Are you using RBD or CephFS (and kernel or userland clients?)  How much
> replication?  Also, what FIO settings?

I am using CephFS with the kernel client. The replication factor is the
default (3?) -- I will double-check it with the pool-dump script in the
P.S. below. fio is running the ssd-test example job with a 4 KB I/O
request size (a rough sketch of what I think that amounts to is also in
the P.S. below).

> In general, it is difficult to make distributed storage systems perform as
> well as local storage for small read/write workloads.  You need a lot of
> concurrency to hide the latencies, and if the local storage is incredibly
> fast (like an SSD!) you have a huge uphill battle.
>
> Regarding the network, even though you ran everything on localhost, ceph is
> still using TCP sockets to do all of the communication.

I guess that once the kernel sees the remote IP is actually the local
address, it delivers the packets over the loopback path straight into
the receive buffer rather than putting them on the wire. Right?

> Having said that, I think we can do better than 790 IOPs for seq writes,
> even if it's 2x replication.  The trick is to find where in the stack things
> are getting held up.  You might want to look at tools like iostat and
> collectl, and look at some of the op latency data in the ceph admin socket.
> A basic introduction is described in sebastian's article here:
>
> http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/
>
>> Thanks,
>> Sheng
>>

I will try your suggestion to find where the bottleneck is (the
perf-dump script at the end of this mail is how I plan to read the admin
socket). The reason I did this experiment is to look for potential
issues in ceph: I am a Ph.D. student trying to do some research work on
it, and I would be happy to hear your suggestions.

Thanks,
Sheng

--
Sheng Qiu
Texas A & M University
Room 332B Wisenbaker
email: herbert1984106@xxxxxxxxx
College Station, TX 77843-3259
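
P.S. Since you asked about the fio settings: I don't have the exact
ssd-test job file in front of me, so the little script below is only a
sketch of the kind of job I am running. The 10 GB file size and 4 KB
request size match the test above; the mount point, ioengine, direct and
iodepth values are my assumptions, not necessarily what ssd-test really
uses.

#!/usr/bin/env python
# Sketch only: write an fio job file approximating the test described
# above and run it.  The directory, ioengine, direct and iodepth values
# are assumptions, not the exact contents of fio's ssd-test script.
import subprocess

JOB = """\
[global]
# placeholder: ceph mount point (or the local ext4 mount for the baseline)
directory=/mnt/ceph
# one 10 GB file, larger than main memory, as in the test above
size=10g
# 4 KB request size
bs=4k
# the three settings below are assumptions, not the exact ssd-test values
ioengine=libaio
direct=1
iodepth=4

[seq-read]
rw=read

[rand-read]
stonewall
rw=randread

[seq-write]
stonewall
rw=write

[rand-write]
stonewall
rw=randwrite
"""

with open("cephfs-test.fio", "w") as f:
    f.write(JOB)

# stonewall makes each job wait for the previous one, so the four phases
# run one after another like the original test.
subprocess.check_call(["fio", "cephfs-test.fio"])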
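
Also, rather than guessing at the replication factor, I plan to check it
with something like this (the format of the pool lines in "ceph osd dump"
varies between versions, so the script only echoes them):

#!/usr/bin/env python
# Sketch: show the replication factor of each pool by echoing the pool
# lines from "ceph osd dump".  The exact wording of those lines ("rep
# size" vs "size") differs between ceph versions, so I just print them
# instead of parsing a specific field.
import subprocess

out = subprocess.check_output(["ceph", "osd", "dump"]).decode()
for line in out.splitlines():
    if line.startswith("pool "):
        print(line)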
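
And this is roughly how I intend to pull the op latency data out of the
admin socket, following the article you linked. The socket path is a
placeholder for whatever my ceph.conf actually uses, and I just look for
"latency" in the counter names rather than assuming specific counters,
since those differ between versions:

#!/usr/bin/env python
# Sketch: read the op latency counters from one OSD's admin socket via
# "ceph --admin-daemon <socket> perf dump", as in the article Mark linked.
# The socket path is an assumption -- use whatever ceph.conf puts there.
import json
import subprocess

SOCK = "/var/run/ceph/ceph-osd.0.asok"   # placeholder path

raw = subprocess.check_output(["ceph", "--admin-daemon", SOCK,
                               "perf", "dump"])
perf = json.loads(raw.decode())

# Counter names differ between versions, so report anything whose name
# mentions latency instead of hard-coding particular counters.
for section, counters in perf.items():
    for name, value in counters.items():
        if "latency" in name:
            print("%s.%s = %s" % (section, name, value))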