Re: poor OSD performance using kernel 3.4

Alexandre DERUMIER <aderumier@xxxxxxxxx> · Mon, 28 May 2012 08:52:03 +0200 (CEST)

> I think filestore journal parallel works only with btrfs. 
> Other filesystems use writeahead. 
>>... you might be right, but I can't change Ceph's implementation. 

See my schema.
I think what looks like parallel writes is the flush of the first wave to disk happening at the same time 
as the second wave is being written to the journal.

>>I totally agree with you, but this is just a test setup, AND if you have 
>>a big log file to copy, let's say 100GB, your journal will never be big 
>>enough, and the speed should never drop to 0MB/s. Also, I see the correct 
>>behaviour with 3.0.X, where the speed is capped at that of the underlying device. 
>>So I still see no reason why, with 3.4, the speed drops to 0MB/s and is 
>>mostly 10-20MB/s instead of 130MB/s. 

Maybe something is wrong with 3.4 that makes your disk write more slowly (an xfs bug, a SATA controller driver bug, ...).
In terms of my schema:
slowly enough that the third wave blocks on the journal (hence 0MB/s).
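
To make the wave reasoning concrete, here is a minimal Python sketch of the arithmetic. It assumes the simplified model discussed in this thread: the journal is flushed in half-journal waves, and the client stalls when the journal fills up before the previous wave has been flushed. This is an illustrative assumption, not Ceph's actual filestore logic. The function name wave_timing is hypothetical, and the 130MB/s and 20MB/s disk rates are just the speeds reported in this thread.

```python
def wave_timing(journal_mb, client_mb_s, disk_mb_s):
    """Toy model: the journal is flushed to disk in half-journal 'waves'.

    Returns (wave_mb, fill_s, flush_s, stall_s), where stall_s is how long
    the client sits at 0MB/s per wave because the journal is full.
    Illustrative assumption only, not Ceph's real flush logic.
    """
    wave_mb = journal_mb / 2           # flush is assumed to start at 50% full
    fill_s = wave_mb / client_mb_s     # time for the client to write one wave
    flush_s = wave_mb / disk_mb_s      # time for the disk to flush one wave
    # While wave N is being flushed, wave N+1 fills the other half of the
    # journal. If the flush takes longer than the fill, wave N+2 has to wait.
    stall_s = max(0.0, flush_s - fill_s)
    return wave_mb, fill_s, flush_s, stall_s


if __name__ == "__main__":
    # 1GB journal, client writing at 120MB/s, fast disk vs. slow disk
    for disk_mb_s in (130, 20):
        wave_mb, fill_s, flush_s, stall_s = wave_timing(1000, 120, disk_mb_s)
        print(f"disk {disk_mb_s}MB/s: wave {wave_mb:.0f}MB, "
              f"fill {fill_s:.1f}s, flush {flush_s:.1f}s, stall {stall_s:.1f}s")
```

In this model a disk that sustains at least the client rate never stalls the client, while a slower disk adds a period at 0MB/s to every wave.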

Maybe a local benchmark of your SSD under 3.4 would give some hints?

>> How many disks (7.2K rpm) do you have per OSD? 
>>>One Intel 520 SSD per OSD. 

I have seen benchmarks on the internet reporting about 150-300MB/s (depending on the block size).

Something must be wrong; I think running a local benchmark would really help.
You can use sysbench-tools:
https://github.com/tsuna/sysbench-tools
It lets you compare benchmark runs with nice graphs.
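
If you just want a quick number before setting up a full benchmark, a rough sequential-write test can be sketched in a few lines of Python. This is only a sanity check under my own assumptions, not a replacement for sysbench: the function name sequential_write_mb_s and the default sizes are hypothetical, and random data is used in case the drive or the filesystem compresses what it is given.

```python
import os
import time


def sequential_write_mb_s(path, total_mb=1024, block_kb=4096):
    """Write about total_mb of random data to path and return MB/s.

    The data is written sequentially in block_kb chunks and fsync'ed
    before the clock stops, so the page cache does not hide the disk.
    The file is left in place; the caller is expected to remove it.
    """
    block = os.urandom(block_kb * 1024)       # one incompressible block, reused
    blocks = (total_mb * 1024) // block_kb    # number of whole blocks to write
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()                # push Python's buffer to the kernel
        os.fsync(f.fileno())     # force the kernel to write it to the device
    elapsed = time.perf_counter() - start
    written_mb = blocks * block_kb / 1024
    return written_mb / elapsed
```

Call it with a path on the OSD's filesystem, once under 3.0.X and once under 3.4, and compare the two numbers. If the 3.4 result is already far below 130MB/s, the problem is below Ceph.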

----- Original Message ----- 

From: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx> 
To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx> 
Cc: ceph-devel@xxxxxxxxxxxxxxx, "Mark Nelson" <mark.nelson@xxxxxxxxxxx> 
Sent: Monday, 28 May 2012 08:25:24 
Subject: Re: poor OSD performance using kernel 3.4 

On 28.05.2012 07:37, Alexandre DERUMIER wrote: 
> I think filestore journal parallel works only with btrfs. 
> Other filesystems use writeahead. 
... you might be right, but I can't change Ceph's implementation. 

> If you write at 120MB/s, your 1GB journal is 50% full in about 4 seconds. 
> 
> So you get around 480MB every 4 seconds. Can your disks flush these 480MB sequentially in less than 4 seconds? 
> (Do a small benchmark of your disk on a local filesystem, without Ceph.) 
> 
> If not, you can have spikes in your write stats if the journal. 
> 
> simple schema if disks are not fast enough: 
I totally agree with you, but this is just a test setup, AND if you have 
a big log file to copy, let's say 100GB, your journal will never be big 
enough, and the speed should never drop to 0MB/s. Also, I see the correct 
behaviour with 3.0.X, where the speed is capped at that of the underlying device. 
So I still see no reason why, with 3.4, the speed drops to 0MB/s and is 
mostly 10-20MB/s instead of 130MB/s. 

> How many disks (7.2K rpm) do you have per OSD? 
One Intel 520 SSD per OSD. 

Stefan 

