> I think filestore journal parallel works only with btrfs.
> Other filesystems are writeahead.
>> ... you might be right, but I can't change Ceph's implementation.

See my diagram: I think you see parallel writes because the first wave is being flushed to disk at the same time as the second wave is being written to the journal.

>> I totally agree with you, but this is just a test setup AND if you have
>> a big log file to copy, let's say 100GB, your journal will never be big
>> enough and the speed should never drop to 0MB/s. Also I see the correct
>> behaviour with 3.0.X, where the speed is maxed to the underlying device.
>> So I still see no reason that with 3.4 the speed drops to 0MB/s and is
>> mostly 10-20MB/s instead of 130MB/s.

Maybe something is wrong with 3.4 that makes your disk write more slowly (XFS bug, SATA controller driver bug, ...).
In my diagram: slowly enough that the third wave blocks on the journal (so 0MB/s).

Maybe a local benchmark of your SSD with 3.4 can give some hints?

>> How many disks (7.2K) do you have per OSD?
>>> One Intel 520 SSD per OSD.

I see some benchmarks on the internet showing about 150-300MB/s (depending on the block size).
Something must be wrong.
Doing a local benchmark can really help, I think.

You can use sysbench-tools:
https://github.com/tsuna/sysbench-tools

It compares benchmark runs with nice graphs.

----- Original message -----

From: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>
To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx, "Mark Nelson" <mark.nelson@xxxxxxxxxxx>
Sent: Monday, 28 May 2012 08:25:24
Subject: Re: poor OSD performance using kernel 3.4

On 28.05.2012 07:37, Alexandre DERUMIER wrote:
> I think filestore journal parallel works only with btrfs.
> Other filesystems are writeahead.

... you might be right, but I can't change Ceph's implementation.

> If you write at 120MB/s, your 1GB journal is 50% full in 4 s.
>
> So you get around 480MB every 4 s. Can your disks flush these 480MB sequentially in less than 4 s?
> (do a small benchmark of your disk on a local filesystem, without Ceph)
>
> If not, you can have spikes in your write stats if the journal.
>
> simple diagram if disks are not fast enough:

I totally agree with you, but this is just a test setup AND if you have a big log file to copy, let's say 100GB, your journal will never be big enough and the speed should never drop to 0MB/s. Also I see the correct behaviour with 3.0.X, where the speed is maxed to the underlying device. So I still see no reason that with 3.4 the speed drops to 0MB/s and is mostly 10-20MB/s instead of 130MB/s.

> How many disks (7.2K) do you have per OSD?

One Intel 520 SSD per OSD.

Stefan

--

--
Alexandre Derumier
Systems Engineer
Phone: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
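
For reference, the journal arithmetic discussed in this thread (120MB/s into a 1GB journal, about half full after 4 s) can be checked with a rough sketch. It assumes a deliberately simplified model: a constant client write rate, a constant disk flush rate, and writes that stall once the journal is full. This is not how Ceph's filestore actually schedules flushes; the function names and the example flush rates are hypothetical and chosen only for illustration.

```python
def seconds_to_fill(journal_mb, write_mb_s, fraction=1.0):
    """Time for incoming writes alone to fill `fraction` of the journal."""
    return journal_mb * fraction / write_mb_s


def seconds_until_stall(journal_mb, write_mb_s, flush_mb_s):
    """Time until the journal is full while the disk drains it at flush_mb_s.

    Returns None when the disk keeps up with the incoming writes, i.e. the
    journal never fills in this simplified model.
    """
    backlog_mb_s = write_mb_s - flush_mb_s  # net growth of unflushed data
    if backlog_mb_s <= 0:
        return None
    return journal_mb / backlog_mb_s


if __name__ == "__main__":
    journal_mb = 1000   # 1GB journal, as in the thread (taken as 1000 MB)
    write_mb_s = 120    # client write rate quoted in the thread

    half = seconds_to_fill(journal_mb, write_mb_s, fraction=0.5)
    print(f"journal 50% full after {half:.1f} s")

    # Hypothetical flush rates (assumptions, not measurements):
    for flush_mb_s in (20, 100, 130):
        stall = seconds_until_stall(journal_mb, write_mb_s, flush_mb_s)
        if stall is None:
            print(f"flush {flush_mb_s} MB/s: disk keeps up, no stall")
        else:
            print(f"flush {flush_mb_s} MB/s: journal full after {stall:.1f} s")
```

In this model, any flush rate below the write rate eventually fills the journal, which is why a local benchmark of the SSD under 3.4 is the useful number to compare against the 120MB/s write rate.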