Stefan,

Since the fio benchmark uses direct I/O (--direct=1), maybe the writeback cache is not being exercised? The perf counters should give us the answer.

----- Original Message -----
From: "Josh Durgin" <josh.durgin@xxxxxxxxxxx>
To: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>
Cc: "Gregory Farnum" <greg@xxxxxxxxxxx>, "Alexandre DERUMIER" <aderumier@xxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx, "Mark Nelson" <mark.nelson@xxxxxxxxxxx>
Sent: Monday, July 2, 2012 22:30:19
Subject: Re: speedup ceph / scaling / find the bottleneck

On 07/02/2012 12:22 PM, Stefan Priebe wrote:
> On 02.07.2012 18:51, Gregory Farnum wrote:
>> On Sun, Jul 1, 2012 at 11:12 PM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> @sage / mark
>>> How does the aggregation work? Does it work 4MB blockwise or per target node?
>> Aggregation is based on the 4MB blocks, and if you've got caching
>> enabled then it's also not going to flush them out to disk very often
>> if you're continuously updating the block. I don't remember all the
>> conditions, but essentially you'll run into dirty limits and it will
>> asynchronously flush out the data based on a combination of how old it
>> is and how long it's been since some version of it was stable on disk.
>
> Is there any way to check whether rbd caching works correctly? For me the
> I/O values do not change whether I switch writeback on or off, and it also
> doesn't matter how large I set the cache size.
>
> ...

If you add admin_socket=/path/to/admin_socket for your client running qemu
(in that client's ceph.conf section or manually on the qemu command line),
you can check that caching is enabled:

ceph --admin-daemon /path/to/admin_socket show config | grep rbd_cache

And you can see the statistics it generates (look for "cache") with:

ceph --admin-daemon /path/to/admin_socket perfcounters_dump

Josh

>>> Ceph:
>>> 2 VMs:
>>> write: io=2234MB,  bw=25405KB/s,  iops=6351,  runt= 90041msec
>>> read : io=4760MB,  bw=54156KB/s,  iops=13538, runt= 90007msec
>>> write: io=56372MB, bw=638402KB/s, iops=155,   runt= 90421msec
>>> read : io=86572MB, bw=981225KB/s, iops=239,   runt= 90346msec
>>>
>>> write: io=2222MB,  bw=25275KB/s,  iops=6318,  runt= 90011msec
>>> read : io=4747MB,  bw=54000KB/s,  iops=13500, runt= 90008msec
>>> write: io=55300MB, bw=626733KB/s, iops=153,   runt= 90353msec
>>> read : io=84992MB, bw=965283KB/s, iops=235,   runt= 90162msec
>>
>> I can't quite tell what's going on here; can you describe the test in
>> more detail?
>
> I network-booted my VM and then ran the following command:
>
> export DISK=/dev/vda
> (fio --filename=$DISK --direct=1 --rw=randwrite --bs=4k --size=200G \
>      --numjobs=50 --runtime=90 --group_reporting --name=file1; \
>  fio --filename=$DISK --direct=1 --rw=randread  --bs=4k --size=200G \
>      --numjobs=50 --runtime=90 --group_reporting --name=file1; \
>  fio --filename=$DISK --direct=1 --rw=write     --bs=4M --size=200G \
>      --numjobs=50 --runtime=90 --group_reporting --name=file1; \
>  fio --filename=$DISK --direct=1 --rw=read      --bs=4M --size=200G \
>      --numjobs=50 --runtime=90 --group_reporting --name=file1) \
>  | egrep " read| write"
>
> - random 4k writes
> - random 4k reads
> - sequential 4M writes
> - sequential 4M reads
>
> Stefan

--

Alexandre DERUMIER
Systems and Network Engineer

Phone: 03 20 68 88 85
Fax:   03 20 68 90 88

45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
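
For reference, a minimal client-side ceph.conf sketch that enables the RBD
writeback cache and the admin socket Josh mentions might look roughly like
this; the section name, socket path, and cache size below are illustrative
assumptions, not values taken from Stefan's setup:

[client]
        # illustrative values; adjust the path and size to your environment
        rbd cache = true
        rbd cache size = 33554432
        admin socket = /var/run/ceph/rbd-client.asok

With qemu restarted against that config, the two admin-socket commands from
the thread can be used together to confirm that the cache options took effect
and that the cache counters actually move while fio is running, e.g.:

ceph --admin-daemon /var/run/ceph/rbd-client.asok show config | grep rbd_cache
ceph --admin-daemon /var/run/ceph/rbd-client.asok perfcounters_dump | python -mjson.tool | grep -i cache

(Piping the JSON dump through python -mjson.tool is just one convenient way to
get one counter per line for grep.) If the cache counters stay flat during the
--direct=1 runs, that would support the suspicion above that the writeback
cache is not being used for this workload.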