Hi Sage,
thanks for your response.

>>If you turn off the journal completely, you will see bursty write commits
>>from the perspective of the client, because the OSD is periodically doing
>>a sync or snapshot and only acking the writes then.
>>If you enable the journal, the OSD will reply with a commit as soon as the
>>write is stable in the journal. That's one reason why it is there--file
>>system commits are heavyweight and slow.

Yes, of course. I don't want to disable the journal; putting the journal on a fast SSD or NVRAM is the right way.

>>If we left the file system to its own devices and did a sync every 10
>>seconds, the disk would sit idle while a bunch of dirty data accumulated
>>in cache, and then the sync/snapshot would take a really long time. This
>>is horribly inefficient (the disk is idle half the time), and useless (the
>>delayed write behavior makes sense for local workloads, but not servers
>>where there is a client on the other end batching its writes). To prevent
>>this, 'filestore flusher' will prod the kernel to flush out any written
>>data to the disk quickly. Then, when we get around to doing the
>>sync/snapshot it is pretty quick, because only fs metadata and
>>just-written data needs to be flushed.

Hmm, I disagree. Flushing quickly works fine with a sequential write workload. But with a lot of random writes, with 4K blocks for example, you are going to get a lot of disk seeks.

ZFS and NetApp SAN storage work differently: they take random writes into a fast journal, then flush them sequentially to slow storage every 20 seconds.

To compare with ZFS or NetApp: I can achieve around 20,000 IO/s on 4K random writes with 4 GB of NVRAM and 10 x 7200 rpm disks.
With Ceph, I'm at around 2,000 IO/s with the same config (3 nodes with 10 x 7200 rpm disks, 2x replication), which is about the raw disk I/O limit without any write cache.

So for now, I think I'm going to use SSDs for my OSDs, since I have an 80% random-write workload
(no seeks, so constant random writes are not a problem).

BTW: maybe the wiki is wrong.
http://ceph.com/wiki/OSD_journal

Section "Motivation":
"Enterprise products like NetApp filers "cheat" by journaling all writes to NVRAM and then taking their time to flush things out to disk efficiently. This gives you very low-latency writes _and_ efficient disk IO at the expense of hardware."

This is why I thought Ceph worked like this.

Thanks again,

-Alexandre

----- Original Message -----

From: "Sage Weil" <sage@xxxxxxxxxxx>
To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx, "Mark Nelson" <mark.nelson@xxxxxxxxxxx>, "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>
Sent: Thursday, 21 June 2012 18:03:45
Subject: Re: filestore flusher = false , correct my problem of constant write (need info on this parameter)

Hi Alexandre,

[Sorry I didn't follow up earlier; I didn't understand your question.]

If you turn off the journal completely, you will see bursty write commits
from the perspective of the client, because the OSD is periodically doing
a sync or snapshot and only acking the writes then.

If you enable the journal, the OSD will reply with a commit as soon as the
write is stable in the journal. That's one reason why it is there--file
system commits are heavyweight and slow.

If we left the file system to its own devices and did a sync every 10
seconds, the disk would sit idle while a bunch of dirty data accumulated
in cache, and then the sync/snapshot would take a really long time. This
is horribly inefficient (the disk is idle half the time), and useless (the
delayed write behavior makes sense for local workloads, but not servers
where there is a client on the other end batching its writes). To prevent
this, 'filestore flusher' will prod the kernel to flush out any written
data to the disk quickly. Then, when we get around to doing the
sync/snapshot it is pretty quick, because only fs metadata and
just-written data needs to be flushed.

So: the behavior you're seeing is normal, and good.
Did I understand your confusion correctly?

Thanks!
sage


On Wed, 20 Jun 2012, Alexandre DERUMIER wrote:

> Hi,
> I have tried to disable filestore flusher
>
> filestore flusher = false
> filestore max sync interval = 30
> filestore min sync interval = 29
>
> in osd config.
>
> Now I see a correct sync every 30s when doing rados bench
>
> rados -p pool3 bench 60 write -t 16
>
> seekwatcher movie:
>
> before
> ------
> http://odisoweb1.odiso.net/seqwrite-radosbench-flusherenable.mpg
>
> after
> -----
> http://odisoweb1.odiso.net/seqwrite-radosbench-flusherdisable.mpg
>
> Shouldn't that be the normal behaviour? What exactly is filestore flusher vs syncfs?
>
> This seems to work fine with rados bench,
> but when I launch a benchmark with fio from my guest VM, I see constant writes again.
> (I'll try to debug that today)
>
> My target is to be able to absorb small random writes and write them out every 30s.
>
> Regards,
>
> Alexandre
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Alexandre Derumier
Systems and Network Engineer
Phone: 03 20 68 88 85
Fax: 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
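
The "around 2,000 IO/s is about the raw disk limit" remark in the reply at the top of this thread can be sanity-checked with a rough back-of-envelope calculation. The sketch below is only that: the disk count and replication factor come from the thread, but the per-disk random IOPS values are assumed ballpark figures for 7200 rpm drives, and the function name is made up for illustration. It also ignores journal writes, caching, and network overhead.

```python
def spindle_random_write_iops(disks, per_disk_iops, replication):
    """Rough client-visible random-write IOPS when every client write
    lands on `replication` disks and nothing absorbs the seeks."""
    return disks * per_disk_iops / replication


# Figures taken from the thread: 3 nodes x 10 disks, 2x replication.
disks = 3 * 10
replication = 2

# ASSUMPTION: per-disk random IOPS for a 7200 rpm drive (not from the thread).
for per_disk in (75, 100, 130):
    estimate = spindle_random_write_iops(disks, per_disk, replication)
    print(f"{per_disk} IOPS/disk -> ~{estimate:.0f} client IOPS")
```

Under those assumed per-disk rates the estimate stays in the low thousands, the same order of magnitude as the measured figure. That fits the point made above: without a write-back journal that reorders writes, random 4K writes are bounded by seeks.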