Thank you, Tommi.

We know we should use the journal if we want to keep data integrity, but it
really hurts performance, even when we use btrfs as the OSD's underlying
filesystem (for the OSD data, not for the journal).

In the debug log, we find:

journal prepare_single_write 1 will write 3649536 : seq 445 len 4195462 -> 4202496 (head 40 pre_pad 3895 ebl 4195462 post_pad 3059 tail 40) (ebl alignment 3935)

So a 4 MB (4194304-byte) write from the client generates a journal entry with
a 4195462-byte payload (4202496 bytes including head, padding and tail). That
means we write about 8 MB in total when the client actually writes 4 MB of
data. What advice can you give if we want to lower the overhead of the
journal?

Best regards!

2011/7/7 Tommi Virtanen <tommi.virtanen@xxxxxxxxxxxxx>:
> [Re-added ceph-devel to Cc, it got dropped accidentally.]
>
> 2011/7/5 huang jun <hjwsm1989@xxxxxxxxx>
>>
>> thanks,Tommi
>> now, we have solved this problem, but another occurs.
>> if use osd_journal, the performance goes down heavily.
>> we use rados to bench write performance.
>> rados -p data bench 30 write
>> use osd journal on ext3 :
>> Total time run: 32.213428
>> Total writes made: 216
>> Write size: 4194304
>> Bandwidth (MB/sec): 26.821
>>
>> Average Latency: 2.32526
>> Max latency: 3.6743
>> Min latency: 0.105108
>>
>> and not use osd journal:
>> Total time run: 31.057457
>> Total writes made: 452
>> Write size: 4194304
>> Bandwidth (MB/sec): 58.215
>>
>> Average Latency: 1.07141
>> Max latency: 1.23193
>> Min latency: 0.927193
>>
>> i can not figure out what slow down the write procedure,
>> so can you give some direction/tips to find it out.
>
> Journal mode means you end up doing twice as many writes (once in the
> journal, once in the final location). The writes to the journal will be
> streaming writes, the writes to the final locations are most likely seeking
> around the disk. In average use, the journal can consume a burst of writes,
> and then do the final location writes at a more idle time.
> When your write burst exceeds the size of the journal, you end up
> bottlenecked by the writes, but now you have the writes to the journal
> in addition. The abilities of different disk systems to handle this kind
> of writing varies a lot.
>
> Note that you need the journal for data integrity purposes, either way.
> We are not yet focused on this kind of performance tuning, but it's highly
> likely you will get better performance with btrfs. On btrfs, we can use the
> internals of the filesystem to do the journaling for us.
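
As a rough check on the numbers quoted in this thread, the small Python sketch
below adds up the fields from the prepare_single_write debug line and
recomputes the rados bench bandwidth figures. The function names are made up
for illustration. Two things are assumptions and do not come from the log:
the 4096-byte block size used for the alignment check, and the idea that the
final-location write is exactly as large as the client write (filesystem
metadata is ignored).

```python
# Field sizes copied from the prepare_single_write debug line in this thread.
HEAD, PRE_PAD, EBL, POST_PAD, TAIL = 40, 3895, 4195462, 3059, 40
CLIENT_WRITE = 4194304  # "Write size" reported by rados bench
BLOCK = 4096            # assumed alignment unit, not stated in the log


def journal_entry_size(head, pre_pad, ebl, post_pad, tail):
    """Total bytes of one journal entry: head, padding, payload, tail."""
    return head + pre_pad + ebl + post_pad + tail


def write_amplification(client_bytes, journal_bytes):
    """Bytes written (journal entry plus an assumed same-size final write)
    per byte written by the client."""
    return (journal_bytes + client_bytes) / client_bytes


def bandwidth_mb_per_sec(writes, write_size, seconds):
    """Bandwidth in MB/sec, taking 1 MB as 1024 * 1024 bytes."""
    return writes * write_size / (1024 * 1024) / seconds


entry = journal_entry_size(HEAD, PRE_PAD, EBL, POST_PAD, TAIL)
print(entry)                 # 4202496, matches "len 4195462 -> 4202496"
print(HEAD + PRE_PAD)        # 3935, matches "ebl alignment 3935"
print(entry % BLOCK)         # 0, so the entry is a whole number of 4096-byte blocks
print(EBL - CLIENT_WRITE)    # 1158 bytes of payload beyond the 4 MB of client data
print(round(write_amplification(CLIENT_WRITE, entry), 3))       # 2.002

with_journal = bandwidth_mb_per_sec(216, CLIENT_WRITE, 32.213428)
without_journal = bandwidth_mb_per_sec(452, CLIENT_WRITE, 31.057457)
print(round(with_journal, 3), round(without_journal, 3))        # 26.821 58.215
```

Under these assumptions the journal entry itself adds only about 0.2% on top
of the 4 MB of client data, so nearly all of the overhead comes from writing
the data twice, which is consistent with the roughly 2x bandwidth difference
between the two rados bench runs.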