Re: Example of RADOS AIO?

Hi Sage, 

Thanks for your advice. Yes, I was talking about a write-ahead journal, and it is good to know that a synchronous commit can be faster in some cases.

> If you have a writeahead journal,
> the commit always comes first.  And if you have a fast journal device
> (SSD, NVRAM, RAID card with NVRAM, etc.) it will also be predictably fast.
> 
> And if it's a journal you rely on for consistency, I'd be very careful
> about relaxing your safety guarantees!

I know what you mean. But I'll let HBase users choose the speed vs. safety tradeoff on each write.

Right now, HBase users can specify the level of consistency guarantees on each write. They can choose either to write the journal to the in-memory buffer of HDFS or not to write the journal at all. There is no option to commit the journal to disk on each write because HDFS doesn't have the necessary feature to do that. HDFS only flushes the buffer to disk when the file is closed or the file size reaches the block size (64 MB). We think it's still safe because there are 3 replicas in HDFS memory.

But with Ceph / RADOS, I could add another option that commits the journal to disk for maximum safety. And if it runs faster in some cases, that could be a great option.
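
To make the three per-write choices concrete, here is a rough sketch of what I have in mind. It is not HBase or librados code: the names (Durability, Journal, append) are made up for illustration, and a local file with fsync stands in for a RADOS object and its commit.

```python
import os
from enum import Enum


class Durability(Enum):
    """Hypothetical per-write durability levels (names are illustrative only)."""
    SKIP = 0    # do not journal this write at all
    ACK = 1     # journal the write, but leave it buffered (not forced to disk)
    COMMIT = 2  # journal the write and force it to disk before returning


class Journal:
    """Toy write-ahead journal; a local file stands in for a RADOS object."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self.buffered_bytes = 0  # bytes appended but not yet fsync'ed by us
        self.synced_bytes = 0    # bytes we have explicitly fsync'ed

    def append(self, record, durability):
        if durability is Durability.SKIP:
            return  # fastest, no protection for this write
        self._file.write(record)
        self.buffered_bytes += len(record)
        if durability is Durability.COMMIT:
            self.sync()  # slowest per call, but survives a crash

    def sync(self):
        # Push Python's buffer to the OS, then ask the OS to write to disk.
        self._file.flush()
        os.fsync(self._file.fileno())
        self.synced_bytes += self.buffered_bytes
        self.buffered_bytes = 0

    def close(self):
        self.sync()
        self._file.close()
```

A caller would pass the level with each write, for example journal.append(b"put row1", Durability.COMMIT). Note that a COMMIT also makes any earlier ACK records durable, since they share the same file.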

Thanks, 
Tatsuya 



2010/12/20 Sage Weil <sage@xxxxxxxxxxxx>:
> On Sun, 19 Dec 2010, Tatsuya Kawano wrote:
>> Is there any example of asynchronous IO on RADOS (librados)? When I
>> write some bytes to an object, I'm only interested on the ack, which
>> means writes are applied to all the OSDs within the same placement group
>> but not yet committed to the disks. I'll use this feature to write HBase
>> journals.
>
> There is an example in the rados_bencher.h file, and in
> testrados[pp].c[c].
>
> One thing to keep in mind, though, is that in some cases commit is
> actually faster.  Sometimes writing to the fs can be bursty (latency
> spikes while btrfs is doing a commit).  If you have a writeahead journal,
> the commit always comes first.  And if you have a fast journal device
> (SSD, NVRAM, RAID card with NVRAM, etc.) it will also be predictably fast.
>
> And if it's a journal you rely on for consistency, I'd be very careful
> about relaxing your safety guarantees!
>
> sage
