Re: 回复: Re: NewStore performance analysis

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 21, 2015 at 2:43 PM, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote:
> Hi Sage,
>         Well, that's
>                 submit_transaction -- submit a transaction , whether block waiting for fdatasync depends on rocksdb-disable-sync.
>                 submit_transaction_sync -- queue transaction and wait until it is stable on disk.
>         So if we default rocksdb-disable-sync to false, the two API are same. I haven't look at the LevelDB but I suspect it's similar.

Eh, I don't think it's the same. By default WriteOption.disableWAL is
false in our ceph side, and submit_transaction will use
WriteOption.sync=false and submit_transaction_sync will use
WriteOption.sync=true.

If sync==fase, rocksdb won't sync log file, otherwise it will call
fsync/fdatasync to flush log file.

Plz correct me if not. :-)

>
>         I just re-read the Newstore code, seems the workflow is not as that we want. We issue a bunch of submit_transaction and in the _kv_sync_thread we try to have a checkpoint that ensure previous transaction are persistent, by using submit_transcation_sync to submit an empty transaction.  But actually
>         1. the submit_transaction is already a synchronized call so the empty transcation in _kv_sync_thread is kind of waste.
>         2. An sync transaction cannot ensure the previous transaction is also synced. The API doesn't guarantee this, and from implementation, this two transactions may goes to different WAL files.
>
>         Yes, if we want, we can have a Queue and Thread that collecting the transactions and merge them to a big transaction , some ::fdatasync will be saved here. But this approach looks complex.
>
>         Some optimizations in my mind are:
>         1. Batch the cleanup operations in _apply_wal_transaction, we don’t need to synchronized remove the WAL item, we can just put them into kv_sync_thread_Q and let kv_sync_thread to form a batch transaction that deleted a bunch of key.
>         2. We don't need the empty transaction in kv_sync_thread, we could call the _txc_kv_finish_kv directly from _txc_submit_kv,  since the KV is synchronized.
>                 3.  Then we can rename _kv_sync_thread to _kv_cleanup_thread to better descript its work.
>
>         How do you think
>
>                                                                                                                         Xiaoxi
> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> Sent: Tuesday, April 21, 2015 12:48 AM
> To: Chen, Xiaoxi
> Cc: Mark Nelson; Somnath Roy; Duan, Jiangang; Zhang, Jian; ceph-devel
> Subject: Re: 回复: Re: NewStore performance analysis
>
> On Mon, 20 Apr 2015, Chen, Xiaoxi wrote:
>> > > An easy way to measure might be comment out
>> > > db->submit_transaction(txc->t); in NewStore::_txc_submit_kv, to
>> > > db->see if
>> > > we can get more QD in fragment part without issuing the DB.
>> >
>> > I'm not sure I totally understand the interface.. my assumption is
>> > that queue_transaction will give rocksdb the txn to commit whenever
>> > it finds it convenient (no idea what policy is used there) and
>> > queue_transaction_sync will trigger a commit now.  If we did have
>> > multiple threads doing queue_trandsaction_sync (by, say, calling it
>> > directly in _txc_submit_kv) would qa go up?
>> >
>> I think you might miss something, currently the two interface are
>> exactly the SAME unless you set rocksdb-disable-sync=true(which is
>> false by default).
>>
>> When commit, rocksdb will write the content to both memtable(write
>> buffer) and WAL. if the transaction doesnt go with sync, it will also
>> commit now,but the write to WAL will NOT be sync(by calling fdatasync).
>> That means we may lose data if power failure/kernel panic. This is why
>> i changed the default rocksdb-disable-sync from true to false in
>> previous patch.
>
> Yeah, I'm confused.  :)
>
> So now 'rocksdb disable sync = false', which seems to be obviously what we want for newstore.  It's different for filestore, which is doing a syncfs checkpoint.  Perhaps we should have newstore set that explicitly instead of passing through a config option.
>
> In any case, though, I'm confused by
>
>> if the transaction doesnt go with sync, it will also commit now,but
>> the write to WAL will NOT be sync(by calling fdatasync).
>
> What does it mean to 'commit' but not call fdatasync?  What does commit mean in this case?
>
> And, and I correct in understanding that we have
>
>  queue_transaction -- queue a transaction but don't block waiting for fdatasync  queue_transaction_sync -- queue transaction and wait until it is stable on disk
>
> to work with?
>
> Thanks!
> sage



-- 
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux