On Tue, Apr 21, 2015 at 2:43 PM, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote: > Hi Sage, > Well, that's > submit_transaction -- submit a transaction , whether block waiting for fdatasync depends on rocksdb-disable-sync. > submit_transaction_sync -- queue transaction and wait until it is stable on disk. > So if we default rocksdb-disable-sync to false, the two API are same. I haven't look at the LevelDB but I suspect it's similar. Eh, I don't think it's the same. By default WriteOption.disableWAL is false in our ceph side, and submit_transaction will use WriteOption.sync=false and submit_transaction_sync will use WriteOption.sync=true. If sync==fase, rocksdb won't sync log file, otherwise it will call fsync/fdatasync to flush log file. Plz correct me if not. :-) > > I just re-read the Newstore code, seems the workflow is not as that we want. We issue a bunch of submit_transaction and in the _kv_sync_thread we try to have a checkpoint that ensure previous transaction are persistent, by using submit_transcation_sync to submit an empty transaction. But actually > 1. the submit_transaction is already a synchronized call so the empty transcation in _kv_sync_thread is kind of waste. > 2. An sync transaction cannot ensure the previous transaction is also synced. The API doesn't guarantee this, and from implementation, this two transactions may goes to different WAL files. > > Yes, if we want, we can have a Queue and Thread that collecting the transactions and merge them to a big transaction , some ::fdatasync will be saved here. But this approach looks complex. > > Some optimizations in my mind are: > 1. Batch the cleanup operations in _apply_wal_transaction, we don’t need to synchronized remove the WAL item, we can just put them into kv_sync_thread_Q and let kv_sync_thread to form a batch transaction that deleted a bunch of key. > 2. We don't need the empty transaction in kv_sync_thread, we could call the _txc_kv_finish_kv directly from _txc_submit_kv, since the KV is synchronized. > 3. Then we can rename _kv_sync_thread to _kv_cleanup_thread to better descript its work. > > How do you think > > Xiaoxi > -----Original Message----- > From: Sage Weil [mailto:sweil@xxxxxxxxxx] > Sent: Tuesday, April 21, 2015 12:48 AM > To: Chen, Xiaoxi > Cc: Mark Nelson; Somnath Roy; Duan, Jiangang; Zhang, Jian; ceph-devel > Subject: Re: 回复: Re: NewStore performance analysis > > On Mon, 20 Apr 2015, Chen, Xiaoxi wrote: >> > > An easy way to measure might be comment out >> > > db->submit_transaction(txc->t); in NewStore::_txc_submit_kv, to >> > > db->see if >> > > we can get more QD in fragment part without issuing the DB. >> > >> > I'm not sure I totally understand the interface.. my assumption is >> > that queue_transaction will give rocksdb the txn to commit whenever >> > it finds it convenient (no idea what policy is used there) and >> > queue_transaction_sync will trigger a commit now. If we did have >> > multiple threads doing queue_trandsaction_sync (by, say, calling it >> > directly in _txc_submit_kv) would qa go up? >> > >> I think you might miss something, currently the two interface are >> exactly the SAME unless you set rocksdb-disable-sync=true(which is >> false by default). >> >> When commit, rocksdb will write the content to both memtable(write >> buffer) and WAL. if the transaction doesnt go with sync, it will also >> commit now,but the write to WAL will NOT be sync(by calling fdatasync). >> That means we may lose data if power failure/kernel panic. This is why >> i changed the default rocksdb-disable-sync from true to false in >> previous patch. > > Yeah, I'm confused. :) > > So now 'rocksdb disable sync = false', which seems to be obviously what we want for newstore. It's different for filestore, which is doing a syncfs checkpoint. Perhaps we should have newstore set that explicitly instead of passing through a config option. > > In any case, though, I'm confused by > >> if the transaction doesnt go with sync, it will also commit now,but >> the write to WAL will NOT be sync(by calling fdatasync). > > What does it mean to 'commit' but not call fdatasync? What does commit mean in this case? > > And, and I correct in understanding that we have > > queue_transaction -- queue a transaction but don't block waiting for fdatasync queue_transaction_sync -- queue transaction and wait until it is stable on disk > > to work with? > > Thanks! > sage -- Best Regards, Wheat -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html