Hi, this is the second version of my patch series that prepares the reftable iterators to become re-seekable. These refactorings will eventually allow us to reuse data structures by iterators and optimize for certain cases. Changes compared to v1: - Various fixes to commit messages. - Fixed a copy & pasted comment to refer to logs instead of refs. The series continues to build on top of ps/reftable-write-optim. There is a merge conflict with the in-flight ps/reftable-write-options, but given that this series has not yet been merged to next and because Junio has already resolved the conflict in "seen" I decided to not pull it in as an additional dependency. Thanks! Patrick Patrick Steinhardt (13): reftable/block: use `size_t` to track restart point index reftable/reader: avoid copying index iterator reftable/reader: unify indexed and linear seeking reftable/reader: separate concerns of table iter and reftable reader reftable/reader: inline `reader_seek_internal()` reftable/reader: set up the reader when initializing table iterator reftable/merged: split up initialization and seeking of records reftable/merged: simplify indices for subiterators reftable/generic: move seeking of records into the iterator reftable/generic: adapt interface to allow reuse of iterators reftable/reader: adapt interface to allow reuse of iterators reftable/stack: provide convenience functions to create iterators reftable/merged: adapt interface to allow reuse of iterators refs/reftable-backend.c | 48 ++++---- reftable/block.c | 4 +- reftable/generic.c | 94 +++++++++++---- reftable/generic.h | 9 +- reftable/iter.c | 23 +++- reftable/merged.c | 148 ++++++++---------------- reftable/merged.h | 6 + reftable/merged_test.c | 19 ++- reftable/reader.c | 218 +++++++++++++++-------------------- reftable/readwrite_test.c | 35 ++++-- reftable/reftable-generic.h | 8 +- reftable/reftable-iterator.h | 21 ++++ reftable/reftable-merged.h | 15 --- reftable/reftable-reader.h | 45 ++------ reftable/reftable-stack.h | 18 +++ reftable/stack.c | 29 ++++- 16 files changed, 378 insertions(+), 362 deletions(-) Range-diff against v1: 1: ca86a8b58d = 1: ca86a8b58d reftable/block: use `size_t` to track restart point index 2: bdbebdbd36 = 2: bdbebdbd36 reftable/reader: avoid copying index iterator 3: 716863a580 = 3: 716863a580 reftable/reader: unify indexed and linear seeking 4: 91db2f18c1 = 4: 91db2f18c1 reftable/reader: separate concerns of table iter and reftable reader 5: 4d498ef342 = 5: 4d498ef342 reftable/reader: inline `reader_seek_internal()` 6: 5a10a11584 = 6: 5a10a11584 reftable/reader: set up the reader when initializing table iterator 7: 21b3e3ab5f ! 7: 12c10fd366 reftable/merged: split up initialization and seeking of records @@ Commit message To initialize a `struct merged_iter`, we need to seek all subiterators to the wanted record and then add their results to the priority queue used to sort the records. This logic is split up across two functions, - `merged_table_seek_record()` and `merged_table_iter()`. The scope of - these functions is somewhat weird though, where `merged_table_iter()` is + `merged_table_seek_record()` and `merged_iter_init()`. The scope of + these functions is somewhat weird though, where `merged_iter_init()` is only responsible for adding the records of the subiterators to the priority queue. - Clarify the scope of those functions such that `merged_table_iter()` is + Clarify the scope of those functions such that `merged_iter_init()` is only responsible for initializing the iterator's structure. Performing the subiterator seeks are now part of `merged_table_seek_record()`. 8: f0f42cd56b ! 8: 31bdfc1e34 reftable/merged: simplify indices for subiterators @@ Commit message reftable/merged: simplify indices for subiterators When seeking on a merged table, we perform the seek for each of the - subiterators. If the subiterator hasa the desired record we add it to - the priority queue, otherwise we skip it and don't add it to the stack - of subiterators hosted by the merged table. + subiterators. If the subiterator has the desired record we add it to the + priority queue, otherwise we skip it and don't add it to the stack of + subiterators hosted by the merged table. The consequence of this is that the index of the subiterator in the merged table does not necessarily correspond to the index of it in the 9: 859b399e71 ! 9: 3e91c3830e reftable/generic: move seeking of records into the iterator @@ Commit message The second and more significant downside is that it is impossible to reuse iterators for multiple seeks. Whenever you want to look up a record, you need to re-create the whole infrastructure again, which is - quite a waste of time. Furthermore, it is impossible to for example - optimize seeks, for example when seeking the same record multiple times. + quite a waste of time. Furthermore, it is impossible to optimize seeks, + such as when seeking the same record multiple times. To address this, we essentially split up the concerns properly such that the parent data structure is responsible for setting up the iterator via 10: 727b8fa432 = 10: b0641dd800 reftable/generic: adapt interface to allow reuse of iterators 11: f688f8f59a ! 11: 0745951650 reftable/reader: adapt interface to allow reuse of iterators @@ reftable/reftable-reader.h: struct reftable_table; +void reftable_reader_init_ref_iterator(struct reftable_reader *r, + struct reftable_iterator *it); + -+/* Initialize a reftable iterator for reading refs. */ ++/* Initialize a reftable iterator for reading logs. */ +void reftable_reader_init_log_iterator(struct reftable_reader *r, + struct reftable_iterator *it); 12: 68cc35919b = 12: 51480c4328 reftable/stack: provide convenience functions to create iterators 13: be4da295c6 = 13: 2586e93c44 reftable/merged: adapt interface to allow reuse of iterators -- 2.45.GIT
Attachment:
signature.asc
Description: PGP signature