On Thu, Feb 01, 2024 at 07:17:45AM -0800, Karthik Nayak wrote:
> Hello,
>
> Patrick Steinhardt <ps@xxxxxx> writes:
> > - It becomes possible to do truly atomic writes where either all refs are
> >   committed to disk or none are. This was not possible with the "files"
> >   backend because ref updates were split across multiple loose files.
> >
> > - The disk space required to store many refs is reduced, compared to both
> >   loose refs and packed-refs. This is enabled both by the reftable format
> >   being a binary format, which is more compact, and by prefix compression.
> >
> > - We can ignore filesystem-specific behaviour as ref names are not encoded
> >   via paths anymore. This means there is no need to handle case sensitivity
> >   on Windows systems or Unicode precomposition on macOS.
> >
> > - There is no need to rewrite the complete refdb anymore every time a ref
> >   is deleted, as was the case for packed-refs. This means that ref
> >   deletions are now constant time instead of scaling linearly with the
> >   number of refs.
> >
> > - We can ignore file/directory conflicts so that it becomes possible to
> >   store both "refs/heads/foo" and "refs/heads/foo/bar".
> >
> > - Due to this property we can retain reflogs for deleted refs. We have
> >   previously been deleting reflogs together with their refs to avoid
> >   file/directory conflicts, which is not necessary anymore.
>
> Nit: Maybe also a point about how the current files backend doesn't have a
> good way to differentiate between regular files and refs. While regular
> refs are in the 'refs/' folder, pseudorefs have no namespace. With the
> reftable implementation we know that everything within the reftable is a
> ref, and there are no other refs we need to consider.

True, I'll add that.

> > Performance-wise things very much depend on the actual workload. The
> > following benchmarks compare the "files" and "reftable" backends in the
> > current version:
>
> Pretty nice to have these numbers here.
>
> > - Creating N refs in separate transactions shows that the "files" backend
> >   is ~50% faster. This is not surprising given that creating a ref only
> >   requires us to create a single loose ref. The "reftable" backend will
> >   also perform auto-compaction on updates. In real-world workloads we
> >   would likely also want to pack loose refs, which would likely change
> >   the picture.
> >
> >     Benchmark 1: update-ref: create refs sequentially (refformat = files)
> >       Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
> >       Range (min … max):     1.8 ms …   4.3 ms    133 runs
> >
> >     Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
> >       Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
> >       Range (min … max):     2.4 ms …   2.9 ms    132 runs
> >
> >     Benchmark 3: update-ref: create refs sequentially (refformat = files)
> >       Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
> >       Range (min … max):    1.969 s …  1.980 s    3 runs
> >
> >     Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
> >       Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
> >       Range (min … max):    2.597 s …  2.622 s    3 runs
> >
> >     Benchmark 5: update-ref: create refs sequentially (refformat = files)
> >       Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
> >       Range (min … max):   198.189 s … 198.670 s    3 runs
> >
> >     Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
> >       Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
> >       Range (min … max):   290.223 s … 298.761 s    3 runs
>
> Nit: The refcount is missing in these benchmarks.

Good catch, this is an edge case of hyperfine: because the refcount is used as
a parameter on the command line, it isn't shown in the output anymore. I'll
just massage these manually.

> > diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> > new file mode 100644
> > index 0000000000..895de0b273
> > --- /dev/null
> > +++ b/refs/reftable-backend.c
> > @@ -0,0 +1,2286 @@
> > +#include "../git-compat-util.h"
> > +#include "../abspath.h"
> > +#include "../chdir-notify.h"
> > +#include "../config.h"
>
> I believe this header can be dropped.
>
> Not sure about removing headers which aren't needed. The coding guidelines
> mention that we should include headers that declare the functions and the
> types used. I found some headers here which are not required considering
> this rule.

It's always good hygiene to not include unneeded headers. It also helps to
keep build times lower and to reduce unnecessary rebuilds.

> > +#include "../environment.h"
> > +#include "../gettext.h"
> > +#include "../hash.h"
> > +#include "../hex.h"
> > +#include "../iterator.h"
> > +#include "../ident.h"
> > +#include "../lockfile.h"
> > +#include "../object.h"
> > +#include "../path.h"
> > +#include "../refs.h"
> > +#include "../reftable/reftable-stack.h"
> > +#include "../reftable/reftable-record.h"
> > +#include "../reftable/reftable-error.h"
> > +#include "../reftable/reftable-blocksource.h"
>
> This header can be dropped too.
>
> > +#include "../reftable/reftable-reader.h"
>
> This also.
>
> > +#include "../reftable/reftable-iterator.h"
> > +#include "../reftable/reftable-merged.h"
> > +#include "../reftable/reftable-generic.h"
>
> This one as well.
>
> > +#include "../setup.h"
> > +#include "../strmap.h"
> > +#include "../worktree.h"
>
> This one too.
>
> > +struct reftable_ref_store {
> > +	struct ref_store base;
> > +
> > +	struct reftable_stack *main_stack;
> > +	struct reftable_stack *worktree_stack;
> > +	struct strmap worktree_stacks;
> > +	struct reftable_write_options write_options;
> > +
> > +	unsigned int store_flags;
> > +	int err;
> > +};
>
> So, I'm assuming that `main_stack` here would be the primary reference db,
> even when inside the worktree.
> I'm wondering why we can't just have
>
>     struct reftable_stack *current_stack;
>     struct strmap worktree_stacks;
>
> Reading further on, it becomes necessary to have both, because the user could
>
>   1. Request main-worktree/{ref}
>   2. Request worktrees/{worktree}/{ref}
>
> So it's important that we have access to the main worktree, the current
> worktree and also any other worktree.

Indeed. I'll add some comments to point out their respective uses.

> Maybe we could just drop the `struct reftable_stack *worktree_stack` and
> rely on the map entirely, but I guess `worktree_stack` acts as a caching
> layer and avoids using the map unless necessary.

I doubt that the map access would be all that expensive in practice. The
problem is rather that we don't know the worktree name when we initialize the
backend, and looking it up feels fragile to me and may involve parsing all
worktree metadata. So I'd rather avoid that to the extent possible.

> > +	base_ref_store_init(&refs->base, repo, gitdir, &refs_be_reftable);
> > +	strmap_init(&refs->worktree_stacks);
> > +	refs->store_flags = store_flags;
> > +	refs->write_options.block_size = 4096;
>
> Perhaps use `DEFAULT_BLOCK_SIZE` from "reftable/constants.h" here.

This isn't exposed, given that only the "reftable/reftable-*.h" headers are
considered to be public. Ideally, we'd just leave it unset and let the default
block size kick in. But `reftable_new_stack()` doesn't set up the block size
at all in that case, and that leads to failures. I'll leave this as-is for
now; we can clean it up in the future.

> > +	strbuf_addstr(&path, "/reftable");
> > +	refs->err = reftable_new_stack(&refs->main_stack, path.buf,
> > +				       refs->write_options);
> > +	if (refs->err)
> > +		goto done;
> > +
> > +	/*
> > +	 * If we're in a worktree we also need to set up the worktree reftable
> > +	 * stack that is contained in the per-worktree GIT_DIR.
> > +	 */
> > +	if (is_worktree) {
> > +		strbuf_reset(&path);
> > +		strbuf_addf(&path, "%s/reftable", gitdir);
> > +
> > +		refs->err = reftable_new_stack(&refs->worktree_stack, path.buf,
> > +					       refs->write_options);
> > +		if (refs->err)
> > +			goto done;
> > +	}
>
> Wondering if we should also add this to the `refs->worktree_stacks`.

Ideally we would, but at this point we have no easy way to figure out the
worktree name, to the best of my knowledge.

> > +struct reftable_ref_iterator {
> > +	struct ref_iterator base;
> > +	struct reftable_ref_store *refs;
> > +	struct reftable_iterator iter;
> > +	struct reftable_ref_record ref;
> > +	struct object_id oid;
> > +
> > +	const char *prefix;
> > +	unsigned int flags;
> > +	int err;
> > +};
>
> So the `flags` in this structure are for the iterator itself, but the
> `flags` in `base` are flags related to the current ref stored in `base`.

Yup.

> > +static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > +{
> > +	struct reftable_ref_iterator *iter =
> > +		(struct reftable_ref_iterator *)ref_iterator;
> > +	struct reftable_ref_store *refs = iter->refs;
> > +
> > +	while (!iter->err) {
> > +		int flags = 0;
> > +
>
> Nit: I think the syntax in `reftable_reflog_iterator_advance` is similar but
> nicer to read: the usage of `while (1)` and returning within the while loop
> is better than breaking here and returning outside. Either way, I think it'd
> be nicer to make both of them consistent.

The problem here is that the ref iterator has multiple exit paths that need
to be treated the same, so it is easier to have a common exit path.
I'll thus touch up the reflog iterator to look the same. I think that's the
better option anyway, so that there is only a single exit path. This also
shows that we didn't store the error in `iter->err` in the reflog case. Not a
huge issue, because callers aren't supposed to continue iterating after
they've hit an error anyway, but it is still preferable to make errors
persistent.

> > +		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
> > +		if (iter->err)
> > +			break;
> > +
> > +		/*
> > +		 * The files backend only lists references contained in
> > +		 * "refs/". We emulate the same behaviour here and thus skip
> > +		 * all references that don't start with this prefix.
> > +		 */
> > +		if (!starts_with(iter->ref.refname, "refs/"))
> > +			continue;
> > +
>
> Since my patch series [1] to print all refs is now merged to next, maybe
> you could add this in?
>
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 895de0b273..3f4f905292 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -348,11 +348,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  			break;
>
>  		/*
> -		 * The files backend only lists references contained in
> -		 * "refs/". We emulate the same behaviour here and thus skip
> -		 * all references that don't start with this prefix.
> +		 * Unless the `DO_FOR_EACH_INCLUDE_ALL_REFS` flag is used, we
> +		 * only list references contained in "refs/" to mimic the
> +		 * files backend.
>  		 */
> -		if (!starts_with(iter->ref.refname, "refs/"))
> +		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) &&
> +		    !starts_with(iter->ref.refname, "refs/"))
>  			continue;
>
>  		if (iter->prefix &&

I'm still waiting for your patch series to hit `next` before making it a
dependency of this series, but thanks for the patch!

> > +static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
> > +					       struct ref_iterator *iter_common,
> > +					       void *cb_data UNUSED)
> > +{
> > +	if (iter_worktree && !iter_common) {
> > +		/*
> > +		 * Return the worktree ref if there are no more common refs.
> > +		 */
> > +		return ITER_SELECT_0;
> > +	} else if (iter_common) {
> > +		/*
> > +		 * In case we have pending worktree and common refs we need to
> > +		 * yield them based on their lexicographical order. Worktree
> > +		 * refs that have the same name as common refs shadow the
> > +		 * latter.
> > +		 */
> > +		if (iter_worktree) {
> > +			int cmp = strcmp(iter_worktree->refname,
> > +					 iter_common->refname);
> > +			if (cmp < 0)
> > +				return ITER_SELECT_0;
> > +			else if (!cmp)
> > +				return ITER_SELECT_0_SKIP_1;
> > +		}
> > +
> > +		/*
> > +		 * Otherwise, if we either have no worktree refs anymore or if
> > +		 * the common ref sorts before the next worktree ref, we need
> > +		 * to figure out whether the next common ref belongs to the
> > +		 * main worktree. In that case, it should be ignored.
> > +		 */
> > +		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
> > +				       NULL) == REF_WORKTREE_SHARED)
> > +			return ITER_SELECT_1;
> > +
>
> I'm not sure I understand this. When would this situation occur?

There are two cases here:

  - The worktree iterator is exhausted because there are common refs which
    sort lexicographically after all worktree refs.

  - Both iterators still have refs, and the next lexicographic ref is from
    the common ref store.

In both cases we want to return only those refs which are shared, but we want
to skip over all per-worktree refs from the common ref store. This is because
the per-worktree refs from the common ref store belong to the main worktree,
not to ours.
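As a concrete illustration (with made-up stack contents, not taken from this
thread): suppose the current worktree's stack contains "HEAD" and
"refs/bisect/good", while the common stack contains "HEAD", "refs/bisect/bad"
and "refs/heads/main". Merging the two iterators then selects:

    HEAD              -> present in both stacks, so the worktree copy shadows
                         the common one (ITER_SELECT_0_SKIP_1)
    refs/bisect/bad   -> only in the common stack, but it is a per-worktree
                         ref and thus belongs to the main worktree, so it is
                         skipped (ITER_SKIP_1)
    refs/bisect/good  -> only in the worktree stack, returned (ITER_SELECT_0)
    refs/heads/main   -> only in the common stack and classified as
                         REF_WORKTREE_SHARED, so it is returned (ITER_SELECT_1)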
I'll rework the comments a bit.

> > +		return ITER_SKIP_1;
> > +	} else {
> > +		return ITER_DONE;
> > +	}
> > +}
> > +
> > +
> > +static int reftable_be_transaction_prepare(struct ref_store *ref_store,
> > +					   struct ref_transaction *transaction,
> > +					   struct strbuf *err)
> > +{
> > +	struct reftable_ref_store *refs =
> > +		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
> > +	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
> > +	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> > +	struct reftable_transaction_data *tx_data = NULL;
> > +	struct object_id head_oid;
> > +	unsigned int head_type = 0;
> > +	size_t i;
> > +	int ret;
> > +
> > +	ret = refs->err;
> > +	if (ret < 0)
> > +		goto done;
> > +
> > +	tx_data = xcalloc(1, sizeof(*tx_data));
> > +
> > +	/*
> > +	 * Preprocess all updates. For one we check that there are no duplicate
> > +	 * reference updates in this transaction. Second, we lock all stacks
> > +	 * that will be modified during the transaction.
> > +	 */
> > +	for (i = 0; i < transaction->nr; i++) {
> > +		ret = prepare_transaction_update(NULL, refs, tx_data,
> > +						 transaction->updates[i], err);
> > +		if (ret)
> > +			goto done;
> > +
> > +		string_list_append(&affected_refnames,
> > +				   transaction->updates[i]->refname);
> > +	}
> > +
> > +	/*
> > +	 * Now that we have counted updates per stack we can preallocate their
> > +	 * arrays. This avoids having to reallocate many times.
> > +	 */
> > +	for (i = 0; i < tx_data->args_nr; i++) {
> > +		CALLOC_ARRAY(tx_data->args[i].updates, tx_data->args[i].updates_expected);
> > +		tx_data->args[i].updates_alloc = tx_data->args[i].updates_expected;
> > +	}
> > +
> > +	/*
> > +	 * Fail if a refname appears more than once in the transaction.
> > +	 * This code is taken from the files backend and is a good candidate to
> > +	 * be moved into the generic layer.
> > +	 */
> > +	string_list_sort(&affected_refnames);
> > +	if (ref_update_reject_duplicates(&affected_refnames, err)) {
> > +		ret = TRANSACTION_GENERIC_ERROR;
> > +		goto done;
> > +	}
> > +
> > +	ret = read_ref_without_reload(stack_for(refs, "HEAD", NULL), "HEAD", &head_oid,
> > +				      &head_referent, &head_type);
> > +	if (ret < 0)
> > +		goto done;
> > +
> > +	for (i = 0; i < transaction->nr; i++) {
> > +		struct ref_update *u = transaction->updates[i];
> > +		struct object_id current_oid = {0};
> > +		struct reftable_stack *stack;
> > +		const char *rewritten_ref;
> > +
> > +		stack = stack_for(refs, u->refname, &rewritten_ref);
>
> Wondering why we didn't just iterate over `tx_data.args` and
> `args[i].updates` since there we already have the required stack.
>
> Update: we don't save rewritten_ref in tx_data, so that's one blocker for
> this.

Right, that's the reason. I was trying to do that, but also ran into some
other problems caused by the ref splitting we do in the loop.

> > +
> > +		if (u->type & REF_ISSYMREF) {
> > +			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
> > +								       &current_oid, NULL);
> > +
> > +			if (u->flags & REF_NO_DEREF) {
> > +				/*
> > +				 * The reftable stack is locked at this point
> > +				 * already, so it should be safe to call
> > +				 * `refs_resolve_ref_unsafe()` here.
> > +				 */
>
> Shouldn't this comment be a few lines before?

Good catch. I moved the `refs_resolve_ref_unsafe()` call at one point and
forgot to also move the comment.
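As background for the `stack_for()` calls in the hunk above, here is a rough
sketch of the kind of dispatch it has to do. The name and details are
illustrative assumptions rather than the actual implementation; in particular,
the lazy creation and caching of stacks for other worktrees in the
`worktree_stacks` map is omitted:

    /*
     * Illustrative sketch only: pick the reftable stack that a given refname
     * should be routed to. Routing of "worktrees/<name>/*" refs through the
     * worktree_stacks strmap is omitted.
     */
    static struct reftable_stack *stack_for_sketch(struct reftable_ref_store *refs,
                                                   const char *refname,
                                                   const char **rewritten_ref)
    {
            switch (parse_worktree_ref(refname, NULL, NULL, rewritten_ref)) {
            case REF_WORKTREE_CURRENT:
                    /* Per-worktree refs like "HEAD" or "refs/bisect/*". */
                    return refs->worktree_stack ? refs->worktree_stack
                                                : refs->main_stack;
            case REF_WORKTREE_MAIN:
            case REF_WORKTREE_SHARED:
            default:
                    /* Shared refs and "main-worktree/*" refs live in the
                     * common refdb. */
                    return refs->main_stack;
            }
    }

The `rewritten_ref` out-parameter would then presumably carry the refname with
any "worktrees/<name>/" or "main-worktree/" prefix stripped, which is what the
transaction code above continues to work with.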
> > +static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
> > +{
> > +	struct write_transaction_table_arg *arg = cb_data;
> > +	struct reftable_merged_table *mt =
> > +		reftable_stack_merged_table(arg->stack);
> > +	uint64_t ts = reftable_stack_next_update_index(arg->stack);
> > +	struct reftable_log_record *logs = NULL;
> > +	size_t logs_nr = 0, logs_alloc = 0, i;
> > +	int ret = 0;
> > +
> > +	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
> > +
> > +	reftable_writer_set_limits(writer, ts, ts);
> > +
> > +	for (i = 0; i < arg->updates_nr; i++) {
> > +		struct reftable_transaction_update *tx_update = &arg->updates[i];
> > +		struct ref_update *u = tx_update->update;
> > +
> > +		/*
> > +		 * Write a reflog entry when updating a ref to point to
> > +		 * something new in either of the following cases:
> > +		 *
> > +		 * - The reference is about to be deleted. We always want to
> > +		 *   delete the reflog in that case.
> > +		 * - REF_FORCE_CREATE_REFLOG is set, asking us to always create
> > +		 *   the reflog entry.
> > +		 * - `core.logAllRefUpdates` tells us to create the reflog for
> > +		 *   the given ref.
> > +		 */
> > +		if (u->flags & REF_HAVE_NEW && !(u->type & REF_ISSYMREF) && is_null_oid(&u->new_oid)) {
> > +			struct reftable_log_record log = {0};
> > +			struct reftable_iterator it = {0};
> > +
> > +			/*
> > +			 * When deleting refs we also delete all reflog entries
> > +			 * with them. While it is not strictly required to
> > +			 * delete reflogs together with their refs, this
> > +			 * matches the behaviour of the files backend.
> > +			 *
> > +			 * Unfortunately, we have no better way than to delete
> > +			 * all reflog entries one by one.
> > +			 */
> > +			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
> > +			while (ret == 0) {
> > +				struct reftable_log_record *tombstone;
> > +
> > +				ret = reftable_iterator_next_log(&it, &log);
> > +				if (ret < 0)
> > +					break;
> > +				if (ret > 0 || strcmp(log.refname, u->refname)) {
> > +					ret = 0;
> > +					break;
> > +				}
> > +
> > +				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> > +				tombstone = &logs[logs_nr++];
> > +				tombstone->refname = xstrdup(u->refname);
> > +				tombstone->value_type = REFTABLE_LOG_DELETION,
>
> Why is this a comma? One other such instance in this file.

Typo :) Funnily enough it is fine and doesn't make any difference: the comma
operator simply discards the value of the first expression (which is the
assigned value) and instead yields the value of the second expression. We
don't use either value, so no problem. I've also fixed the second instance.

[snip]

> > +	/*
> > +	 * When deleting the old branch we need to create a reflog entry on the
> > +	 * new branch name that indicates that the old branch has been deleted
> > +	 * and then recreated. This is a tad weird, but matches what the files
> > +	 * backend does.
> > +	 */
> > +	if (arg->delete_old) {
> > +		struct strbuf head_referent = STRBUF_INIT;
> > +		struct object_id head_oid;
> > +		int append_head_reflog;
> > +		unsigned head_type = 0;
> > +
> > +		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> > +		memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
> > +		fill_reftable_log_record(&logs[logs_nr]);
> > +		logs[logs_nr].refname = (char *)arg->newname;
> > +		logs[logs_nr].update_index = deletion_ts;
> > +		logs[logs_nr].value.update.message =
> > +			xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
>
> Question: here and other places in this file, shouldn't we free memory
> allocated by `xstrndup`?

We use `reftable_log_record_release()`, which takes care of this.
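Coming back to the comma-operator question above, a tiny standalone example
(not from the patch) of why the stray comma still does the right thing:

    #include <stdio.h>

    int main(void)
    {
            int a = 0, b = 0;

            /*
             * The comma operator evaluates the left-hand expression, discards
             * its value and yields the value of the right-hand expression.
             * Both assignments still take effect; only the value of the
             * combined expression, which nobody uses, comes from the second.
             */
            a = 1,
            b = 2;

            printf("a=%d b=%d\n", a, b); /* prints "a=1 b=2" */
            return 0;
    }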
> Also, why is it `block_size / 2`?

That's a very good question indeed. The problem we have here is that the
maximum length of a record is limited by the block size. Thus, we cannot write
entries into a block that exceed a certain length, and we must ensure that no
record exceeds that size.

Using `block_size / 2` is thus a mechanism to cull the reflog message so that
we can at least write a _partial_ reflog message without breaking the format.
The division by two is somewhat arbitrary and merely ensures that we have
enough headroom to also write the remaining data of the record. This could be
improved to instead trim at the exact block boundary minus the overhead of the
remaining record data, but I doubt that's really worth it for now, given that
nobody will ever write a 2kB reflog entry (famous last words).

> > +static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
> > +								  struct reftable_stack *stack)
> > +{
> > +	struct reftable_reflog_iterator *iter;
> > +	struct reftable_merged_table *mt;
> > +	int ret;
> > +
> > +	iter = xcalloc(1, sizeof(*iter));
> > +	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
> > +	iter->refs = refs;
> > +	iter->base.oid = &iter->oid;
> > +
> > +	ret = reftable_stack_reload(refs->main_stack);
> > +	if (ret < 0)
> > +		goto done;
> > +
> > +	mt = reftable_stack_merged_table(stack);
> > +	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
> > +	if (ret < 0)
> > +		goto done;
> > +
>
> Keeping it similar to `ref_iterator_for_stack`, perhaps:
>
>     ret = refs->err;
>     if (ret)
>             goto done;
>
>     ret = reftable_stack_reload(stack);
>     if (ret)
>             goto done;
>
>     merged_table = reftable_stack_merged_table(stack);
>
>     ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
>     if (ret)
>             goto done;

Will do.

> > +static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
> > +						    const char *refname,
> > +						    each_reflog_ent_fn fn,
> > +						    void *cb_data)
> > +{
> > +	struct reftable_ref_store *refs =
> > +		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
> > +	struct reftable_stack *stack = stack_for(refs, refname, &refname);
> > +	struct reftable_merged_table *mt = NULL;
> > +	struct reftable_log_record log = {0};
> > +	struct reftable_iterator it = {0};
> > +	int ret;
> > +
> > +	if (refs->err < 0)
> > +		return refs->err;
>
> Nit: seems like this function and the one below,
> `reftable_be_for_each_reflog_ent`, share the same code apart from the
> iteration direction. Wonder if it'd be nicer to extract out the common code
> from them.

Hm. I tried to do this a while ago, but didn't end up with a nice solution
because I basically tried to unify the iterator loops. But coming back with
fresh eyes, we can at least easily share the code that yields the log record
to the callback function, which removes some of the duplication.
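A rough sketch of what that shared code could look like; the helper name and
the exact conversion from the reftable record to the callback's arguments are
assumptions for illustration, not the eventual patch:

    static int yield_log_record(struct reftable_log_record *log,
                                each_reflog_ent_fn fn, void *cb_data)
    {
            struct object_id old_oid, new_oid;
            char *full_committer;
            int ret;

            oidread(&old_oid, log->value.update.old_hash);
            oidread(&new_oid, log->value.update.new_hash);

            /* Reassemble the "name <email>" committer string the callback expects. */
            full_committer = xstrfmt("%s <%s>", log->value.update.name,
                                     log->value.update.email);
            ret = fn(&old_oid, &new_oid, full_committer,
                     (timestamp_t)log->value.update.time,
                     log->value.update.tz_offset,
                     log->value.update.message, cb_data);
            free(full_committer);
            return ret;
    }

The forward and reverse variants would then only differ in how they seek and
step through the log records before handing each one to such a helper.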
> > +static int reftable_be_reflog_exists(struct ref_store *ref_store,
> > +				     const char *refname)
> > +{
> > +	struct reftable_ref_store *refs =
> > +		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
> > +	struct reftable_stack *stack = stack_for(refs, refname, &refname);
> > +	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
> > +	struct reftable_log_record log = {0};
> > +	struct reftable_iterator it = {0};
> > +	int ret;
> > +
> > +	ret = refs->err;
> > +	if (ret < 0)
> > +		goto done;
> > +
> > +	ret = reftable_stack_reload(stack);
> > +	if (ret)
> > +		goto done;
> > +
> > +	ret = reftable_merged_table_seek_log(mt, &it, refname);
> > +	if (ret)
> > +		goto done;
> > +
> > +	/*
> > +	 * Seek the reflog to see whether it contains any reflog entries which
> > +	 * aren't marked for deletion.
> > +	 */
>
> Shouldn't we be checking `log.value_type` in that case?

I think this whole loop no longer makes sense. I initially misunderstood how
reflog deletion works [1]: I assumed that a reflog entry with the zero object
ID as its new value is supposed to mark a reflog as deleted, but that's not
the case. Instead, deleting a reflog requires us to mark each record as
deleted individually.

Given that the reftable stack is set up to suppress deletions (see
`reftable_stack_reload_once()`, `new_merged->suppress_deletions = 1`), we
don't have to check for deletions at all. All we have to do is seek the log
once and check whether we get a log record matching the expected name.

[1]: https://public-inbox.org/git/ZVy0zKcmc8tjmgzs@tanuki/

> This email only reviews this file, I'll review the others in follow-up
> email[s].

Thanks a lot for your review! I know the code here is quite dense, so I really
appreciate it.

Patrick