On 7 Jan 2022, at 12:16, René Scharfe <l.s.r@xxxxxx> wrote:
>
> On 06.01.22 at 23:53, Junio C Hamano wrote:
>> Jessica Clarke <jrtc27@xxxxxxxxxx> writes:
>>
>>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>>> hardware capabilities which, as well as having a normal integer address,
>>> have additional bounds, permissions and other metadata in a second word.
>>> In order to preserve this metadata, uintptr_t is also implemented as a
>>> capability, not a plain integer, which causes problems for binary
>>> operators, as the metadata preserved in the output can only come from
>>> one of the inputs. In most cases this is clear, as normally at least one
>>> operand is provably a plain integer, but if both operands are uintptr_t
>>> and have no indication they're just plain integers then it is ambiguous,
>>> and the current implementation will arbitrarily, but deterministically,
>>> pick the left-hand side, due to empirical evidence that it is more
>>> likely to be correct.
>>
>> What's left-hand side in the context of the code you changed?
>> Between "what" vs "ent->util" you take "what"?  That cannot be
>> true.  Are you referring to the (usually hidden and useless when we
>> use it as an integer) "hardware capabilities" word as "left" vs the
>> value of the pointer as "right"?
>>
>>>  static uintptr_t register_symlink_changes(struct apply_state *state,
>>>  					  const char *path,
>>> -					  uintptr_t what)
>>> +					  size_t what)
>>>  {
>>>  	struct string_list_item *ent;
>>>
>>> @@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state,
>>>  		ent = string_list_insert(&state->symlink_changes, path);
>>>  		ent->util = (void *)0;
>>>  	}
>>> -	ent->util = (void *)(what | ((uintptr_t)ent->util));
>>> +	ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util));
>>>  	return (uintptr_t)ent->util;
>>>  }
>>
>> I actually wonder if it results in code that is much easier to
>> follow if we did:
>>
>>  * Introduce an "enum apply_symlink_treatment" that has
>>    APPLY_SYMLINK_GOES_AWAY and APPLY_SYMLINK_IN_RESULT as its
>>    possible values;
>>
>>  * Make register_symlink_changes() and check_symlink_changes()
>>    work with "enum apply_symlink_treatment";
>>
>>  * (optional) stop using string_list() to store the symlink_changes;
>>    use strintmap and use strintmap_set() and strintmap_get() to
>>    access its entries, so that the ugly implementation detail
>>    (i.e. "the container type we use only has a (void *) field to
>>    store caller-supplied data, so we cast an integer and a pointer
>>    back and forth") can be safely hidden.
>>
> Or strsets -- we only need two.
>
> --- >8 ---
> Subject: [PATCH] apply: use strsets to track symlinks
>
> Symlink changes are tracked in a string_list, with the util pointer
> value indicating whether a symlink is kept or removed.  Using fake
> pointer values requires awkward casts.  Use one strset for each type of
> change instead to simplify and shorten the code.
>
> Original-patch-by: Jessica Clarke <jrtc27@xxxxxxxxxx>
> Signed-off-by: René Scharfe <l.s.r@xxxxxx>

Thanks, this patch makes sense to me.

Incidentally, seeing the bigger picture as a result of this patch touching
everywhere that used that list, I can see that in fact the existing code
would have worked, just with the compiler warning that something potentially
iffy was going on. I had assumed ent->util was still sometimes storing an
actual pointer, with the low bits being used as flags, as many things tend
to do, but in fact it was always NULL plus a couple of flag bits, so both
sides of the | always had the same bounds/permissions/tag, that of NULL
(i.e. tag cleared as invalid, full bounds).

This still looks like a nice cleanup though.

Jess

> ---
>  apply.c | 42 ++++++++----------------------------------
>  apply.h | 26 +++++++++++---------------
>  2 files changed, 19 insertions(+), 49 deletions(-)
>
> diff --git a/apply.c b/apply.c
> index fed195250b..7deb4f79fd 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -103,7 +103,8 @@ int init_apply_state(struct apply_state *state,
>  	state->linenr = 1;
>  	string_list_init_nodup(&state->fn_table);
>  	string_list_init_nodup(&state->limit_by_name);
> -	string_list_init_nodup(&state->symlink_changes);
> +	strset_init(&state->removed_symlinks);
> +	strset_init(&state->kept_symlinks);
>  	strbuf_init(&state->root, 0);
>
>  	git_apply_config();
> @@ -117,7 +118,8 @@ int init_apply_state(struct apply_state *state,
>  void clear_apply_state(struct apply_state *state)
>  {
>  	string_list_clear(&state->limit_by_name, 0);
> -	string_list_clear(&state->symlink_changes, 0);
> +	strset_clear(&state->removed_symlinks);
> +	strset_clear(&state->kept_symlinks);
>  	strbuf_release(&state->root);
>
>  	/* &state->fn_table is cleared at the end of apply_patch() */
> @@ -3812,59 +3814,31 @@ static int check_to_create(struct apply_state *state,
>  	return 0;
>  }
>
> -static uintptr_t register_symlink_changes(struct apply_state *state,
> -					  const char *path,
> -					  uintptr_t what)
> -{
> -	struct string_list_item *ent;
> -
> -	ent = string_list_lookup(&state->symlink_changes, path);
> -	if (!ent) {
> -		ent = string_list_insert(&state->symlink_changes, path);
> -		ent->util = (void *)0;
> -	}
> -	ent->util = (void *)(what | ((uintptr_t)ent->util));
> -	return (uintptr_t)ent->util;
> -}
> -
> -static uintptr_t check_symlink_changes(struct apply_state *state, const char *path)
> -{
> -	struct string_list_item *ent;
> -
> -	ent = string_list_lookup(&state->symlink_changes, path);
> -	if (!ent)
> -		return 0;
> -	return (uintptr_t)ent->util;
> -}
> -
>  static void prepare_symlink_changes(struct apply_state *state, struct patch *patch)
>  {
>  	for ( ; patch; patch = patch->next) {
>  		if ((patch->old_name && S_ISLNK(patch->old_mode)) &&
>  		    (patch->is_rename || patch->is_delete))
>  			/* the symlink at patch->old_name is removed */
> -			register_symlink_changes(state, patch->old_name, APPLY_SYMLINK_GOES_AWAY);
> +			strset_add(&state->removed_symlinks, patch->old_name);
>
>  		if (patch->new_name && S_ISLNK(patch->new_mode))
>  			/* the symlink at patch->new_name is created or remains */
> -			register_symlink_changes(state, patch->new_name, APPLY_SYMLINK_IN_RESULT);
> +			strset_add(&state->kept_symlinks, patch->new_name);
>  	}
>  }
>
>  static int path_is_beyond_symlink_1(struct apply_state *state, struct strbuf *name)
>  {
>  	do {
> -		unsigned int change;
> -
>  		while (--name->len && name->buf[name->len] != '/')
>  			; /* scan backwards */
>  		if (!name->len)
>  			break;
>  		name->buf[name->len] = '\0';
> -		change = check_symlink_changes(state, name->buf);
> -		if (change & APPLY_SYMLINK_IN_RESULT)
> +		if (strset_contains(&state->kept_symlinks, name->buf))
>  			return 1;
> -		if (change & APPLY_SYMLINK_GOES_AWAY)
> +		if (strset_contains(&state->removed_symlinks, name->buf))
>  			/*
>  			 * This cannot be "return 0", because we may
>  			 * see a new one created at a higher level.
> diff --git a/apply.h b/apply.h
> index 16202da160..4052da50c0 100644
> --- a/apply.h
> +++ b/apply.h
> @@ -4,6 +4,7 @@
>  #include "hash.h"
>  #include "lockfile.h"
>  #include "string-list.h"
> +#include "strmap.h"
>
>  struct repository;
>
> @@ -25,20 +26,6 @@ enum apply_verbosity {
>  	verbosity_verbose = 1
>  };
>
> -/*
> - * We need to keep track of how symlinks in the preimage are
> - * manipulated by the patches.  A patch to add a/b/c where a/b
> - * is a symlink should not be allowed to affect the directory
> - * the symlink points at, but if the same patch removes a/b,
> - * it is perfectly fine, as the patch removes a/b to make room
> - * to create a directory a/b so that a/b/c can be created.
> - *
> - * See also "struct string_list symlink_changes" in "struct
> - * apply_state".
> - */
> -#define APPLY_SYMLINK_GOES_AWAY 01
> -#define APPLY_SYMLINK_IN_RESULT 02
> -
>  struct apply_state {
>  	const char *prefix;
>
> @@ -86,7 +73,16 @@ struct apply_state {
>
>  	/* Various "current state" */
>  	int linenr; /* current line number */
> -	struct string_list symlink_changes; /* we have to track symlinks */
> +	/*
> +	 * We need to keep track of how symlinks in the preimage are
> +	 * manipulated by the patches.  A patch to add a/b/c where a/b
> +	 * is a symlink should not be allowed to affect the directory
> +	 * the symlink points at, but if the same patch removes a/b,
> +	 * it is perfectly fine, as the patch removes a/b to make room
> +	 * to create a directory a/b so that a/b/c can be created.
> +	 */
> +	struct strset removed_symlinks;
> +	struct strset kept_symlinks;
>
>  	/*
>  	 * For "diff-stat" like behaviour, we keep track of the biggest change
> --
> 2.34.1
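
For anyone following along without the tree at hand, here is a rough,
standalone sketch of the two-set idea from the patch above. It is not git
code: toy_strset and the toy_* functions are made-up stand-ins for the
strset API in strmap.h, and the third set, "preimage", is my own
simplification of the preimage check described in the apply.h comment.
The real function continues past the hunk quoted above, and this sketch
does not try to reproduce that part.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strset: a small unsorted string array. */
struct toy_strset {
	char **items;
	size_t nr, alloc;
};

static int toy_strset_contains(const struct toy_strset *set, const char *s)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (!strcmp(set->items[i], s))
			return 1;
	return 0;
}

/* Returns 1 if newly added, 0 if already present, -1 if allocation fails. */
static int toy_strset_add(struct toy_strset *set, const char *s)
{
	size_t len;
	char *copy;

	assert(set && s);
	if (toy_strset_contains(set, s))
		return 0;
	if (set->nr == set->alloc) {
		size_t new_alloc = set->alloc ? 2 * set->alloc : 8;
		char **p = realloc(set->items, new_alloc * sizeof(*p));
		if (!p)
			return -1;
		set->items = p;
		set->alloc = new_alloc;
	}
	len = strlen(s) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, s, len);
	set->items[set->nr++] = copy;
	return 1;
}

static void toy_strset_clear(struct toy_strset *set)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		free(set->items[i]);
	free(set->items);
	set->items = NULL;
	set->nr = set->alloc = 0;
}

/*
 * Check each leading directory of "path", longest first.  A symlink kept
 * by the patch blocks the path; one removed by the patch is skipped so
 * that shorter prefixes still get checked; otherwise a symlink already in
 * the (assumed) preimage set blocks it.  Returns 1 if blocked, 0 if not,
 * -1 if allocation fails.
 */
static int toy_path_is_beyond_symlink(const struct toy_strset *kept,
				      const struct toy_strset *removed,
				      const struct toy_strset *preimage,
				      const char *path)
{
	size_t len = strlen(path);
	char *buf = malloc(len + 1);
	int ret = 0;

	if (!buf)
		return -1;
	memcpy(buf, path, len + 1);
	for (;;) {
		while (len && --len && buf[len] != '/')
			; /* scan backwards */
		if (!len)
			break;
		buf[len] = '\0';
		if (toy_strset_contains(kept, buf)) {
			ret = 1;
			break;
		}
		if (toy_strset_contains(removed, buf))
			continue; /* a higher level may still be a symlink */
		if (toy_strset_contains(preimage, buf)) {
			ret = 1;
			break;
		}
	}
	free(buf);
	return ret;
}
```

With "a/b" only in the preimage set, "a/b/c" is rejected; once "a/b" is
also in the removed set it is accepted, unless "a" shows up in the kept
set. No integer ever passes through a void *, so the uintptr_t provenance
question does not come up at all.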