On Sat, Dec 7, 2024 at 7:21 AM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Christian Couder <christian.couder@xxxxxxxxx> writes: > > > +/* > > + * Trim and replace each character with ascii code below 32 or above > > + * 127 (included) using a dot '.' character. Useful for sending > > + * capabilities. > > + */ > > +void strbuf_sanitize(struct strbuf *sb); > > I am not getting "Useful for sending capabilities" here, and feel > that it is somewhat an unsubstantiated claim. If some information > is going to be transferred (which the phrase "sending capabilities" > hints), I'd expect that we try as hard as possible not to lose > information, but redact-non-ASCII is the total opposite of "not > losing information". Ok, "Useful for sending capabilities" will be removed. > By the way, as we are trimming, I am very very much tempted to > squish a run of non-ASCII bytes into one dot, perhaps like > > void redact_non_printables(struct strbuf *sb) > { > size_t dst = 0; > int skipped = 0; > > strbuf_trim(sb); > for (size_t src = 0; src < sb->len; src++) { > int ch = sb->buf[src]; > if (ch <= 32 && 127 <= ch) { > if (skipped) > continue; > ch = '.'; > } > sb->buf[dst++] = ch; > skipped = (ch == '.'); > } > } > > or even without strbuf_trim(), which would turn any leading or > trailing run of whitespaces into '.'. > > But that is an improvement that can be easily done on top after the > dust settles and better left as #leftoverbits material. Usman's patch series about introducing a "os-version" capability needs such a feature too, and Usman already reworked this code according to your comments here. It looks like you found it good too. So I will just reuse his patches related to this in the version 4 of this patch series.