Konstantin Ryabitsev <mricon@xxxxxxxxxx> wrote:
> On Sat, Apr 27, 2024 at 07:19:21AM GMT, Eric Wong wrote:
> > Correct, public-inbox currently won't index every header due to
> > cost, false positives, and otherwise lack of usefulness (general
> > gibberish from DKIM sigs, various UUIDs, etc).
> >
> > So it doesn't currently know about "X-stable:"
> >
> > I started working on making headers indexing configurable last
> > year, but didn't hear a response from the person that
> > potentially was interested:
> >
> > https://public-inbox.org/meta/20231120032132.M610564@dcvr/
> >
> > Right now, indexing new headers + validations can be maintained
> > as a Perl module in the public-inbox codebase.
> >
> > For lore, it'd make sense to be able to configure a bunch (or
> > all) inboxes at once instead of the per-inbox configuration in
> > my proposed RFC.
> >
> > At minimum, one would have to know:
> >
> > 1) the mail header name (e.g. `X-stable')
> > 2) the search prefix to use (e.g. `xstable:') # can't use dash `-' AFAIK
> > 3) the type of header value (phrase, string, sortable numeric, etc...)
>
> I'm whole-heartedly for this! This ties nicely to my b4 work where I'd
> like to be able to identify code-review trailers sent for a specific
> patch, even if that patch itself is not on lore. For example, this could
> be a patch that is part of a pull-request on a git forge, but we'd still
> like to be able to collect and find code-review trailers for it when a
> maintainer applies it.

OK, a more configurable version is available on a per-inbox
basis:

https://public-inbox.org/meta/20240508110957.3108196-1-e@xxxxxxxxx/

But that's a PITA to configure with hundreds of inboxes and
doesn't have extindex support, yet. I made it share logic with
the old altid code; so I'll also be getting altid into extindex
since ISTR users wanting to be able to lookup gmane stuff via
extindex.

And it also works with the new C++ xap_helper process (which
I'll use for threadid: support (still working on that...)).

> I'm perfectly fine with it only being a string, honestly.

Yeah, though there's 3 ways of indexing strings, currently :x
I've decided to keep some options open and support boolean_term,
text, and phrase for now.

boolean_term is the cheapest and probably best for exactly
matching labels/enums and such. The others may work better for
more complex texts (comma-delimited labels, maybe).

> > So probably just supporting strings and/or phrases to start...
> >
> > Validation to prevent poisoning by malicious/broken senders can
> > be useful in some cases (and the reason the RFC was a per use
> > case Perl module). That said, I'm not sure if much validation
> > is necessary for X-stable: headers or if just any text is fine.
>
> I'd let the consumer clients worry about it.

Agreed.
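
For readers following along, here is a rough sketch of the three pieces
of information discussed above (header name, search prefix, index type)
and how they might turn one message into prefixed search terms. It is
Python rather than Perl and is not public-inbox code or its config
syntax: HeaderIndexSpec, index_terms, and the prefix rule (lowercase
letters and digits only) are hypothetical names and assumptions made
for illustration. Only the no-dash restriction and the three type names
come from the thread.

```python
import re
from dataclasses import dataclass
from email import message_from_string

# The three string index types named in the thread.
INDEX_TYPES = {"boolean_term", "text", "phrase"}


@dataclass(frozen=True)
class HeaderIndexSpec:
    """Hypothetical description of one indexed header."""
    header: str      # mail header name, e.g. "X-stable"
    prefix: str      # search prefix, e.g. "xstable" (no dash)
    index_type: str  # one of INDEX_TYPES

    def __post_init__(self):
        # Assumption: restrict prefixes to lowercase letters and digits.
        # The thread only says a dash cannot be used.
        if not re.fullmatch(r"[a-z][a-z0-9]*", self.prefix):
            raise ValueError(f"invalid search prefix: {self.prefix!r}")
        if self.index_type not in INDEX_TYPES:
            raise ValueError(f"unknown index type: {self.index_type!r}")


def index_terms(spec, raw_message):
    """Return the prefixed terms one message would contribute."""
    msg = message_from_string(raw_message)
    terms = []
    for value in msg.get_all(spec.header, []):
        value = value.strip()
        if spec.index_type == "boolean_term":
            # Whole value as a single exact-match term (labels, enums).
            terms.append(f"{spec.prefix}:{value}")
        else:
            # text/phrase: one term per lowercased word. A real indexer
            # would also record word positions for phrase matching.
            words = re.findall(r"\w+", value.lower())
            terms.extend(f"{spec.prefix}:{w}" for w in words)
    return terms
```

With a spec of HeaderIndexSpec("X-stable", "xstable", "boolean_term"),
a message carrying "X-stable: review" would contribute the single term
"xstable:review". No validation of the header value is attempted here,
in line with leaving that to consumer clients.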