On 10/2/22 04:26, Stephen Farrell wrote:
> I wonder if there's any less subjective metric that could be
> applied to mailing list archives?
If you're offering to put in the footwork, the general outline of
what I know how to do would take one of two paths. Both would
start with getting as complete a copy of the email archives as
possible. This used to be easily found online, and may yet be; but
if it isn't, I'm sure the tools team could give you assistance.
Then you either:
- Take a suitably large random sample of messages over the past
37 years (work out the size of the corpus and determine what you
want your confidence interval to be), and assign a team to score
which ones they believe meet some relevant criteria (e.g.,
violate today's code of conduct). You'll want at least two
people -- and preferably more -- of differing backgrounds to
look at each message to countervail certain kinds of biases. Or
- Use one of the several available forum management tools to
automatically score each message. Details vary, but most such
tools will generate both "toxicity" and "sentiment" scores that
you can plot over time. The ones I'm familiar with are run as a
service, so you'd need to perform some light API integration
(which might be as easy as piping formail into a curl command);
although it's entirely possible that offline tools are also
available.
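For the first path, the sample-size arithmetic can be sketched roughly as follows. This is only an illustration, assuming Cochran's formula with a finite population correction, a 95% confidence level (z = 1.96), and the most conservative proportion p = 0.5. The function names and the corpus sizes used below are hypothetical placeholders, not measurements of the actual archive.

```python
import math
import random


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Messages to sample for a given margin of error.

    Uses Cochran's formula plus the finite population correction.
    z=1.96 corresponds to a 95% confidence level; p=0.5 is the most
    conservative guess at the proportion being estimated.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def draw_sample(message_ids, n, seed=0):
    """Pick n message identifiers uniformly at random, without replacement.

    A fixed seed keeps the draw reproducible, so every reviewer on the
    scoring team can be handed the same set of messages.
    """
    ids = list(message_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(n, len(ids)))
```

With a placeholder corpus of a million messages, sample_size(1_000_000) comes out to 385 messages for a margin of plus or minus 5%; tightening the margin to 1% raises that considerably, which is the trade-off to settle before assigning reviewers.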
Again, I know how to do this, but can't invest the resources. Let
me know if you're earnest, and I'll happily consult with you on
getting it to work.
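For the second path, the "plot over time" step might look something like the sketch below. It deliberately does not call any real scoring service: score_message is a hypothetical callable standing in for whatever tool or API is chosen, and monthly_mean_scores and plain_text are illustrative names. Only Python's standard email handling is assumed.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime
from statistics import mean


def plain_text(msg):
    """Return the first text/plain part of a message, or '' if none."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label in the message; fall back to UTF-8.
            return payload.decode("utf-8", errors="replace")
    return ""


def monthly_mean_scores(messages, score_message):
    """Average a per-message score by calendar month.

    messages: iterable of email.message.Message objects.
    score_message: hypothetical callable mapping text to a float
        (for example, a wrapper around a toxicity-scoring service).
    Returns a dict such as {"2022-10": 0.12, ...}.
    """
    buckets = defaultdict(list)
    for msg in messages:
        try:
            sent = parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):
            # Missing or unparseable Date header: skip the message.
            continue
        buckets[sent.strftime("%Y-%m")].append(score_message(plain_text(msg)))
    return {month: mean(scores) for month, scores in sorted(buckets.items())}
```

If the archive is available as an mbox file, iterating over mailbox.mbox(path) from the standard library yields message objects that can be passed in directly; the resulting month-to-score mapping can then be handed to any plotting tool.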
/a
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call