Re: [Last-Call] OT: change BCP 83 [Re: Last Call: BCP 83 PR-Action Against Dan Harkins]

Hi Adam,

On 2 Oct 2022, at 20:42, Adam Roach wrote:

On 10/2/22 04:26, Stephen Farrell wrote:

I wonder if there's any less subjective metric that could be
applied to mailing list archives?

If you're offering to put in the footwork, the general outline of what I know how to do would take one of two paths. Both would start with getting as complete a copy of the email archives as possible. This used to be easily found online, and may yet be; but if it isn't, I'm sure the tools team could give you assistance.

The IETF still maintains IMAP access to the mail archive for lists hosted on ietf.org, so access to the emails is straightforward.

Then you either:

* Take a suitably large random sample of messages over the past 37
years (work out the size of the corpus and determine what you want
your confidence interval to be), and assign a team to score which
ones they believe meet some relevant criteria (e.g., violate today's
code of conduct). You'll want at least two people -- and preferably
more -- of differing backgrounds to look at each message to
countervail certain kinds of biases. Or

This would be time-consuming and expensive, but would likely give an interesting result.

* Use one of the several available forum management tools to
automatically score each message. Details vary, but most such tools
will generate both "toxicity" and "sentiment" scores that you can
plot over time. The ones I'm familiar with are run as a service, so
you'd need to perform some light API integration (which might be as
easy as piping formail into a curl command); although it's entirely
possible that offline tools are also available.

Again, I know how to do this, but can't invest the resources. Let me know if you're earnest, and I'll happily consult with you on getting it to work.
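
For the first option, the sample size follows from the size of the corpus and the desired confidence interval. A minimal sketch, assuming simple random sampling and Cochran's formula with a finite population correction; the function names, the 5% margin of error, and the 100,000-message corpus size are illustrative assumptions, not figures from the archive:

```python
import math
import random
from statistics import NormalDist


def sample_size(population, margin=0.05, confidence=0.95, proportion=0.5):
    """Messages to score so the estimated proportion is within +/- margin.

    proportion=0.5 is the most conservative guess when the true rate of
    flagged messages is unknown.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for an effectively infinite population.
    n0 = z * z * proportion * (1 - proportion) / margin ** 2
    # Finite population correction for a corpus of known size.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_sample(message_ids, n, seed=0):
    """Pick n distinct message ids; a fixed seed keeps the draw reproducible."""
    return random.Random(seed).sample(list(message_ids), n)


if __name__ == "__main__":
    corpus = range(100_000)  # hypothetical corpus size
    n = sample_size(len(corpus))
    print(n, draw_sample(corpus, n)[:5])
```

Each sampled message would then go to at least two reviewers, as suggested above; the sketch only covers how many messages to draw and how to draw them.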

I’m part of a project that’s doing mailing list analysis of IETF data, and the recent IAB AID workshop also explored this topic.

We haven’t spent too much time looking at sentiment analysis, but my colleagues took a quick look at messages on the ietf@xxxxxxxx list.

The plots below show the average extent, expressed in the range 0…1, to which the text of emails sent to that list in each year rates as positive, negative, or neutral sentiment, according to the VADER Sentiment Analysis library:

Redrawing the plot with a different range, to focus on the positive and negative sentiment categories, it’s clear that messages labelled as positive sentiment outweigh those labelled as negative, but there’s a significant fraction of negativity. The proportions don’t appear to change significantly over time.

Sentiment analysis, of course, is a crude measure that doesn’t necessarily correlate with toxicity. It’d be interesting to analyse further, and look at the other mailing lists too.
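
For anyone who wants to try something similar, here is a rough sketch of how per-year averages like those described above might be computed. It is not necessarily how the plots were produced. It assumes the messages are already available as (year, body) pairs, and the helper names `vader_scorer` and `yearly_sentiment` are made up for illustration. The only VADER call used is `SentimentIntensityAnalyzer().polarity_scores`, which returns `neg`, `neu`, `pos` and `compound` scores for a piece of text.

```python
from collections import defaultdict
from statistics import mean


def vader_scorer():
    """Return VADER's scoring function.

    Needs the third-party vaderSentiment package; imported lazily so the
    aggregation below can also be used with any other scorer.
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores


def yearly_sentiment(messages, scorer):
    """Average the pos/neu/neg scores of all messages sent in each year.

    messages: iterable of (year, body) pairs.
    scorer:   function mapping text to a dict with 'pos', 'neu', 'neg'.
    """
    by_year = defaultdict(list)
    for year, body in messages:
        by_year[year].append(scorer(body))
    return {
        year: {key: mean(s[key] for s in scores) for key in ("pos", "neu", "neg")}
        for year, scores in sorted(by_year.items())
    }


if __name__ == "__main__":
    # Toy input; a real run would read the bodies from the list archive.
    toy = [(2021, "Thanks, this is a great draft."),
           (2022, "This proposal is terrible and harmful.")]
    print(yearly_sentiment(toy, vader_scorer()))
```

Passing the scorer in as a parameter makes it easy to swap in a toxicity classifier later, since sentiment and toxicity are different things.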

If anyone’s interested in exploring the data further, our project will be at the hackathon at the London IETF in a few weeks - come talk to us.

Colin
(with thanks to Mladen Karan and Ravi Shekhar, cc’d, for wrangling the data)

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call