Hi Konstantin,

On Tue, Feb 27, 2024 at 05:32:34PM -0500, Konstantin Ryabitsev wrote:
> Hi, all:
>
> I was playing with shell-gpt and wrote a quickie integration that would allow
> retrieving (slimmed-down) threads from lore, feeding them to ChatGPT, and
> asking it to provide some basic analysis of the thread contents. Here's a
> recorded demo session:
>
> https://asciinema.org/a/643435
>
> A few notes:
>
> 1. This is obviously not a replacement for actually reading email, but can
> potentially be a useful asset for a busy maintainer who just wants a quick
> summary of a lengthy thread before they look at it in detail.
> 2. This is not free or cheap! To digest a lengthy thread, you can expect
> ChatGPT to generate enough tokens to cost you $1 or more in API usage fees.
> I know it's nothing compared to how expensive some of y'all's time is, and
> you can probably easily get that expensed by your employers, but for many
> others it's a pretty expensive toy. I managed to make it a bit cheaper by
> doing some surgery on the threads before feeding them to chatgpt (like
> removing most of the message headers and throwing out some of the quoted
> content), but there's a limit to how much we can throw out before the
> analysis becomes dramatically less useful.
> 3. This only works with ChatGPT-4, as most threads are too long for
> ChatGPT-3.5 to even process.
>
> So, the question is -- is this useful at all? Am I wasting time poking in this
> direction, or is this something that would be of benefit to any of you? If the
> latter, I will document how to set this up and commit the thread minimization
> code I hacked together to make it cheaper.

Amusingly, I've run experiments on something comparable with my own e-mails
(I'd like a few-line summary before reading them), and thought about
summarizing long LKML threads so as to keep track of what's going on without
having to spend a lot of time on all of them.
I identified a number of shortcomings with this. I suspect that those most
interested in such output are either, a bit like me, not very active in kernel
development, or focused on a specific area and mostly wanting to stay aware of
ongoing changes in other areas they're not really familiar with. Because of
this, I couldn't figure out on what boundaries to cut the analysis:

  - If it's "since the last time I read my email", it can only be done locally
    and will be per-user.

  - If it's a summary of a finished thread, it's not that interesting, and
    it's better explained (IMHO) on LWN, where the hot topics are summarized
    and developed.

  - If it's the list of threads of the day, I suspect there are so many that
    it's unlikely I'd read all of them every evening or every morning.

I've been wondering if an interesting approach would be to only summarize long
threads, since most short ones are a patch, a review and an ACK and do not
need to be summarized. But I think that most of us, seeing a subject repeat
over many e-mails, will just look at a few of the exchanges there to get an
idea of what's going on. Ideally, having a link in each thread to a place
where a summary is kept could be nice, except that it's not how such tools
work: you certainly don't want to re-run the analysis on a whole thread every
time it grows by a few messages, due to processing time and cost.

Also, regarding processing costs, I've had extremely good results using the
Mixtral-8x7B LLM in instruct mode running locally. It has a 32k context like
GPT-4. And if that's not enough, given that most of a long thread's contents
is in fact quoted text, it could be sufficient to drop deeper quote indents,
preserving a response and its immediate context while dropping most of the
repetition (it cuts your example thread roughly in half). But this still takes
quite a bit of processing time: processing the 14 mails from the thread above
took 13 minutes on an 80-core Ampere Altra system (no GPU involved here).
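For what it's worth, the quote-trimming idea above could be sketched like
this (a minimal illustration, not what I actually ran; the function names and
the depth threshold of 1 are just assumptions for the example):

```python
def quote_depth(line: str) -> int:
    """Count leading '>' markers, tolerating the usual '> > ' spacing."""
    depth = 0
    i = 0
    while i < len(line):
        if line[i] == '>':
            depth += 1
            i += 1
        elif line[i] == ' ' and depth > 0:
            i += 1
        else:
            break
    return depth


def trim_quotes(body: str, max_depth: int = 1) -> str:
    """Keep each message's own text plus max_depth levels of quoted
    context, and drop anything quoted deeper, before feeding the
    thread to the LLM."""
    kept = [ln for ln in body.splitlines() if quote_depth(ln) <= max_depth]
    return "\n".join(kept)
```

With max_depth=1 each reply keeps the text it directly responds to, and the
older quotes-of-quotes (which already appear unquoted earlier in the thread)
are the ones that get dropped.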
That's roughly 1 minute per e-mail, which adds up to a lot per day, not
counting the time needed to tune the prompt to get the best results! Overall,
while I think that some people might find "something like this" useful, most
of them would want it "slightly different" to be useful to them.

Just my two cents,
Willy