On Fri, 23 Feb 2024 10:49:35 -0700 Jonathan Corbet wrote:
> Jakub Kicinski <kuba@xxxxxxxxxx> writes:
>
> > Does anyone think that even non-intrusive analytics are a no go?
>
> What sorts of analytics are you looking for? Simple logfile analysis
> should be fairly uncontroversial and would tell you which documents are
> most of interest to the AI bots^W^Wdevelopers.

Yes, basic analysis of access.log would do. I think that's equivalent
to what Plausible does. It's more a question of which existing solution
we can set up quickly; I have no preference on which method or tool we
end up using. All we need is a hit count per subpage, with some basic
dedup of a single reader hitting refresh...

> Anything requiring, say, javascript in the browser is likely to get
> blocked by the kinds of people who might be interested in kernel docs.

Interesting. I spent 20min grepping the netdev access.log. This may be
confirmation bias, but the vast majority of the hits are more or less
thinly veiled bots. Unless we believe that someone from an Android
phone decided to visit "admin.php" after landing on our page...
(admin.php obviously doesn't exist)

I zeroed in on the following metric - users who came from patchwork
(clicked on CI results) over the last week. Plausible -> 17, IP
addresses in the access log with the right referer -> 18. The dates in
the logs may not match up exactly, so the small delta is expected.

After doing this exercise, I'd like to withdraw my previous statement
that "access.log analysis" is fine. Now I think it's far more likely
we'd miscount bots than that someone legit has blocked javascript...

> We did an overview of relatively innocuous analytics packages a few
> years ago:
>
>   https://lwn.net/Articles/822568/

We need some analysis of how much of an email people actually read :)
Look at the second paragraph of my first email, where do you think
I found Plausible if not LWN ;)
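
FWIW, the referer-based count I did above is roughly the following.
This is only a sketch: it assumes a combined-format access.log, and the
log path and the "patchwork" referer substring are placeholders rather
than the actual netdev setup.

  # Count distinct client IPs whose requests carried a patchwork referer.
  # Assumes combined log format:
  #   ip - - [date] "request" status size "referer" "user-agent"
  import re
  import sys

  LINE_RE = re.compile(
      r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
      r'"(?P<request>[^"]*)" \d+ \S+ "(?P<referer>[^"]*)"'
  )

  def count_unique_ips(log_path, referer_substring="patchwork"):
      ips = set()
      with open(log_path) as f:
          for line in f:
              m = LINE_RE.match(line)
              if m and referer_substring in m.group("referer"):
                  ips.add(m.group("ip"))
      return len(ips)

  if __name__ == "__main__":
      # e.g. python3 count_referers.py /var/log/nginx/access.log
      print(count_unique_ips(sys.argv[1]))

Deduplicating by IP is about as far as this goes; it still can't tell a
well-behaved bot from a person, which is the problem described above.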