RE: Call for Community Input: Web Analytics on www.ietf.org

Hi John!

> -----Original Message-----
> From: John C Klensin [mailto:john-ietf@xxxxxxx]
> Sent: Wednesday, May 22, 2019 6:33 PM
> To: Roman Danyliw <rdd@xxxxxxxx>
> Cc: Stephen Farrell <stephen.farrell@xxxxxxxxx>; Keith Moore
> <moore@xxxxxxxxxxxxxxxxxxxx>; ietf@xxxxxxxx
> Subject: RE: Call for Community Input: Web Analytics on www.ietf.org
> 
> Roman,
> 
> Your response actually raises (at least for me) some additional questions
> (saving for later issues on which you have promised details later)...
> 
> --On Wednesday, May 22, 2019 20:49 +0000 Roman Danyliw <rdd@xxxxxxxx>
> wrote:
> 
> >...
> > A few answers below.
> >
> >> -----Original Message-----
> >> From: Stephen Farrell [mailto:stephen.farrell@xxxxxxxxx]
> >> Sent: Tuesday, May 21, 2019 11:43 AM
> >> To: Roman Danyliw <rdd@xxxxxxxx>; ietf@xxxxxxxx
> >> Subject: Re: Call for Community Input: Web Analytics on www.ietf.org
> >...
> 
> >> - Do the IESG plan to evaluate the utility of this
> >>   with the possibility to ditch it if it doesn't
> >>   in fact tell us something useful? If so, when?
> >>   How will you decide if it's worth keeping?
> >
> > In the "Implementation" section the proposal notes that "[f]ollowing
> > finalization and implementation of the proposal, ...  the web
> > analytics and reports will be reviewed by the IETF Tools Team after
> > one-year to confirm they are delivering anticipated results."  The
> > IETF Tools Team will bring a recommendation to the IESG.  Whether
> > these analytics are worth keeping will be determined by whether they
> > informed site improvement (as outlined in the "Introduction" section).
> 
> I'm still not clear as to why this effort is needed at all.  I am sympathetic to
> Keith Moore's observations which I read as being about collecting
> measurements and doing statistics that are easy rather than digging down
> far enough to determine which are needed for specific purposes, and then
> figuring out how, if possible, to gather those statistics and keeping them as
> focused as possible.  Gathering data because we can, because tools are
> readily available to capture certain data, and then passing it off to the Tools
> Team to figure out what it is good for (or whether it provided "anticipated
> results" without that being specified in advance) does not strike me as a
> good way to proceed.  It also violates the most basic of privacy protection
> principles, which is that data that are not collected are data one doesn't have
> to worry about securing, retention times, etc.
>
> It seems to me that the first step in this process should be a clear statement
> from the Tools Team or other decision makers about how they expect to use
> the data and what data are needed for that purpose.  That expectation is
> supported by a statement in the Proposal that says "website analytics must
> be implemented to ... ● limit data being collected to that needed to serve
> specific identified purposes".  The first two items in the list of data to be
> reported are:
>   ● overall number of visitors;
>   ● views per webpage;

The expected utility of the usage data is to provide an empirical basis for improving the site.  I'm not sure how we improve the web site without a feedback loop (both active, where comments on it are made explicitly, and passive, by watching usage).  We have some usage data now (https://www.ietf.org/usagedata/), but it is incomplete because of the limited instrumentation possible with the current architecture.  This proposal suggests an alternative approach that would provide equivalent usage data (as was available prior to the CDN migration) and also enhancements.

There are already a few questions posed in the proposal on how that data would be used:
** which pages are most commonly visited, 
** which paths visitors travel to find IETF meeting registration pages, or 
** whether introductory information, such as tutorials, leads to further exploration of website content.

> Why do we care?  If, e.g.,
> https://www.ietf.org/about/groups/iesg/members/ does not attract as much
> traffic as, e.g., https://www.ietf.org/how/meetings/upcoming/ does that
> mean we are going to take it down?  Redesign it with more animation in the
> hope of drawing additional traffic?  That is a silly example, but I trust the
> problem is clear.

I don't agree.  Stats about the visits to a page/resource can be useful for managing the site.

Imagine one looked at the usage of the meeting agenda pages (yes, I know that's in datatracker.ietf) in all of their forms -- they come as txt, html, iCal, etc.  Generating all of those formats takes effort.  If the stats showed one is very unpopular, there might need to be a conversation about whether it's worth the effort.  Or perhaps the stats showed that the area-specific drill-down is quite popular; more resources might then be applied to enrich it.

If one noticed that particular deeply nested content (say, a given blog post) was rising in popularity, this material could be promoted or referenced more prominently.

I know you meant the animation comment to be, as you put it, a "silly example", but the notion of tuning www.ietf.org to attract more traffic, as long as it positively raises awareness/visibility of our work, is a good idea.  www.ietf.org is our site to the world, not just an inward-facing resource for those already in the community.

> The next paragraph starts "After considering several options for
> implementing analytics,...", which sounds a lot like we have skipped over
> "why" and "what" to get to "how".
> 
> However, assuming for purposes of discussion that this is really needed for
> some useful purpose...
>
> >> - Will this new information be shared with anyone
> >>   else (e.g. ISOC as allowed for in [2]).
> >
> > The proposal outlines that the "IETF Secretariat,  communications
> >staff, and the IESG"
> >...
> > I'll have to follow-up on the additional users (ISOC) implied  by [2].
> 
> I note that the Tools Team, who are explicitly called out as getting the data,
> are, except for individual coincidences, not part of the IETF Secretariat, the
> communications staff (I think I know who/what that means, but am not
> sure), or the IESG, so the list of parties with whom information is shared is a
> superset of that list even before ISOC staff is considered.  It is also interesting
> that neither the IETF LLC Exec Director nor the IETF LLC Board are on the list
> of people to be given access to the data, something that would probably
> make it hard to evaluate the results and utility of this work.  Given that this
> sort of thing isn't free even if we (volunteers or the
> Secretariat) maintain the software on our own equipment, I'd hope that sort
> of evaluation would be part of any ongoing effort.
> 
> >> - Does this constitute tracking behaviour? The
> >>   current privacy policy [2] says we don't do that.
> >
> > My read is no.
> 
> reporting
> ● traffic sources; and
> ● aggregated visitor profiles (including OS, browser, and primary languages)
> ● visitors' paths through the site (including time spent on webpages, as well
> as entry and exit pages)
> 
> Certainly sounds like tracking behavior to me.   I'm interested
> in why you don't read it that way because we may have different definitions.

I agree that tracking can have a variety of definitions.  I'm only saying it isn't tracking behavior as defined by [3].  More thoughts below.

> > [3] says that "tracking is the collection of data regarding a
> > particular user's activity across multiple distinct contexts and the
> > retention, use, or sharing of data derived from that activity outside
> > the context in which it occurred. A context is a set of resources that
> > are controlled by the same party or jointly controlled by a set of
> > parties."
> 
> By this definition, one can do just about anything one likes to capture
> information about user behavior as long as that user doesn't leave "a
> context" and then defining "context" in an appropriately broad way.  In
> particular...

I agree there is a grey area here around what counts as "jointly controlled".  That is the level of precision the spec gives us.

> > *.ietf.org servers are single context controlled by the same party
> > (IETF).  The proposed implementation plan is a self-hosted solution
> > which does indeed collect activity data but NOT across "multiple,
> > distinct contexts".
> 
> Really?  First, whether *.ietf.org servers are "controlled by the same party" is
> questionable.  One could suggest that they are "controlled by" some
> combination of the IETF LLC, AMS, the Tools Team, and maybe some "cloud"
> or "CDN" suppliers.  Perhaps at least some of those relationships are tightly
> enough specified contractually to make the control by the IETF LLC (not the
> IESG) clear, but "the IETF" (as a community of participants) doesn't know
> enough about the details of those contracts to be confident about that.
> And, unless the Tools Team started being subject to Close Technical
> Supervision while I wasn't looking, pages in ietf.org subdomains they control
> and manage cannot be said to be "controlled by" the IETF.

I'm not entirely following the distinction being made with "controlled".  I cribbed the use of "IETF" as the controlling party because the privacy policy only names the IETF, IAB and IRTF.  If the distinction you're making is that the IETF, as in the broader community of participants, doesn't control the website, sure, I concede that.  However, I disagree that www.ietf.org isn't either a single or a jointly controlled context per [3] simply because contractors, detailees, volunteers, and LLC individuals might be involved in its operation.

> IMO, lots of loose ends here.  

No disagreement.  That's why the proposal was put out for comment.  A number of implementation details remain to be worked out.

Regards,
Roman

> From my point of view, the most important is
> why we actually need to do this, what we hope to accomplish, and, from
> those things, what data we actually need.
> The "information to be collected" part of the proposal is not especially
> helpful in this regard.  Some of the comments do help in imagining what is
> intended and why, but I don't think we should need to imagine.
> 
> best,
>    john




