Mauro Santos via arch-general <arch-general@xxxxxxxxxxxxx> writes:

> On 05-03-2017 13:35, Lukas Fleischer wrote:
>> Hi,
>>
>> I was recently contacted by a Polish researcher asking for a list of AUR
>> account names. I did not expect this to be controversial but a couple of
>> Trusted Users raised concerns on IRC, so I decided to move this to the
>> public mailing list and discuss the whole topic in generality. I would
>> like to hear more opinions but please read the whole email and give it a
>> second thought before simply bringing up the usual privacy arguments
>> mentioned below.
>>
>> My original question was: Are we fine with sharing the list of AUR
>> account names (only user names, no real names or email addresses) with
>> a researcher that seems trustworthy and agrees to not share the data in
>> any form other than the resulting anonymized statistics?
>>
>> In this particular case, we are talking about Dorota Celinska [1] from
>> the University of Warsaw, Faculty of Economic Sciences [2], see [3] for
>> a list of her publications and [4] for a summary of her research project
>> funded recently by the Polish National Science Centre. She needs the
>> list of user names to perform a segmentation analysis, including users
>> which were active on the older AUR releases but do not show any
>> activity on AUR 4. She would also like to use the user names as
>> identifiers to establish connections with other platforms, such as
>> GitHub.
>>
>> The next question is: Would it make sense to even make this data
>> publicly available? Would it make sense to extend our RPC interface such
>> that one can search for user names? GitHub, for example, already
>> provides such an interface [5]. Let me quickly summarize some arguments
>> for this idea which came up on IRC:
>>
>> * User names are mostly identifiers. It is questionable whether they
>>   can/should be considered personal/private information. Maybe this can
>>   only be answered by a lawyer, though.
>>
>> * The user names of all accounts with any kind of public activity, like
>>   uploading a package, filing a request, writing a comment, are public
>>   already.
>>
>> * After logging into the aurweb interface, you can already check whether
>>   an account with a given user name exists because the account details
>>   page URIs have the form https://aur.archlinux.org/account/$username.
>>   This means that for any platform providing a list of user names (such
>>   as GitHub), you can "establish connections" with the AUR already.
>>
>> Now the arguments against:
>>
>> * Principle of data economy: We should not share any kind of information
>>   we do not need to share.
>>
>> * Sharing user names lowers the threshold for sharing other information
>>   which is considered more confidential.
>>
>> * Users can (and should) already use crawlers to fetch the user names.
>>   For example, the user names of all package maintainers and comment
>>   authors appear on the package details pages. The names of all users
>>   filing package requests appear in the mailing list archives etc.
>>
>> * We do not have ToS so we better not share anything.
>>
>> I, personally, find the second-to-last argument a very weak one. Telling
>> users to build crawlers scraping and brute-forcing our HTML pages makes
>> life difficult for both them and us. What do you think?
>>
>> On the other side of the coin, the last argument is a very good one and
>> it brings me to my last point. Independently of the outcome of this
>> discussion, I think we should add some ToS that users need to agree upon
>> when registering. It should contain information on liability and on
>> privacy. Is anybody willing to write a draft? Do we need the support of
>> a lawyer here?
>>
>> Thank you for your time and have a nice Sunday!
>>
>> Regards,
>> Lukas
>>
>> [1] http://coin.wne.uw.edu.pl/dcelinska/en/
>> [2] https://www.wne.uw.edu.pl/index.php/en/
>> [3] http://coin.wne.uw.edu.pl/dcelinska/en/pages/publications.html
>> [4] https://ncn.gov.pl/sites/default/files/listy-rankingowe/2016-03-15/streszczenia/337724-en.pdf
>> [5] https://developer.github.com/v3/users/
>>
>
> I'd say err on the side of caution and don't share. Even though the
> usernames are public and easy to find by scraping them from the
> website/mailing list/etc, handing over the whole database of usernames on a
> silver platter is a whole different story, which is what is being asked.
> Is there any community/website that provides a full list of registered
> usernames on request?
>
> There is also the question of how useful that data would be. Without any
> other data such as email, the username list is useless: you have no
> guarantee that user foo on GitHub is the same person as user foo on the
> AUR/Wiki/Forum or user foo somewhere else. In this case I'd also have to
> agree that sharing usernames lowers the threshold for sharing other
> information.
>
> It also doesn't fit with their stated research goals. Only GitHub and
> projects associated with scraping data from GitHub are mentioned, so why
> would they want to throw the AUR usernames in the mix?
>
> --
> Mauro Santos

Hi all,

Shall we focus on Lukas's questions?

>> My original question was: Are we fine with sharing the list of AUR
>> account names (only user names, no real names or email addresses) with
>> a researcher that seems trustworthy and agrees to not share the data in
>> any form other than the resulting anonymized statistics?

→ The first question: Are we fine with sharing the user names?

>> The next question is: Would it make sense to even make this data
>> publicly available? Would it make sense to extend our RPC interface such
>> that one can search for user names? GitHub, for example, already
>> provides such an interface [5].
>> Let me quickly summarize some arguments
>> for this idea which came up on IRC:

→ The second question: Would it make sense to even make this data publicly available?

>> I think we should add some ToS that users need to agree upon
>> when registering. It should contain information on liability and on
>> privacy. Is anybody willing to write a draft? Do we need the support of
>> a lawyer here?

→ The third question: Shall we add some ToS that users need to agree upon when registering?

My opinions:

1. The first question: Are we fine with sharing the user names?

   I am fine with it. But I think an agreement should be made before sharing the data.

2. The second question: Would it make sense to even make this data publicly available?

   No, it is not OK. Please check this wiki page [1]. A login name or nickname is personally identifiable information (PII).

3. The third question: Shall we add some ToS that users need to agree upon when registering?

   Yes, it is better to have ToS.

[1]: https://www.wikiwand.com/en/Personally_identifiable_information