John Palmieri wrote:
> ----- "Toshio Kuratomi" <a.badger@xxxxxxxxx> wrote:
>
> <snip>
>> Getting koji data munged and transferred may be a problem as it is
>> just so darn big. If we don't have to make changes to the data in
>> koji, just get it distributed, then we could give access to a
>> backup... but that's still a lot of information to transfer.
>
> We would only need a portion of the data. Ideally everything since
> the last supported version of each distribution (or one after so we
> get obsolete data to test against) but in reality the last month of
> activity should be suitable.
>
This gets us into the realm of figuring out what we can delete from the
entire koji data store, which seems like a big can of worms. Some
things, like usernames, have to be there in their entirety. Other
things, like builds, can be less than the entirety, but since there are
dependencies between builds it wouldn't be a simple "remove everything
before this timestamp". It gets us back into munging the koji data,
which is what I think we should be avoiding.

>> pkgdb, fas, and bodhi are relatively small.
>>
>> fas is where we'd have our major security problems. We can't give
>> the information out unmunged. I've munged it before, though, so it's
>> doable. How strict we need to be is an issue, though. If we remove
>> all the identifying information in the people table except for the
>> userid, is that sufficient? *Note: We probably also need to munge
>> data in the configs table.
>
> As long as we randomly generate data for that (well, username at
> least). Note that UIDs are easily mapped back to usernames so you
> might want to randomize that. Also I believe packagedb and bodhi use
> usernames as the key instead of UIDs so those would have to match
> accounts in the munged FAS db. I would suggest generating a list of
> names from a dictionary and using that list to randomize names in the
> other services. Of course the names need to correspond to group
> permissions so some logic would be needed to make sure records
> associated with a given name are valid. However, having the ability
> to recreate the associated user names may not be an issue since all
> of that data is public. More importantly we need to make sure we
> aren't giving out addresses, phone numbers, password hashes and other
> such keys.
>
pkgdb uses userids in the db. Bodhi and koji use usernames. I'm
migrating pkgdb to usernames (internally right now; the db and public
facing APIs for 0.4). If we have to munge usernames, that makes things
harder, as we can't just dump the koji and bodhi dbs but also have to
post-process them. (Note: usernames are another thing that the privacy
policy allows us to give out.)

>> pkgdb and bodhi don't have information that is privacy policy
>> sensitive. (Which doesn't mean that some users won't like it... just
>> that I think we're covered.)
>
> Mike's suggestion of running it by legal sounds like the best route.
>
Running it by legal just to be sure we're doing the right thing is
good, although we do have a list of things that we are allowed to have
public per the privacy policy and pretty good criteria for deciding on
other data. I'm commenting more on the perception aspect than on the
pure legal obligation. And I'm not saying I think it's going to be a
problem, just that we should be prepared for a few complaints even if
it's perfectly legal.

-Toshio
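
For illustration only, here is a rough sketch of the dictionary-based renaming John describes, assuming the dumps can be loaded as lists of dicts. The function names and the column names (username, owner, email, password) are made up for the example and are not the real FAS, pkgdb, bodhi or koji schemas. The point is that one mapping is built once and reused for every service, so a munged name in one dump still matches the same account in the others.

```python
import random


def build_pseudonym_map(usernames, wordlist, seed=None):
    """Give every distinct username its own replacement word.

    Hypothetical helper: `wordlist` stands in for names drawn from a
    dictionary file. Raises ValueError if there are too few words.
    """
    unique_users = sorted(set(usernames))
    words = sorted(set(wordlist))
    if len(words) < len(unique_users):
        raise ValueError("word list is too small for the number of users")
    rng = random.Random(seed)  # seed=None draws from system entropy
    picks = rng.sample(words, len(unique_users))  # sample() never repeats
    return dict(zip(unique_users, picks))


def munge_rows(rows, mapping, user_field, drop_fields=()):
    """Return copies of `rows` with the username swapped and the
    sensitive columns dropped.

    A username missing from `mapping` raises KeyError on purpose, so a
    record that no longer matches an account is noticed, not passed on.
    """
    munged = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k not in drop_fields}
        clean[user_field] = mapping[row[user_field]]
        munged.append(clean)
    return munged
```

The same mapping would be passed to munge_rows() for the people table (dropping addresses, phone numbers and password hashes) and for each table in the other services that keys on a username. Group memberships keyed on the old name would need the same treatment so that permissions stay valid.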
_______________________________________________
Fedora-infrastructure-list mailing list
Fedora-infrastructure-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-infrastructure-list