Re: GDPR compliance best practices?

On Sun, Jun 03 2018, Peter Backes wrote:

> On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> I'm not trying to be selfish, I'm just trying to counter your literal
>> reading of the law with a comment of "it'll depend".
>>
>> Just like there's a law against public urination in many places, but
>> this is applied very differently to someone taking a piss in front of
>> parliament vs. someone taking a piss in the forest on a hike, even
>> though the law itself usually makes no distinction about the two.
>
> We have huge companies using git now. This is not the tool used by a
> few kernel hackers anymore.

Sure, but what I'm pointing out is that a) you can't focus on git as the
technology, because it tells you nothing about what's being done with it
(e.g. the log file case I mentioned), and b) nobody who came up with the
GDPR was concerned with free software projects or the SCM used by
companies, so this is very unlikely to be enforced.

>> In this example once you'd delete the UUID ref you don't have the UUID
>> -> author mapping anymore (and b.t.w. that could be a many to one
>> mapping).
>
> It is not relevant whether you have that mapping or not, it is enough
> that with additional information you could obtain it. For example, say,
> you have 5000 commits with the same UUID. Now you delete the mapping.
> But your friend still has it on his local copy. Now your friend
> merely needs to tell you who is behind that UUID and instantly you can
> associate all 5000 commits with that person again.

So nobody can be GDPR compliant in the face of archive.org and the like?
If the law says that you need to delete information you published in the
past, and you do so, how is it your problem that someone mirrored &
re-published it? That's their compliance problem at that point.

> The GDPR is very explicit about this, see recital 26. It says that
> pseudonymization is not enough, you need anonymization if you want to
> be free from regulation.
>
> In addition, and in contrast to my proposal, your solution doesn't
> allow verification of the author field.

It does if you've got the ref. Maybe I just don't get your proposal. To
quote:

    Do not hash anything directly to obtain the commit ID. Instead, hash a
    list of hashes of [$random_number, $information] pairs. $information
    could be an author id, a commit date, a comment, or anything else. Then
    store the commit id, the list of hashes, and the list of pairs to form
    the commit.

You're just proposing (if I've read this correctly) that the commit
object should have some list of headers pointing to other SHA1s, and
that fsck and the like be OK with these going away. Right?
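
To make the quoted scheme concrete, here is a minimal sketch of how I read
it. Everything in it is an assumption for illustration: the use of SHA-256,
the 16-byte random salt, the newline-joined hash list, and the names
leaf_hash, make_commit and verify are hypothetical and are not git's object
format or any existing git API.

```python
import hashlib
import os


def leaf_hash(salt: bytes, information: bytes) -> str:
    """Hash one [$random_number, $information] pair.

    The salt is length-prefixed so the salt/information boundary is
    unambiguous (an assumption of this sketch, not part of the proposal).
    """
    prefix = len(salt).to_bytes(4, "big")
    return hashlib.sha256(prefix + salt + information).hexdigest()


def commit_id_for(hashes):
    """The commit ID covers only the list of pair hashes, never the raw data."""
    return hashlib.sha256("\n".join(hashes).encode("ascii")).hexdigest()


def make_commit(fields):
    """Return (commit_id, hashes, pairs) for a list of text fields."""
    pairs = [(os.urandom(16), field.encode("utf-8")) for field in fields]
    hashes = [leaf_hash(salt, info) for salt, info in pairs]
    return commit_id_for(hashes), hashes, pairs


def verify(commit_id, hashes, pairs):
    """Check the commit ID, then every pair that is still present.

    A pair set to None models information that has been erased; the
    commit ID stays valid because it depends only on the hash list.
    """
    if len(hashes) != len(pairs):
        return False
    if commit_id_for(hashes) != commit_id:
        return False
    return all(
        pair is None or leaf_hash(*pair) == expected
        for expected, pair in zip(hashes, pairs)
    )


if __name__ == "__main__":
    cid, hashes, pairs = make_commit(
        ["author A U Thor <author@example.com>", "date 2018-06-03", "fix typo"]
    )
    pairs[0] = None  # erase the author pair; the hash list is untouched
    print(verify(cid, hashes, pairs))
```

In this reading, erasing a pair plays the same role as deleting the ref in
my example: the object stays verifiable, and the tooling has to tolerate
the missing data.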

How is this intrinsically different from referring to something in the
ref namespace that may be deleted in the future?

In both cases you're just trying to solve the problem of trying to
somehow encode data into a git repository today, that may go away
tomorrow. Similar to how a reference to some LFS object today going away
doesn't fail "git fsck".

>> I think again that this is taking too much of a literalist view. The
>> intent of that policy is to ensure that companies like Google can't just
>> close down their EU offices and weasel out of compliance by saying "we're
>> just doing business from the US, it doesn't apply to us".
>>
>> It will not be used against anyone who's taking every reasonable
>> precaution against doing business with EU customers.
>
> I think you are underestimating the political intention behind the
> GDPR. It has kind of an imperialist goal, to set international
> standards, to enforce them against foreign companies and to pressure
> other nations to establish the same standards.
>
> If I were to read the GDPR in a literal sense, I would in fact come to
> the same conclusion as you: It's about companies doing substantial
> business in the EU. But the GDPR is carefully constructed in such a way
> that it is hard not to be affected by the GDPR in one way or another,
> and the obvious way to cope with that risk is to more or less obey the
> GDPR rules even if one does not have substantial business interests in
> the EU.

Okay, so you're not reading the GDPR in some literal sense, but you're
coming to a conclusion that's supported by ... what? To echo Theodore
Y. Ts'o's e-mail: have you consulted with someone who's an actual lawyer
on this subject?

I haven't, but I'm not suggesting that the git data format needs to
change because of some new EU law. You are. What's your basis for that
opinion?

It seems to me that the git project doesn't need to do anything about
this. There are plenty of things that are illegal to publish, some of
which may be made illegal after the fact (e.g. national security related
information). If those things are incidentally saved in git repositories,
the parties involved may need to run git-filter-branch.

Of course if they need to do that on a weekly basis because of some
overzealous law we may need to have some "native" support for that, but
I see zero signs of that so far.


