RE: Only track built files for final output?

"Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> · Tue, 20 Aug 2019 14:11:59 -0400

On August 20, 2019 1:47 PM, Pratyush Yadav wrote:
> On 20/08/19 08:21AM, Leam Hall wrote:
> > Hey all, a newbie could use some help.
> >
> > We have some code that generates data files, and as a part of our
> > build process those files are rebuilt to ensure things work. This
> > causes an issue with branches and merging, as the data files change
> > slightly and dealing with half a dozen merge conflicts, for files that
> > are in an interim state, is frustrating. The catch is that when the
> > code goes to the production state, those files must be in place and
> > current.
> >
> > We use a release branch, and then fork off that for each issue.
> > Testing, and file creation, is a part of the pre-merge process. This
> > is what causes the merge conflicts.
> >
> > Right now my thought is to put the "final" versions of the files in
> > some other directory, and put the interim file storage directory in
> > .gitignore. Is there a better way to do this?
> >
> 
> My philosophy with Git is to only track files that I need to generate the
> final product. I never track the generated files, because I can always get
> to them via the tracked "source" files.
> 
> So for example, I was working on a simple parser in Flex and Bison. Flex
> and Bison take source files in their syntax, and generate a C file each
> that is then compiled and linked to get to the final binary. So instead of
> tracking the generated C files, I only tracked the source Flex and Bison
> files. My build system can always get me the generated files.
> 
> So in your case, what's wrong with just tracking the source files needed
> to generate the other files, and then when you want a release binary, just
> clone the repo, run your build system, and get the generated files?
> What benefit do you get by tracking the generated files?

The benefit of putting final release packages into git is based on the
following set of requirements in highly regulated industries:

1. The release artifacts can never change from the point in time at which
they are certified as working (a.k.a. passed tests) to the point when they
are replaced with other artifacts (a subsequent release). Recompiling is not
sufficient as the compilers themselves may change or be compromised. This is
an audit requirement.
2. The source commit(s) used to create the release artifacts must be
immutable so that the origins of the release artifacts are always known.
This is also an audit requirement in regulated industries.
3. Disconnecting the source from the object (as is common in artifact
repositories) breaks #2 and allows malicious code to be injected when the
artifacts are rebuilt after testing. This is a variant of #2, seen from the
security perspective.
4. Metadata on the origin of the release artifacts (the clone URL, the
parent commit, the branch, signed commits) is required for forensic
analysis of code in a compliance environment.
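
As a rough illustration of requirements 1 and 4, here is a minimal Python
sketch that records a SHA-256 digest for each release artifact next to the
origin metadata of the commit it was built from, and later reports any
artifact that no longer matches. The function names and the manifest layout
are hypothetical, and it assumes the clone has a remote named "origin". It is
not a compliance tool, only a way to make the idea concrete.

```python
import hashlib
import subprocess


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large release packages do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git(*args):
    """Run a git command in the current repository and return its trimmed output."""
    result = subprocess.run(
        ("git",) + args, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def read_git_metadata():
    """Collect origin metadata for HEAD (assumes a remote named 'origin')."""
    return {
        "clone_url": git("config", "--get", "remote.origin.url"),
        "commit": git("rev-parse", "HEAD"),
        # rev-list --parents prints the commit followed by its parent hashes.
        "parents": git("rev-list", "--parents", "-n", "1", "HEAD").split()[1:],
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
    }


def build_manifest(artifact_paths, metadata):
    """Pair each artifact's digest with the source metadata it was built from."""
    return {
        "source": metadata,
        "artifacts": {p: sha256_of(p) for p in sorted(artifact_paths)},
    }


def verify_manifest(manifest):
    """Return the paths whose current digest no longer matches the manifest."""
    return [p for p, d in manifest["artifacts"].items() if sha256_of(p) != d]
```

Calling build_manifest(paths, read_git_metadata()) at certification time and
committing the result alongside the artifacts would let verify_manifest()
flag any artifact that changed afterwards. Whether a digest manifest alone
satisfies a given auditor is a separate question.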

There are other related variants of the above, but those are the essential
ones that are generally accepted in financial, insurance, medical device,
and industrial applications. Increasingly, food production and distribution
sectors are realizing that they are also subject to the above. I sadly
cannot cite specific internal regulations or policies for NDA reasons, but
hope that others are able to do that.

Regards,
Randall

-- Brief whoami:
 NonStop developer since approximately 211288444200000000
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.