Re: tying files to git repository

Philip Oakley <philipoakley@iee.email> · Tue, 26 Nov 2019 15:28:25 +0000

Hi Jim

This 
(https://public-inbox.org/git/CAPuR+ZhwnHCp8j76PscuBqG2rCLkgG0+6Y3WwLgNRhaoj4OR9A@xxxxxxxxxxxxxx/) 
has been lying around for a long while..

On 11/11/2019 15:28, Jim Edwards wrote:
Hi,

I am a developer of scientific software, as science software we expect
and encourage users to modify source code in order to customize their
experiments.
Sounds an ideal usage for git - giving back control to the users so they 
can easily maintain their versions.
The mechanism which has been developed to do this
predates having source code in git and we are trying to figure out a
way to minimize changes to the scientists workflow, while leveraging
the power of git to improve the process.
Good, though sounds like you will need some terminology mapping.. Is 
there a public reference to the method?
In the workflow the
scientist creates a 'case' using script in the repository to create a
directory structure from which they will conduct their experiment.
In what sense do you use "case"?
 e.g. a [single] suitcase that holds a choice of clothes to wear / 
experimental methods;
Or, a long list of experimental setups, each with a name, selecting a 
'case' statement (like a software 'case' statement);
Or, a use-case that gives a half complete suggestion about how it may 
work, but with some details still to be filled in due to lack of space 
on the post-it note..?

Given that the scientists create a 'directory structure' (containing 
files?) for each experiment, this sounds very much like creating a 
'branch' (line of development) from the initial template of that 
structure, and as they develop their experiment's directory structure, 
they record their development in 'commits' on that branch (and sub 
branches if they are looking at alternatives). Finally, when they have a 
good structure ready, they can 'tag' that commit so it's easy to find.

Part of that directory structure is a SourceMods directory where the
user can drop modified source files that will be compiled in place of
a file of the same name in the source tree.
This 'dropping' has a strong _conceptual_ similarity to the staging area 
or 'index', where git users 'add' files that they feel are ready to an 
area that is used (like an outbox awaiting collection) as a temporary 
holding area waiting till all the bits are ready and waiting before they 
commit the ensemble.

So this "SourceMods" is very similar to the staging area, except that in 
your case it sounds like it is a specific place, while in git it is more 
conceptual as the user will 'git add <file>', and that change is 
registered in "the index" (a local file in a hidden .git directory), 
ready for the big commit.

Behind the scenes, a copy of the file is saved (in the object store) and 
hashed ready for inclusion in the commit hierarchy . Later the files 
(objects) stored in the object store are 'packed' resulting in a very 
compact storage, particularly for source files. The git repository can 
be 'pushed' to other servers, and other repositories 'fetched' from 
servers (and mixed together if they have common ancestry).

These files are sometimes
long lived and passed from case to case and even user to user and it
is not hard to have the files get out of sync with the source tree.
In Git, because the current files stay 'in place', you can start a 
branch (new experiment definition) from anywhere in any line of 
development. You/they simply checkout that particular commit and, voila, 
all the files are back as they were exactly.
We have discussed at length removing the SourceMods capability and
requiring scientists to create branches in git, but there is a lot of
resistance to this in the community.
Most of that will be fear of the unknown and the unfamiliarity of the 
git terminology. There can also be confusion about how Git has changed 
the old ways of working. Because you get 100% verification and 
validation you no longer need to worry about requiring a central golden 
reference store (though usually the "organisation" will want to have a 
_copy_ ;-)
What I would like to explore is
allowing scientists to keep the method that they are used to but at
the same time tying these modified files to their history in git.
The key part will be in how you map what they already do and know to the 
git commands and structure, and how you show them that it will remove a 
lot of the pain points and bottle necks.
Is
there a way to get the git metadata associated with an individual file
so that we can treat that file as if it were in the repo?

That [mental model view] way is a Sisyphean task, a never ending up-hill 
struggle. With a few careful words (mapping out solutions to their pain 
points) you should be able to get the scientists to pester you to 
implement git sooner rather than later. Choose the lead experimenters of 
git carefully.

--
Philip