Hi Jim
This
(https://public-inbox.org/git/CAPuR+ZhwnHCp8j76PscuBqG2rCLkgG0+6Y3WwLgNRhaoj4OR9A@xxxxxxxxxxxxxx/)
has been lying around for a long while..
On 11/11/2019 15:28, Jim Edwards wrote:
Hi,
I am a developer of scientific software, as science software we expect
and encourage users to modify source code in order to customize their
experiments.
Sounds an ideal usage for git - giving back control to the users so they
can easily maintain their versions.
The mechanism which has been developed to do this
predates having source code in git and we are trying to figure out a
way to minimize changes to the scientists workflow, while leveraging
the power of git to improve the process.
Good, though sounds like you will need some terminology mapping.. Is
there a public reference to the method?
In the workflow the
scientist creates a 'case' using script in the repository to create a
directory structure from which they will conduct their experiment.
In what sense do you use "case"?
e.g. a [single] suitcase that holds a choice of clothes to wear /
experimental methods;
Or, a long list of experimental setups, each with a name, selecting a
'case' statement (like a software 'case' statement);
Or, a use-case that gives a half complete suggestion about how it may
work, but with some details still to be filled in due to lack of space
on the post-it note..?
Given that the scientists create a 'directory structure' (containing
files?) for each experiment, this sounds very much like creating a
'branch' (line of development) from the initial template of that
structure, and as they develop their experiment's directory structure,
they record their development in 'commits' on that branch (and sub
branches if they are looking at alternatives). Finally, when they have a
good structure ready, they can 'tag' that commit so it's easy to find.
Part of that directory structure is a SourceMods directory where the
user can drop modified source files that will be compiled in place of
a file of the same name in the source tree.
This 'dropping' has a strong _conceptual_ similarity to the staging area
or 'index', where git users 'add' files that they feel are ready to an
area that is used (like an outbox awaiting collection) as a temporary
holding area waiting till all the bits are ready and waiting before they
commit the ensemble.
So this "SourceMods" is very similar to the staging area, except that in
your case it sounds like it is a specific place, while in git it is more
conceptual as the user will 'git add <file>', and that change is
registered in "the index" (a local file in a hidden .git directory),
ready for the big commit.
Behind the scenes, a copy of the file is saved (in the object store) and
hashed ready for inclusion in the commit hierarchy . Later the files
(objects) stored in the object store are 'packed' resulting in a very
compact storage, particularly for source files. The git repository can
be 'pushed' to other servers, and other repositories 'fetched' from
servers (and mixed together if they have common ancestry).
These files are sometimes
long lived and passed from case to case and even user to user and it
is not hard to have the files get out of sync with the source tree.
In Git, because the current files stay 'in place', you can start a
branch (new experiment definition) from anywhere in any line of
development. You/they simply checkout that particular commit and, voila,
all the files are back as they were exactly.
We have discussed at length removing the SourceMods capability and
requiring scientists to create branches in git, but there is a lot of
resistance to this in the community.
Most of that will be fear of the unknown and the unfamiliarity of the
git terminology. There can also be confusion about how Git has changed
the old ways of working. Because you get 100% verification and
validation you no longer need to worry about requiring a central golden
reference store (though usually the "organisation" will want to have a
_copy_ ;-)
What I would like to explore is
allowing scientists to keep the method that they are used to but at
the same time tying these modified files to their history in git.
The key part will be in how you map what they already do and know to the
git commands and structure, and how you show them that it will remove a
lot of the pain points and bottle necks.
Is
there a way to get the git metadata associated with an individual file
so that we can treat that file as if it were in the repo?
That [mental model view] way is a Sisyphean task, a never ending up-hill
struggle. With a few careful words (mapping out solutions to their pain
points) you should be able to get the scientists to pester you to
implement git sooner rather than later. Choose the lead experimenters of
git carefully.
--
Philip