Re: [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 22 Sep 2021 22:01:36 +0200

On Wed, Sep 22 2021, Neeraj Singh wrote:

> On Tue, Sep 21, 2021 at 7:02 PM Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>>
>>
>> On Tue, Sep 21 2021, Neeraj Singh wrote:
>>
>> > On Tue, Sep 21, 2021 at 4:58 PM Ævar Arnfjörð Bjarmason
>> > <avarab@xxxxxxxxx> wrote:
>> >>
>> >>
>> >> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>> >>
>> >> > From: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
>> >> >
>> >> > Add test cases to exercise batch mode for 'git add'
>> >> > and 'git stash'. These tests ensure that the added
>> >> > data winds up in the object database.
>> >> >
>> >> > I verified the tests by introducing an incorrect rename
>> >> > in do_sync_and_rename.
>> >> >
>> >> > Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
>> >> > ---
>> >> >  t/lib-unique-files.sh | 34 ++++++++++++++++++++++++++++++++++
>> >> >  t/t3700-add.sh        | 11 +++++++++++
>> >> >  t/t3903-stash.sh      | 14 ++++++++++++++
>> >> >  3 files changed, 59 insertions(+)
>> >> >  create mode 100644 t/lib-unique-files.sh
>> >> >
>> >> > diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
>> >> > new file mode 100644
>> >> > index 00000000000..a8a25eba61d
>> >> > --- /dev/null
>> >> > +++ b/t/lib-unique-files.sh
>> >> > @@ -0,0 +1,34 @@
>> >> > +# Helper to create files with unique contents
>> >> > +
>> >> > +test_create_unique_files_base__=$(date -u)
>> >> > +test_create_unique_files_counter__=0
>> >> > +
>> >> > +# Create multiple files with unique contents. Takes the number of
>> >> > +# directories, the number of files in each directory, and the base
>> >> > +# directory.
>> >> > +#
>> >> > +# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
>> >> > +#                                each in the specified directory, all
>> >> > +#                                with unique contents.
>> >> > +
>> >> > +test_create_unique_files() {
>> >> > +     test "$#" -ne 3 && BUG "3 param"
>> >> > +
>> >> > +     local dirs=$1
>> >> > +     local files=$2
>> >> > +     local basedir=$3
>> >> > +
>> >> > +     rm -rf $basedir >/dev/null
>> >>
>> >> Why the >/dev/null? It's not a "-rfv", and any errors would go to
>> >> stderr.
>> >
>> > Will fix. Clearly I don't know UNIX very well.
>> >
>> >>
>> >> > +             mkdir -p "$dir" > /dev/null
>> >>
>> >> Ditto.
>> >
>> > Will fix.
>> >
>> >>
>> >> > +             for j in $(test_seq $files)
>> >> > +             do
>> >> > +                     test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
>> >> > +                     echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
>> >>
>> >> Would be much more readable if we these variables were shorter.
>> >>
>> >> But actually, why are we trying to create files as a function of "date
>> >> -u" at all? This is all in the trash directory, which is rm -rf'd beween
>> >> runs, why aren't names created with test_seq or whatever OK? I.e. just
>> >> 1.txt, 2.txt....
>> >>
>> >
>> > The uniqueness is in the contents of the file.  I wanted to make sure that
>> > we are really creating new objects and not reusing old ones.  Is the scope
>> > of the "trash repo" small enough that I can be guaranteed that a new one
>> > is created before my test since the last time I tried adding something to
>> > the ODB?
>> >
>> >> > +test_expect_success 'stash with core.fsyncobjectfiles=batch' "
>> >> > +     test_create_unique_files 2 4 fsync-files &&
>> >> > +     git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
>> >> > +     rm -f fsynced_files &&
>> >> > +
>> >> > +     # The files were untracked, so use the third parent,
>> >> > +     # which contains the untracked files
>> >> > +     git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
>> >> > +     test_line_count = 8 fsynced_files &&
>> >> > +     cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
>> >> > +"
>> >> > +
>> >> > +
>> >> >  test_expect_success 'stash -c stash.useBuiltin=false warning ' '
>> >> >       expected="stash.useBuiltin support has been removed" &&
>> >>
>> >> We really prefer our tests to create the same data each time if
>> >> possible, but as noted with the "date -u" comment above you're
>> >> explicitly bypassing that, but I still can't see why...
>> >
>> > I'm trying to make sure we get new object contents. Is there a better
>> > way to achieve what I want without the risk of finding that the contents
>> > are already in the database from a previous test run?
>>
>> You can just do something like:
>>
>> test_expect_success 'setup data' '
>>         test_commit A &&
>>         test_commit B
>> '
>>
>> Which will create files A.t, B.t etc, or create them via:
>>
>>     obj=$(echo foo | git hash-object -w --stdin)
>>
>> etc.
>>
>> I.e. the uniqueness you're doing here seems to assume that tests are
>> re-using the same object store across runs, but we create a new trash
>> directory for each one, if you run the test with "-d" you can see it
>> being left behind for inspection. This is already ensured for the test.
>>
>> The only potential caveat I can imagine is that some filesystem like say
>> btrfs-like that does some COW or object de-duplication would behave
>> differently, but other than that...
>
> It looks like the same repo is reused for each test_expect_success
> line in the top-level t*.sh script.
> So for test_create_unique_files to be maximally useful, it should have
> some state that is different for
> each invocation.  How about I use the test_tick mechanism to produce
> this uniqueness?  It wouldn't
> be globally unique like the date method, but it should be good enough
> if the repo is recycled every time
> test-lib is reinitialized.
>
> I'm changing lib-unique-files to use test_tick and to be a little more
> readable as you suggested. Please
> let me know if you have any other suggestions.

Ah, sorry, I thought you meant you wanted uniqueness within the test
file, but no, by default we'll create *one* repo for you, and each
test_expect_success reuses that.

Generally tests that want that do one of (in each test_expect_success):

# I'm making my own repo
git init new-repo 1 &&
(
	cd new-repo-1 &&
	[...]
)

# Or, in the first one
<setup the repo data>
# Then, in a second one
git clone . new-repo-1

I.e. just using "git clone" to ferry the data around, or cp -R if you'd
like to retain the exact file layout etc.