Re: [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)

Hi Shyam/Atin,

I have posted a patch [1] for the geo-rep test case failures:
    tests/00-geo-rep/georep-basic-dr-rsync.t
    tests/00-geo-rep/georep-basic-dr-tarssh.t
    tests/00-geo-rep/00-georep-verify-setup.t

Please include patch [1] while triggering tests.
The instrumentation patch [2] which was included can be removed.

[1]  https://review.gluster.org/#/c/glusterfs/+/20704/
[2]  https://review.gluster.org/#/c/glusterfs/+/20477/

Thanks,
Kotresh HR




On Fri, Aug 10, 2018 at 3:21 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:


On Thu, Aug 9, 2018 at 4:02 PM Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:


On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan <srangana@xxxxxxxxxx> wrote:
Today's patch set 7 [1] included fixes provided till last evening IST,
and its runs can be seen here [2] (yay! we can link to comments in
gerrit now).

New failures: (added to the spreadsheet)
./tests/bugs/protocol/bug-808400-repl.t (core dumped)
./tests/bugs/quick-read/bug-846240.t

Older tests that had not recurred, but failed today: (moved up in the
spreadsheet)
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t

The above test is timing out. When I added the .t, I had to increase its timeout so that it could create enough links to hit the maximum on ext4. I will re-check whether this is the same issue and get back.

This test is timing out with lcov. I bumped the timeout up to 30 minutes in https://review.gluster.org/#/c/glusterfs/+/20699. I am not happy that this test takes so long, but without it, it is difficult to catch regressions on ext4, which limits the number of hardlinks in a directory. (The last time we introduced such a regression, it took us almost a year to find the problem.) If there is a way of running this .t once per day and before each release, I will be happy to make it part of that. Let me know.
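
For anyone unfamiliar with the limit this test exercises, here is a minimal Python sketch (not the actual .t, which is a shell test) of creating hard links to one file until the filesystem returns EMLINK. The function name and the cap parameter are hypothetical; the cap only keeps the loop bounded on filesystems with a very high limit. As far as I know, ext4 caps a file's link count at 65000, which is why reaching the limit needs that many link operations and takes a while.

```python
import errno
import os


def count_links_until_emlink(directory, cap):
    """Hard-link one file repeatedly until EMLINK or until `cap` links exist.

    Hypothetical helper for illustration only.
    Returns (link_count, hit_limit).
    """
    target = os.path.join(directory, "target")
    with open(target, "w"):
        pass  # create an empty file; it starts with a link count of 1

    hit_limit = False
    for i in range(1, cap):
        try:
            os.link(target, os.path.join(directory, "link-%d" % i))
        except OSError as exc:
            if exc.errno == errno.EMLINK:
                # The filesystem refused another link: limit reached.
                hit_limit = True
                break
            raise  # any other error is unexpected here

    return os.stat(target).st_nlink, hit_limit
```

Calling it on an ext4 directory with a cap above the filesystem limit should return with hit_limit set to True; with a small cap it simply returns the cap and False.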
 
 

Other issues:
Test ./tests/basic/ec/ec-5-2.t core dumped again
A few geo-rep failures; Kotresh should have more logs to look at with
these runs
Test ./tests/bugs/glusterd/quorum-validation.t dumped core again

Atin/Amar, we may need to merge some of the patches that have proven to
be holding up and fixing issues today, so that we do not leave
everything to the last. Check and move them along or lmk.

Shyam

[1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
[2] Runs against patch set 7 and its status (incomplete as some runs
have not completed):
https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
(also updated in the spreadsheet)

On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
>
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
>
> 1) We are running the tests using the patch [2].
>
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures"; use a search to find a failing test and the corresponding run
> that it failed in.
>
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
>
> 4) If you take ownership of a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
>
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
>
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
>
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
>
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
>
> 6) Tests that are addressed or are not occurring anymore are,
>
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>
> Shyam (and Atin)
>
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that may
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that address regression failures:
>> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>>
>> [2] Bunched up patch against which regressions were run:
>> https://review.gluster.org/#/c/20637
>>
>> [3] Failing tests list:
>> https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
>>
>> [4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
_______________________________________________
maintainers mailing list
maintainers@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/maintainers


--
Pranith


--
Pranith



--
Thanks and Regards,
Kotresh H R