Re: [Gluster-Maintainers] Master branch lock down: RCA for tests (bug-1368312.t)

Failure of this test is tracked by bz https://bugzilla.redhat.com/show_bug.cgi?id=1608158.

<description>
I was trying to debug regression failures on [1] and observed that split-brain-resolution.t was failing consistently.

=========================
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
-------------------
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact: in most of the failures, stat was not served from md-cache but was instead wound down to afr, which failed the stat with EIO because the file was in split-brain. So I ran another test:
* disabled md-cache
* mounted glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So I think the test relied on stat requests being absorbed by either the kernel attribute cache or md-cache. When that does not happen, the stats reach afr and cause commands such as getfattr to fail. Thoughts?
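
For anyone who wants to repeat the experiment, here is a minimal sketch of the two steps above. It assumes md-cache is toggled through the performance.stat-prefetch volume option and that the mount helper accepts attribute-timeout and entry-timeout as mount options. The volume name "patchy" is taken from the test output; the host and mount point are placeholders, not values from this thread. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch only: reproduce the "no stat caching" condition described above.
# VOL comes from the test log; HOST and MNT are hypothetical placeholders.
VOL="${VOL:-patchy}"
HOST="${HOST:-localhost}"
MNT="${MNT:-/mnt/glusterfs/0}"
# DRY_RUN=1 prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: disable md-cache (assumed to map to performance.stat-prefetch).
disable_md_cache() {
    run gluster volume set "$VOL" performance.stat-prefetch off
}

# Step 2: mount with the kernel attribute and entry caches disabled,
# so every stat is wound down to the client translators.
mount_uncached() {
    run mount -t glusterfs \
        -o attribute-timeout=0,entry-timeout=0 \
        "$HOST:/$VOL" "$MNT"
}

disable_md_cache
mount_uncached
```

Run it with DRY_RUN=0 on a test setup to apply the settings, then rerun split-brain-resolution.t against that mount.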

[1] https://review.gluster.org/#/c/20549/
tests/basic/afr/split-brain-resolution.t:
tests/bugs/bug-1368312.t: 
tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

Discussion on this topic can be found on gluster-devel under the subject: regression failures on afr/split-brain-resolution
</description>

regards,
Raghavendra



On Mon, Aug 13, 2018 at 6:12 AM, Shyam Ranganathan <srangana@xxxxxxxxxx> wrote:
To keep the focus going and squash the remaining tests that were failing
sporadically, I request each test/component owner to:

- respond to this mail, changing the subject (testname.t) to the name of
the test they are responding to (adding more than one if they share
the same RCA)
- include the current RCA and its status

The list of tests and current owners, as per the spreadsheet that we were
tracking, is:

./tests/basic/distribute/rebal-all-nodes-migrate.t              TBD
./tests/basic/tier/tier-heald.t         TBD
./tests/basic/afr/sparse-file-self-heal.t               TBD
./tests/bugs/shard/bug-1251824.t                TBD
./tests/bugs/shard/configure-lru-limit.t                TBD
./tests/bugs/replicate/bug-1408712.t    Ravi
./tests/basic/afr/replace-brick-self-heal.t             TBD
./tests/00-geo-rep/00-georep-verify-setup.t     Kotresh
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
./tests/basic/stats-dump.t              TBD
./tests/bugs/bug-1110262.t              TBD
./tests/basic/ec/ec-data-heal.t         Mohit
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t           Pranith
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t          TBD
./tests/basic/ec/ec-5-2.t               Sunil
./tests/bugs/shard/bug-shard-discard.t          TBD
./tests/bugs/glusterd/remove-brick-testcases.t          TBD
./tests/bugs/protocol/bug-808400-repl.t         TBD
./tests/bugs/quick-read/bug-846240.t            Du
./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t           Mohit
./tests/00-geo-rep/georep-basic-dr-tarssh.t     Kotresh
./tests/bugs/ec/bug-1236065.t           Pranith
./tests/00-geo-rep/georep-basic-dr-rsync.t      Kotresh
./tests/basic/ec/ec-1468261.t           Ashish
./tests/basic/afr/add-brick-self-heal.t         Ravi
./tests/basic/afr/granular-esh/replace-brick.t          Pranith
./tests/bugs/core/multiplex-limit-issue-151.t           Sanju
./tests/bugs/glusterd/validating-server-quorum.t                Atin
./tests/bugs/replicate/bug-1363721.t            Ravi
./tests/bugs/index/bug-1559004-EMLINK-handling.t                Pranith
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t             Karthik
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t        Atin
./tests/bugs/glusterd/rebalance-operations-in-single-node.t             TBD
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t           TBD
./tests/bitrot/bug-1373520.t    Kotresh
./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
./tests/bugs/glusterd/quorum-validation.t       Atin
./tests/bugs/distribute/bug-1042725.t           Shyam
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t         Karthik
./tests/bugs/quota/bug-1293601.t                TBD
./tests/bugs/bug-1368312.t      Du
./tests/bugs/distribute/bug-1122443.t           Du
./tests/bugs/core/bug-1432542-mpx-restart-crash.t       1608568 Nithya/Shyam

Thanks,
Shyam
_______________________________________________
maintainers mailing list
maintainers@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/maintainers

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel
