Re: Release 6.1: Expected tagging on April 10th

On Wed, Apr 17, 2019 at 8:53 AM Amar Tumballi Suryanarayan <atumball@xxxxxxxxxx> wrote:
My take is, let's disable sdfs for 6.1 (we also have issues with its performance anyway). We will fix it properly by 6.2 or 7.0. In that case, continue marking the sdfs-sanity.t tests as bad.

It is better to revert the patch, as Atin mentioned. The patch was intended to surface the existing lk-owner bugs before it got merged. We thought there were no bugs when regression passed, but that is not the case. So it is better to revert, fix the other bugs found by this patch, and then get it merged again.
 

-Amar

On Wed, Apr 17, 2019 at 8:04 AM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:


On Wed, Apr 17, 2019 at 12:33 AM Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:


On Tue, Apr 16, 2019 at 10:27 PM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:


On Tue, Apr 16, 2019 at 9:19 PM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:


On Tue, Apr 16, 2019 at 7:24 PM Shyam Ranganathan <srangana@xxxxxxxxxx> wrote:
Status: Tagging pending

Waiting on patches:
(Kotresh/Atin) - glusterd: fix loading ctime in client graph logic
  https://review.gluster.org/c/glusterfs/+/22579

The regression doesn't pass for the mainline patch. I believe master is broken now. With the latest master, sdfs-sanity.t always fails. We need to either fix it or mark it as a bad test.

commit 3883887427a7f2dc458a9773e05f7c8ce8e62301 (HEAD)
Author: Pranith Kumar K <pkarampu@xxxxxxxxxx>
Date:   Mon Apr 1 11:14:56 2019 +0530

   features/locks: error-out {inode,entry}lk fops with all-zero lk-owner

   Problem:
   Sometimes we find that developers forget to assign lk-owner for an
   inodelk/entrylk/lk before writing code to wind these fops. locks
   xlator at the moment allows this operation. This leads to multiple
   threads in the same client being able to get locks on the inode
   because lk-owner is same and transport is same. So isolation
   with locks can't be achieved.

   Fix:
   Disallow locks with lk-owner zero.

   fixes bz#1624701
   Change-Id: I1c816280cffd150ebb392e3dcd4d21007cdd767f
   Signed-off-by: Pranith Kumar K <pkarampu@xxxxxxxxxx>

With the above commit, sdfs-sanity.t started failing. When I looked at the last regression vote at https://build.gluster.org/job/centos7-regression/5568/consoleFull, I saw that it had voted positive, but alarm bells rang when I saw that the overall regression took less than 2 hours. When I opened the regression link, I saw that the test had actually failed, yet the job still voted +1 at gerrit.

Deepshika - This is a bad CI bug and it has to be addressed at the earliest. Please take a look at https://build.gluster.org/job/centos7-regression/5568/consoleFull and investigate why the regression vote wasn't negative.

Pranith - Please investigate the sdfs-sanity.t failure caused by this patch.

sdfs is supposed to serialize entry fops by doing entrylk, but all of its locks are taken with an all-zero lk-owner. In essence, sdfs doesn't achieve its goal of mutual exclusion when conflicting operations are executed by the same client, because two lock requests on the same entry with the same all-zero owner will both be granted. The patch which led to the sdfs-sanity.t failure treats inodelk/entrylk/lk fops with an all-zero lk-owner as an invalid request to prevent these kinds of bugs, so it exposed the bug in sdfs. I sent a fix for sdfs @ https://review.gluster.org/#/c/glusterfs/+/22582
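
To make the lk-owner point concrete, here is a toy model of the behaviour described above. It is only a sketch under stated assumptions, not the locks xlator code: the class name, the 8-byte owner size, the `reject_zero_owner` switch, and the use of EAGAIN for a conflicting request are all made up for illustration. The one detail taken from this thread is that an all-zero lk-owner is treated as an invalid request, modelled here as EINVAL.

```python
import errno

# Hypothetical fixed-size all-zero lk-owner (the size is an assumption).
ZERO_OWNER = bytes(8)


class EntryLockTable:
    """Toy model of exclusive per-entry locks keyed by (client, lk-owner)."""

    def __init__(self, reject_zero_owner):
        # False models the old behaviour, True models the patched behaviour.
        self.reject_zero_owner = reject_zero_owner
        self.holders = {}  # entry name -> (client, owner) holding the lock

    def entrylk(self, entry, client, owner):
        """Return 0 if the lock is granted, otherwise an errno value."""
        if self.reject_zero_owner and not any(owner):
            return errno.EINVAL  # all-zero lk-owner is an invalid request
        holder = self.holders.get(entry)
        if holder is not None and holder != (client, owner):
            return errno.EAGAIN  # a different owner already holds the entry
        # Same client and same owner: the request does not conflict.
        self.holders[entry] = (client, owner)
        return 0


# Old behaviour: two threads of one client, both with an all-zero owner.
old = EntryLockTable(reject_zero_owner=False)
first = old.entrylk("coverage", "client-1", ZERO_OWNER)
second = old.entrylk("coverage", "client-1", ZERO_OWNER)
print(first, second)  # both are 0, so there is no mutual exclusion

# Patched behaviour: the same request is refused outright.
new = EntryLockTable(reject_zero_owner=True)
print(new.entrylk("coverage", "client-1", ZERO_OWNER) == errno.EINVAL)

# With distinct non-zero owners the second request conflicts, as intended.
print(new.entrylk("coverage", "client-1", b"\x01" * 8))
print(new.entrylk("coverage", "client-1", b"\x02" * 8) == errno.EAGAIN)
```

In this model a caller that never sets an owner silently loses isolation under the old rule and fails loudly under the new one, which is the trade-off being discussed here.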

Since this patch hasn't passed regression, and I now see tests/bugs/replicate/bug-1386188-sbrain-fav-child.t hanging and timing out in the latest nightly regression runs because of the above commit (tested locally and confirmed), I still request that we first revert this commit, get master back to stable, and then put back the required fixes.



@Maintainers - Please open every regression link to see the actual status of the job, and don't blindly trust the +1 vote at gerrit until this is addressed.

As per the policy, I'm going to revert this commit; watch out for the patch. I request that it be pushed directly without waiting for the regression vote, as we have done before for such breakage. Amar/Shyam - I believe you have this permission?


root@a5f81bd447c2:/home/glusterfs# prove -vf tests/basic/sdfs-sanity.t
tests/basic/sdfs-sanity.t ..
1..7
ok 1, LINENUM:8
ok 2, LINENUM:9
ok 3, LINENUM:11
ok 4, LINENUM:12
ok 5, LINENUM:13
ok 6, LINENUM:16
mkdir: cannot create directory ‘/mnt/glusterfs/1/coverage’: Invalid argument
stat: cannot stat '/mnt/glusterfs/1/coverage/dir': Invalid argument
tests/basic/rpc-coverage.sh: line 61: test: ==: unary operator expected
not ok 7 , LINENUM:20
FAILED COMMAND: tests/basic/rpc-coverage.sh /mnt/glusterfs/1
Failed 1/7 subtests

Test Summary Report
-------------------
tests/basic/sdfs-sanity.t (Wstat: 0 Tests: 7 Failed: 1)
  Failed test:  7
Files=1, Tests=7, 14 wallclock secs ( 0.02 usr  0.00 sys +  0.58 cusr  0.67 csys =  1.27 CPU)
Result: FAIL
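
For the CI voting problem, a check along the following lines could be applied to prove output like the run above before a vote is posted. This is a minimal sketch only: the function name and the voting rule are assumptions, and it is not the logic of the actual build.gluster.org job. It relies on nothing beyond the "not ok" and "Result:" lines that prove prints.

```python
import re


def regression_vote(prove_output):
    """Return 1 only when no subtest failed and every Result line is PASS.

    Hypothetical helper. Anything else, including output with no Result
    line at all, yields -1. TAP "# TODO" directives are not handled.
    """
    lines = prove_output.splitlines()
    # Any "not ok" line marks a failed subtest.
    has_not_ok = any(re.match(r"\s*not ok\b", line) for line in lines)
    # Collect the verdict from each "Result: ..." summary line.
    results = [
        line.split(":", 1)[1].strip()
        for line in lines
        if line.startswith("Result:")
    ]
    passed = bool(results) and all(r == "PASS" for r in results)
    return 1 if passed and not has_not_ok else -1


# A few lines taken from the failing run shown above.
sample = """ok 6, LINENUM:16
not ok 7 , LINENUM:20
FAILED COMMAND: tests/basic/rpc-coverage.sh /mnt/glusterfs/1
Failed 1/7 subtests
Result: FAIL
"""
print(regression_vote(sample))  # -1
```

Treating missing or truncated output as a failure is deliberate in this sketch, so that a job which aborts early cannot vote +1 by accident.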



Following patches will not be taken in if CentOS regression does not
pass by tomorrow morning Eastern TZ,
(Pranith/KingLongMee) - cluster-syncop: avoid duplicate unlock of
inodelk/entrylk
  https://review.gluster.org/c/glusterfs/+/22385
(Aravinda) - geo-rep: IPv6 support
  https://review.gluster.org/c/glusterfs/+/22488
(Aravinda) - geo-rep: fix integer config validation
  https://review.gluster.org/c/glusterfs/+/22489

Tracker bug status:
(Ravi) - Bug 1693155 - Excessive AFR messages from gluster showing in
RHGSWA.
  All patches are merged, but none of the patches adds the "Fixes"
keyword, assume this is an oversight and that the bug is fixed in this
release.

(Atin) - Bug 1698131 - multiple glusterfsd processes being launched for
the same brick, causing transport endpoint not connected
  No work has occurred post logs upload to bug, restart of bricks and
possibly glusterd is the existing workaround when the bug is hit. Moving
this out of the tracker for 6.1.

(Xavi) - Bug 1699917 - I/O error on writes to a disperse volume when
replace-brick is executed
  Very recent bug (15th April), does not seem to have any critical data
corruption or service availability issues, planning on not waiting for
the fix in 6.1

- Shyam
On 4/6/19 4:38 AM, Atin Mukherjee wrote:
> Hi Mohit,
>
> https://review.gluster.org/22495 should get into 6.1 as it’s a
> regression. Can you please attach the respective bug to the tracker Ravi
> pointed out?
>
>
> On Sat, 6 Apr 2019 at 12:00, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
>
>     Tracker bug is https://bugzilla.redhat.com/show_bug.cgi?id=1692394, in
>     case anyone wants to add blocker bugs.
>
>
>     On 05/04/19 8:03 PM, Shyam Ranganathan wrote:
>     > Hi,
>     >
>     > Expected tagging date for release-6.1 is April 10th, 2019.
>     >
>     > Please ensure required patches are backported and also are passing
>     > regressions and are appropriately reviewed for easy merging and
>     > tagging on the date.
>     >
>     > Thanks,
>     > Shyam
>     > _______________________________________________
>     > Gluster-devel mailing list
>     > Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
>     > https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
> --
> - Atin (atinm)
>


--
Pranith


--
Amar Tumballi (amarts)


--
Pranith
