On Mon, Jul 9, 2018 at 8:10 PM, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:
We discussed reducing the number of volumes in the maintainers' meeting. Should we still go ahead and do that?
I'm not sure exactly what was discussed, but reducing the number of volumes may defeat the purpose of the test, as the bug it covers is reproducible only with a larger number of volumes. I think Jeff will be able to tell how many are needed. I think we could move this to the CentOS CI brick-mux regression job, if it runs on machines with more RAM?
Regards,
Poornima
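
For context, bug-1432542-mpx-restart-crash.t builds up a large number of volumes with brick multiplexing enabled and then restarts the daemons, so cutting the volume count changes exactly the condition that reproduces the crash. A rough sketch of that pattern (names, counts and volume layout below are illustrative, not copied from the .t file):

    #!/bin/bash
    # Illustrative sketch only; not the actual test.
    . $(dirname $0)/../../include.rc
    . $(dirname $0)/../../volume.rc
    cleanup

    TEST glusterd
    TEST $CLI volume set all cluster.brick-multiplex on

    NUM_VOLS=20                                   # hypothetical count
    for i in $(seq 1 $NUM_VOLS); do
        TEST $CLI volume create vol$i replica 3 $H0:$B0/vol${i}_{1,2,3}
        TEST $CLI volume start vol$i
    done

    # With multiplexing, every brick above lives in a single glusterfsd, so
    # its memory footprint grows with the volume count; that is what the
    # test stresses across a restart.
    TEST killall glusterd glusterfsd
    TEST glusterd

    cleanup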
On 9 July 2018 at 15:45, Xavi Hernandez <jahernan@xxxxxxxxxx> wrote:

On Mon, Jul 9, 2018 at 11:14 AM Karthik Subrahmanya <ksubrahm@xxxxxxxxxx> wrote:

Hi Deepshikha,

Are you looking into this failure? I can still see this happening for all the regression runs.

I've executed the failing script on my laptop and all tests finish relatively fast. What seems to take time is the final cleanup. I can see 'semanage' taking some CPU during destruction of volumes. The test required 350 seconds to finish successfully.

Not sure what caused the cleanup time to increase, but I've created a bug [1] to track this and a patch [2] to give more time to this test. This should allow all blocked regressions to complete successfully.

Xavi

[2] https://review.gluster.org/20482

Thanks & Regards,
Karthik

On Sun, Jul 8, 2018 at 7:18 AM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

https://build.gluster.org/job/regression-test-with-multiplex/794/display/redirect has the same test failing. Is the reason of the failure different given this is on jenkins?

--

On Sat, 7 Jul 2018 at 19:12, Deepshikha Khandelwal <dkhandel@xxxxxxxxxx> wrote:

Hi folks,
The issue [1] has been resolved. The softserve instances will now have 2GB
RAM, the same as the Jenkins builders' sizing configuration.
[1] https://github.com/gluster/softserve/issues/40
Thanks,
Deepshikha Khandelwal
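
A side note on the timeout mechanics: the "Timeout set is 300, default 200" line in the logs further down comes from a per-test override that, as far as I recall, run-tests.sh reads from the .t file itself; Xavi's patch [2] above raises that override. A sketch of that kind of change (the value here is illustrative, not taken from the actual patch):

    # Near the top of tests/bugs/core/bug-1432542-mpx-restart-crash.t.
    # run-tests.sh greps for this assignment (assumption) and uses it
    # instead of its default per-test timeout.
    SCRIPT_TIMEOUT=400

    . $(dirname $0)/../../include.rc
    . $(dirname $0)/../../volume.rc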
On Fri, Jul 6, 2018 at 6:14 PM, Karthik Subrahmanya <ksubrahm@xxxxxxxxxx> wrote:
>
>
> On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal, <dkhandel@xxxxxxxxxx>
> wrote:
>>
>> Hi Poornima/Karthik,
>>
>> We've looked into the memory error that this softserve instance showed.
>> These machine instances have 1GB RAM, which is not the case with the
>> Jenkins builders; those have 2GB RAM.
>>
>> We've created the issue [1] and will fix it soon.
>
> Great. Thanks for the update.
>>
>>
>> Sorry for the inconvenience.
>>
>> [1] https://github.com/gluster/softserve/issues/40
>>
>> Thanks,
>> Deepshikha Khandelwal
>>
>> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya <ksubrahm@xxxxxxxxxx>
>> wrote:
>> > Thanks Poornima for the analysis.
>> > Can someone work on fixing this please?
>> >
>> > ~Karthik
>> >
>> > On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah
>> > <pgurusid@xxxxxxxxxx>
>> > wrote:
>> >>
>> >> The same test case is failing for my patch as well [1]. I requested a
>> >> regression system and tried to reproduce it.
>> >> From my analysis, the (multiplexed) brick process is consuming a lot of
>> >> memory and is being OOM-killed. The regression machine has 1GB RAM and
>> >> the process consumes more than 1GB. 1GB for 120 bricks is acceptable,
>> >> considering there are around 1000 threads in that brick process.
>> >> Ways to fix:
>> >> - Increase the regression system RAM size, OR
>> >> - Decrease the number of volumes in the test case.
>> >>
>> >> But what is strange is that the test sometimes passes for some patches.
>> >> There could be a bug in the memory consumption.
>> >>
>> >> Regards,
>> >> Poornima
>> >>
>> >>
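
To double-check the OOM theory on a loaned regression machine, something along these lines should show whether the kernel killed the brick process and how large it had grown (plain Linux commands; nothing here is specific to the test framework):

    # Did the kernel OOM-kill anything? The multiplexed bricks run as glusterfsd.
    dmesg | grep -iE 'out of memory|oom-killer|killed process'

    # Resident set size (in KB) and thread count of the brick process.
    ps -C glusterfsd -o pid,rss,nlwp,args

    # How many bricks that one process is carrying, for the 1GB/120-brick math.
    gluster volume status | grep -c '^Brick'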
>> >> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya
>> >> <ksubrahm@xxxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> $subject is failing on the CentOS regression for most patches with a
>> >>> timeout error.
>> >>>
>> >>> 07:32:34
>> >>>
>> >>> ================================================================================
>> >>> 07:32:34 [07:33:05] Running tests in file
>> >>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>> >>> 07:32:34 Timeout set is 300, default 200
>> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out
>> >>> after 300 seconds
>> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status
>> >>> 124
>> >>> 07:37:34
>> >>> 07:37:34 *********************************
>> >>> 07:37:34 * REGRESSION FAILED *
>> >>> 07:37:34 * Retrying failed tests in case *
>> >>> 07:37:34 * we got some spurious failures *
>> >>> 07:37:34 *********************************
>> >>> 07:37:34
>> >>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out
>> >>> after 300 seconds
>> >>> 07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>> >>> 07:42:34
>> >>>
>> >>> ================================================================================
>> >>>
>> >>> Can anyone take a look?
>> >>>
>> >>> Thanks,
>> >>> Karthik
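
For whoever picks this up: the test can be run on its own, as root, from a glusterfs source tree after building and installing; if I remember right, prove can drive a single .t file directly. For example:

    # Run just the failing test; -v for verbose output, -f to list failures.
    prove -vf ./tests/bugs/core/bug-1432542-mpx-restart-crash.t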
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Gluster-devel mailing list
>> >>> Gluster-devel@xxxxxxxxxxx
>> >>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>> >>
>> >>
>> >
>> > _______________________________________________
>> > Gluster-infra mailing list
>> > Gluster-infra@xxxxxxxxxxx
>> > https://lists.gluster.org/mailman/listinfo/gluster-infra
- Atin (atinm)

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel