Re: [Gluster-infra] regression machines reporting slowly ? here is the reason ...

On Sunday, 24 April 2016 at 15:59 +0200, Niels de Vos wrote:
> On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote:
> > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
> > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever <pkalever@xxxxxxxxxx> wrote:
> > >> Hi all,
> > >>
> > >> I noticed that our regression machines are reporting back really slowly,
> > >> especially CentOS and Smoke.
> > >>
> > >> I found that most of the slaves are marked offline. Could this be the
> > >> main reason?
> > >>
> > >>
> > >
> > > Regression machines are scheduled to be offline if there are no active
> > > jobs. I wonder if the slowness is related to LVM or related factors as
> > > detailed in a recent thread?
> > >
> > 
> > Sorry, the previous mail was sent incomplete (blame some Gmail shortcut)
> > 
> > Hi Vijay,
> > 
> > Honestly, I was not aware that the machines move to the offline state
> > by themselves; I only knew that they go idle.
> > Thanks for sharing that information. But we still need to reclaim most
> > of the machines. Here are the reasons why each of them is offline.
> 
> Well, slaves go offline and should be woken up when needed.
> However, it seems that Jenkins fails to connect to many slaves :-/
> 
> I've rebooted:
> 
>  - slave46
>  - slave28
>  - slave26
>  - slave25
>  - slave24
>  - slave23
>  - slave21
> 
> These all seem to have come up correctly after clicking the 'Launch slave
> agent' button on the slave's status page.
> 
> Remember that anyone with a Jenkins account can reboot VMs. This is most
> often sufficient to get them working again. Just go to
> https://build.gluster.org/job/reboot-vm/ , log in and press some buttons.
> 
> One slave is in a weird status, maybe one of the tests overwrote the ssh
> key?
> 
>     [04/24/16 06:48:02] [SSH] Opening SSH connection to slave29.cloud.gluster.org:22.
>     ERROR: Failed to authenticate as jenkins. Wrong password. (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password)
>     [04/24/16 06:48:04] [SSH] Authentication failed.
>     hudson.AbortException: Authentication failed.
>     	at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217)
>     	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
>     	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
>     	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     	at java.lang.Thread.run(Thread.java:745)
>     [04/24/16 06:48:04] Launch failed - cleaning up connection
>     [04/24/16 06:48:05] [SSH] Connection closed.
> 
> Leaving slave29 as is, maybe one of our admins can have a look and see
> if it needs reprovisioning.

It seems slave29 was reinstalled and/or slightly damaged: it was no longer
in the Salt configuration, but I could connect as root.

It should work better now, but please tell me if anything is incorrect
with it.
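
For anyone who wants to see which slaves are offline without clicking
through the UI, here is a minimal sketch. It assumes the Jenkins JSON
endpoint /computer/api/json is readable with your access level on
build.gluster.org (that is an assumption, not something verified here);
the function names offline_nodes and fetch_nodes are made up for
illustration.

```python
import json
from urllib.request import urlopen

# Base URL taken from the thread; whether anonymous API reads are
# allowed on this instance is an assumption.
JENKINS_URL = "https://build.gluster.org"


def offline_nodes(payload):
    """Return (name, reason) pairs for nodes Jenkins reports as offline.

    `payload` is the decoded JSON from <jenkins>/computer/api/json.
    """
    return [
        (
            node.get("displayName", "?"),
            node.get("offlineCauseReason") or "no reason given",
        )
        for node in payload.get("computer", [])
        if node.get("offline")
    ]


def fetch_nodes(base_url=JENKINS_URL):
    """Download and decode the node list from the Jenkins JSON API."""
    with urlopen(base_url + "/computer/api/json") as resp:
        return json.load(resp)
```

Calling offline_nodes(fetch_nodes()) would then give the list of offline
slaves with the reason Jenkins recorded, if any. Idle slaves that were
taken offline on purpose will show up too, so the reason field is what
tells a scheduled shutdown apart from a failed SSH launch.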
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
