Re: [Gluster-infra] Reboot for Meltdown and stuff


 



Thanks for your quick work on this, get some rest! 
We can look at supercolony next week when you're back in action. 
- amye 


On Sat, Jan 6, 2018 at 11:48 AM, Michael Scherer <mscherer@xxxxxxxxxx> wrote:
On Saturday, January 6, 2018 at 11:44 +0100, Michael Scherer wrote:
> On Friday, January 5, 2018 at 14:24 +0100, Michael Scherer wrote:
> > Hi,
> >
> > unless you are living in a place without any internet (an igloo in
> > Antarctica, the middle of the Gobi desert, a bunker in Switzerland,
> > or simply the Paris underground train), you may have seen the news
> > that this week is again a security nightmare (also called "just a
> > normal Wednesday" among practitioners), and that we have important
> > kernel patches to push, which require a reboot.
> >
> > See https://spectreattack.com/ 
> >
> > While I suspect our infra will not be targeted, and there are more
> > avenues of attack on local computers and browsers, which regularly
> > run random proprietary code in the form of JS, we still have to
> > upgrade everything to be sure.
> >
> > Therefore, I am going to have to reboot all the infra (yes, all 83
> > servers) tomorrow, minus the few servers I already rebooted
> > (because they are in HA, or not customer facing).
> >
> > I will block Jenkins, and wait for the jobs to finish before
> > rebooting the various servers. I will send an email tomorrow once
> > the reboots start (i.e., when/if I wake up), and another once
> > things are good (or if stuff broke in a horrible fashion, as
> > happened today).
> >
> > If there are any precautions to take, people have around 24h to
> > voice their concerns.
>
> The reboot is starting. I already did various backend servers; the
> document I used for tracking the work is at
> https://bimestriel.framapad.org/p/gluster_infra_reboot

So almost all Linux servers got rebooted, most without issues, but
during the day I started to have the first symptoms of a cold
(headaches, shivering, etc.), so I had to ping Nigel to finish the last
server (which wasn't without issues).


For people who do not want the gruesome details of the reboots, you can
stop here.


We did get some trouble with:

- a few servers on Rackspace (mostly infra) with cloud-init resetting
the configuration to DHCP, and DHCP not working. I am finally changing
that, and was in the process of fixing it for good before going back
to bed.
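For reference, newer cloud-init versions can be told to leave the
network configuration alone via a drop-in like the one below (a sketch;
the file name is arbitrary, and whether the cloud-init shipped on our
EL6 images honors this key is an assumption):

```yaml
# /etc/cloud/cloud.cfg.d/99-disable-network.cfg
# Tell cloud-init not to regenerate the network configuration
# (i.e. stop it from resetting our static config to DHCP at boot).
network:
  config: disabled
```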

- gerrit didn't start automatically at boot. I know we had a fix for
that, but I am not sure why it didn't work, or whether we hadn't
deployed it yet.

- supercolony seems to be unable to boot the latest kernel. It went so
badly that the emergency console wasn't working. An erroneous message
said "disabled for your account", so I opened a Rackspace ticket and
waited. This occurred as I started to not feel well, so I didn't really
search more, or I would have:
   - seen that the console was working for other servers (hence the
erroneous message)
   - tried harder to boot another kernel
   - searched a bit more on the internal list that said "there is some
issue somewhere around RHEL 6". I didn't investigate more, but that's
also what happened.

In the end, Nigel took over the problem solving and pinged Rackspace
harder; their support suggested booting another kernel, which he did
(but better than I did).

And thus supercolony is back, but not upgraded.

The last one still puzzles me, because the current configuration is
"default=2", so that should start the 3rd kernel in the list.

The GRUB documentation says "The first entry (here, counting starts
with number zero, not one!) will be the default choice"; it was "0"
when I first tried to boot another kernel (I switched it to 1).

So since we have:

[root@supercolony ~]# grep title /boot/grub/menu.lst 
title Red Hat Enterprise Linux Server (2.6.32-696.18.7.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-696.16.1.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-642.15.1.el6.x86_64)
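To make the zero-based counting concrete, here is a small sketch (run
against a copy of the entries above, not on supercolony itself) that
numbers the titles the way GRUB legacy does:

```shell
# Number the title entries as GRUB legacy counts them (from zero),
# using a copy of the menu.lst entries shown above.
awk '/^title/ {print n++ ": " $NF}' <<'EOF'
title Red Hat Enterprise Linux Server (2.6.32-696.18.7.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-696.16.1.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-642.15.1.el6.x86_64)
EOF
```

So default=N should boot the entry labeled N in that output.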

default=1 should have used 2.6.32-696.16.1, but it didn't boot.

Nigel changed it to "default=2", so that should have used
2.6.32-642.15.1, but plot twist...

# uname -a
Linux supercolony.gluster.org 2.6.32-696.16.1.el6.x86_64 #1 SMP Sun Oct
8 09:45:56 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

So there is something fishy with grub, but as I am writing this from my
bed, maybe the problem is on my side. I am sure it will become clearer
once I hit "send".

So, to recap: we have one or two servers left to upgrade (cf. the pad),
and the *BSD machines are not patched yet (I quickly checked their
lists, but I do not expect patches soon); since the more urgent issues
were on the hypervisor side, we are OK there.

The grub setup on supercolony needs to be investigated, and supercolony
should be upgraded as well.

I also need to take some rest.

Many thanks to Nigel for taking over when my body failed me.


--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS


_______________________________________________
Gluster-infra mailing list
Gluster-infra@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-infra



--
Amye Scavarda | amye@xxxxxxxxxx | Gluster Community Lead
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel
