Les Mikesell wrote:
> Roger Heflin wrote:
>> Yes, and typically to support anything recent you have too many
>> add-ons on the enterprise OSes, if you are in a fast moving enterprise
>> environment RHEL won't work.
>
> Fast moving and enterprise are words you don't usually see together.
> Don't you have to keep decades-old processes running?
>
>> RHEL is probably quite good for any of the nice simple static
>> enterprise environments, but most would argue there you should
>> probably lock everything down so tight that few kernel
>> updates/userspace are even required for anything, the problem is in an
>> environment were you are constantly having to bring in new hardware
>> that does not work on the older release, where you cannot wait 6
>> months for RHEL to catch up.
>
> I can't recall ever being in a position of "having to bring in new
> hardware". What scenario forces this issue on you? I haven't noticed a
> shortage of vendors who will sell RHEL supported boxes. But it sounds
> like you have an interesting job...
More cpu power needed to do the job. And the new boxes aren't officially RHEL
supported (and sometimes won't even boot with the latest update, but will work
with the latest fedora/kernel.org). You typically bring in a large enough set
of new machines at a time (usually this was 100-200 machines), update only the
pieces required to support the new machines, and then run some tests to
validate that they give the correct answers for various jobs. It is really a
money issue: the new machine is 2x the speed but not yet supported by RHEL, so
you would need 2x the number of old supported machines for 2x the cost or
more. Reliability was required (50 or so disk servers, any one of which
failing would cause at least partial loss of access to data).
The problem of validating new hardware/OS (or old hardware with a huge update)
is the same process either way, so you change only what you need to, and you
are better off starting with something as new as possible and going from
there. We were typically only updating the kernel; changing userspace was even
more dangerous, as a bad library update could change answers, so unless we
found a library bug that could not easily be worked around, we did not update
it.
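
To be concrete about what "validate that it gives the correct answers" meant:
something along the lines of the sketch below, which reruns a fixed set of
reference jobs on the candidate kernel and compares their output checksums
against the answers recorded on the known-good load. The job scripts and
checksums here are made up for illustration; the real jobs were the customers'
own codes.

    # minimal answer-validation sketch (hypothetical job scripts/checksums)
    import hashlib, subprocess, sys

    # reference jobs -> output checksum recorded on the known-good OS load
    GOLDEN = {
        "./jobs/job_a.sh": "put-known-good-sha256-here",
        "./jobs/job_b.sh": "put-known-good-sha256-here",
    }

    failed = False
    for cmd, want in GOLDEN.items():
        out = subprocess.run(cmd, shell=True, capture_output=True).stdout
        got = hashlib.sha256(out).hexdigest()
        if got != want:
            print("MISMATCH: %s gave %s" % (cmd, got[:12]))
            failed = True

    sys.exit(1 if failed else 0)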
Some of the compute customers don't apply updates; it is too risky to cause
downtime/wrong answers. They fix the issues that they find, and then every 1-2
years they update the older stuff to what is currently being used/proven on
the newest machines. The situation you end up with is sets of machines with
slightly different OS loads (which is kind of nasty), but the other choice is
to update everything all of the time, and that is even worse, as too many
different types of HW must be revalidated to still give the correct answers.
You end up with old servers still running whatever was originally determined
by testing to be stable, and they aren't touched. In this environment the
testing is just as bad with any one OS as with any other: you start with
something like F8, update/downgrade any parts that fail you to something that
works, and then you don't touch it for several years. I had a subset of about
250 machines, all of which had reached 500+ days of uptime (the uptime counter
rolled over). The 20 or so machines out of that set that failed to reach that
uptime all had HW issues (usually disk failures); some had more lethal
failures that required retiring the specific machine, since it was not
typically worth paying to fix the nastier issues (usually MB failures) and it
was often cheaper to just buy something new that was validated and much
faster.

The issue with all OSes is that no one tests enough to catch these rare,
high-MTBF issues. In a big environment, a machine crashing once per every 1000
days of uptime comes to one machine a day crashing because of software once
you have on the order of a thousand machines; the enterprise OSes typically
aren't even close to that level, and while fedora is worse, it is just not
that much worse.
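
For reference, a quick back-of-the-envelope on those two numbers, assuming a
32-bit jiffies counter at HZ=100 (which wraps at about 497 days, matching the
~500 days above) and a fleet size on the order of 1000 machines for the
crash-rate figure:

    # back-of-the-envelope: uptime rollover and expected software crash rate
    # (assumes 32-bit jiffies at HZ=100 and roughly 1000 machines)
    HZ = 100
    rollover_days = 2**32 / HZ / 86400
    print("uptime counter wraps after ~%.0f days" % rollover_days)   # ~497

    machines = 1000        # assumed fleet size
    mtbf_days = 1000       # one software crash per machine per 1000 days
    print("expected software crashes/day: %.1f" % (machines / mtbf_days))  # 1.0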
Roger
--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list