[Hotplug_sig] Hotplug for virtualization use case (Draft)

Martine.Silbermann at hp.com (Silbermann, Martine) · Fri Aug 26 14:01:39 2005

This is the first draft of the use case on "Hotplug for Virtualization".
Please review and share your comments.
Next week (when Mary is back) we'll post this use case and link it from
both the Hotplug SIG webpage and the use case page (mail will be sent to
let you know when it's posted).
There's also a new terminology page dedicated to the
virtualization-specific terms referred to in the use case. Please review
that page as well:
http://developer.osdl.org/maryedie/HOTPLUG/VirtTerminology.shtml

Thanks for your input.
Martine

HOTPLUG for VIRTUALIZATION USE CASE

---------------------------------
Use Case:
Hotplug for Virtualization
---------------------------------
Version 0.1 (Draft)
Last Modified Date: 08/19/05

Copyright (c) 2005 by The Open Source Development Lab, Inc. Verbatim 
copying and distribution of this document is permitted in any medium, 
provided this notice is preserved.  Draft copies of this document may 
not be posted publicly without indicating the draft status.

---------------------------------

Table of Contents
Description
Target Acceptance
Participants/Roles
Scenarios
Dependencies
Implementation Notes
References

---------------------------------

---------------------------------
Description
---------------------------------
Regardless of the size or number of physical systems used (our concept
of virtualization includes using multiple separate systems as the
hardware base), virtualization provides an abstraction layer that gives
the user access to virtual machines that are independent of each other.
This allows for higher quality-of-service resource allocation,
increases security through resource isolation, provides transparent
resource redirection, leads to better hardware consolidation and
ultimately lowers the management cost.

The mapping between physical and virtual resources can be achieved
through numerous mechanisms, which are not the focus of this use case.
To keep our focus on the needs of the majority of the open source
community, we limit the scope of this use case to the virtual machine
monitors known as Type-I VMMs, which include Xen, VMware ESX and
Virtual Iron (TM) VFe. For further details on the various types of
virtual machine monitors, refer to R.P. Goldberg's article (see
reference [Goldberg74]).

Within the scope of the Type-I VMM, the virtual layer can be seen as a
combination of the virtual machine monitor and some "special" virtual
machines. We will refer to the kernel associated with the VMM as the
"VMM kernel". The kernel and OS associated with a virtual machine will
be called the "guest kernel" and "guest OS" respectively.

Our goal here is
1) to determine the role of hotplug in virtualization, and
2) to identify the requirements to support the various operations, at
both the physical and the virtualization layer, that make use of
hotplug.

In this document we will use the common hotplug-specific terminology
defined at
developer.osdl.org/maryedie/HOTPLUG/Terminology.shtml as well as
the common virtualization-specific terminology defined at
developer.osdl.org/maryedie/HOTPLUG/VirtTerminology.shtml

---------------------------------
Target Acceptance
---------------------------------
An application running on a virtual machine should be able to expect
maximum reliability of the system, have access to any resources it
needs at any given time without disruption to its execution, and be
unaffected by any configuration changes happening at the hardware
level.

Replacement of hardware components and addition and/or removal of
physical or virtual components should be totally transparent to the
application. In this context, components are system hardware
resources such as processors, memory, I/O devices, and nodes, where a
node consists of any combination of processors, memory and I/O
combined as a hotpluggable unit. The requirements for hotplug support
of virtualization should be independent of the choice of
virtualization approach and should be integrated into the mainline
kernel.
Virtualization will provide a perfect test environment for different 
hotplug features. This use case will help the Hotplug SIG define the 
appropriate test scenarios to support virtualization. We can also use 
this description to do a gap analysis for the hotplug code that is 
either already in the kernel or is currently being developed.

---------------------------------
Participants/Roles
---------------------------------
* System Administrator
---
Special class of user that has special privileges on a given
system. This is a role held by an individual that acts as
the administrator for a system.
---
* Application Administrator
---
Special class of user that has special privileges for a given 
application. This is a role held by an individual that acts as 
the administrator for all aspects of an application.
---
* User
---
Any user on the system. This is a role that is held by all individuals
using a system. Users interact with the system through the processes
associated with the applications they are using. Root users are users
who have root privileges. They are typically system administrators.
Privileged users have some of the root user privileges, but not all.
They are typically operations staff members.
---

---------------------------------
Scenarios
---------------------------------

Four scenarios are considered:

-------------------------
1. Serviceability (hotplug at physical layer)
-------------------------
In this sub-case the System Administrator needs the ability to
remove/replace failing components.
Unfortunately, CPU failures tend to be fatal and usually don't give
any warning. Fortunately, they're also very infrequent. Because they
are usually fatal, a hot-remove scenario is unlikely. (However, if a
processor fails and is removed while the system is down, the System
Administrator needs the option to reboot immediately and hot-add the
replacement once the system is running.) In contrast, memory and I/O
often give adequate warning that they're failing, via single-bit or
parity errors, thus providing an opportunity to have them replaced
before they cause a system failure.

-------------------------------------
2. Capacity management (hotplug at physical layer)
-------------------------------------
System Administrators need the option to add more physical resources 
to the virtualization layer to create larger virtual machines, or to
relocate physical resources when balancing hardware resources
across multiple virtualization layers to support specific workloads.

------------------------------------------------
3. Migration of the virtualization layer (hotplug at physical layer)
------------------------------------------------

In this sub-case the System Administrator adds some resources
and removes others as a way to migrate the virtualization
layer onto different hardware. "Different hardware" may mean
completely different hardware, or it could simply mean upgrades
of existing hardware. It could also be accomplished on
a system that provides hard partitioning.

------------------------------------------------
4. Virtual resources management (hotplug at the virtualization layer)
------------------------------------------------

This sub-case deals with hot-plug of virtual resources to/from the
OS instances in the virtual machines. The reason to do this is
capacity management: giving each virtual machine exactly the resources
that it requires to support application workload requirements, and
making those resources apparent to the guest OS, while leaving the
remainder available for other virtual machines. The need
for hot-plug of resources to/from the guest OS(es) depends
on how the virtualization layer and the OS(es) interact. However that
interaction works, the hotplug features required to support it should
be part of the mainline kernel.

------------------------------------------------
Workflow for Scenario (1) on Serviceability
------------------------------------------------
This scenario covers the expected succession of events when a
component shows signs of failure, such as multiple parity errors for
memory. We assume that some sort of event log analyzer will detect
that a component is displaying signs of possible failure and will
either post a message at a predetermined location or send a message to
a dedicated thread to take action. The choice between posting a
message and automatically taking action to isolate the faulty hardware
component from the rest of the system should be made at boot time,
most likely through a configuration option. When a message is posted,
it is up to the system administrator to take the initiative of
requesting that the component be isolated and eventually replaced
using hotplug functionality. In either case, a hotplug event with a
remove action needs to be generated to inform the host kernel that the
component needs to be hot-removed.

The hotplug event handler also has to notify the virtualization layer
that a hardware resource will be removed, so that each virtual
machine currently using it either has it taken away from its
resources or has it substituted by another available equivalent
physical resource.
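
The succession of events above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the names
(FailureMonitor, the "notify"/"auto" policy values, the error
threshold of 3) are hypothetical and do not correspond to any existing
kernel or VMM interface, and the lists merely stand in for the message
location, the host kernel and the virtualization layer.

```python
from dataclasses import dataclass, field

# Hypothetical policy values; the real choice would come from a
# boot-time configuration option.
NOTIFY = "notify"      # post a message and wait for the administrator
AUTO_REMOVE = "auto"   # isolate the component without waiting


@dataclass
class FailureMonitor:
    policy: str
    threshold: int = 3                                   # assumed error count that signals trouble
    error_counts: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)         # stands in for the predetermined location
    hotplug_events: list = field(default_factory=list)   # events delivered to the host kernel
    vmm_notices: list = field(default_factory=list)      # notifications to the virtualization layer

    def record_error(self, component):
        """Called by the event log analyzer for each logged error."""
        count = self.error_counts.get(component, 0) + 1
        self.error_counts[component] = count
        if count != self.threshold:
            return  # act exactly once, when the threshold is reached
        if self.policy == AUTO_REMOVE:
            self.request_remove(component)
        else:
            self.messages.append(f"{component}: {count} errors, consider hot-remove")

    def request_remove(self, component):
        """Generate the remove event; the handler also notifies the VMM."""
        self.hotplug_events.append(("remove", component))
        self.vmm_notices.append(("resource-going-away", component))
```

With the "notify" policy, request_remove() is what the system
administrator would trigger by hand after reading the posted message.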

------------------------------------------------
Workflow for Scenario (2) on Capacity management 
------------------------------------------------

This scenario covers the expected succession of events when a system
administrator decides that the existing physical resources are no
longer sufficient either to cover the needs of the current virtual
machines or to allow for creation of new virtual machines needed to
complete the project. The virtualization layer may or may not provide
a management tool to help the system administrator identify such
needs. Such a management tool could also provide a means to
communicate to the virtualization layer that a new component needs
to be added and that a hotplug event to add that component needs to
be generated.

------------------------------------------------
Workflow for Scenario (3) on Migration of the virtualization layer 
------------------------------------------------
This scenario covers the expected succession of events when a system
administrator decides that a set of existing physical resources must
be replaced, either in the context of an upgrade or for full
replacement of one of the platforms that contributes to the hardware
resources. The latter case can of course only occur in a configuration
in which the hardware resources consist of multiple separate
platforms. It is the responsibility of the system administrator
either to inform the VMM directly of the request for hot-remove of
those components or to provide the management tool with the
information required to handle the hotplug event.

The challenge in this specific scenario is that several components
will be removed at the same time. If the mechanism used to transfer
the VMM to other resources can be made aware that multiple components
are being hot-removed, the operation may be more efficient and
coherent than if each component is removed individually.

After the hot-remove of all components has been successfully
completed, the physical components are replaced and the proper actions
are taken to trigger hot-add of the new components. The specifics of
how the hot-add is triggered depend heavily on the VMM itself (on
things such as what the VMM kernel is or whether the VMM has its own
embedded management tool).
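
To illustrate why awareness of the whole set of departing components
can help, here is a small sketch. It assumes a hypothetical table
mapping each virtual machine to the physical components it uses, and
it counts how many times virtual machines would have to be moved off
departing hardware. The one-by-one figure is a worst case in which
each removal is handled with no knowledge of the others; the function
names and the sample data are illustrative only.

```python
def affected_vms(usage, components):
    """Return the set of VMs that use at least one of the given components."""
    departing = set(components)
    return {vm for vm, used in usage.items() if used & departing}


def relocations_one_by_one(usage, components):
    """Worst case: every removal relocates each VM touching that component."""
    return sum(len(affected_vms(usage, [c])) for c in components)


def relocations_batched(usage, components):
    """With the full set known up front, each affected VM is moved once."""
    return len(affected_vms(usage, components))


# Hypothetical usage table: VM name -> physical components it runs on.
usage = {
    "vm1": {"cpu0", "cpu1", "mem0"},
    "vm2": {"cpu2", "mem0"},
    "vm3": {"cpu3", "mem1"},
}
departing = ["cpu0", "cpu1", "mem0"]

print(relocations_one_by_one(usage, departing))  # 4
print(relocations_batched(usage, departing))     # 2
```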

------------------------------------------------
Workflow for Scenario (4) on Virtual resources management
------------------------------------------------
This scenario covers the expected succession of events when
redistribution of resources is required to address the current needs
of the guest OSes as their workloads vary. An application
administrator or a privileged user who knows their application's
workload requirements may request that the VMM allocate a specific
type or amount of resources to a given virtual machine. We assume that
the guest OS running on the virtual machine can support all types of
hotplug events. Also, for this use case we will assume that any given
CPU that has been hot-added to a virtual machine is fully dedicated to
that VM. From the OS's point of view, the hotplug event is handled
exactly as when the OS runs directly on the hardware, except that when
a resource such as a CPU is removed, it is handed to the VMM instead
of being passed down to the PAL (Processor Abstraction Layer). While
each specific implementation of virtualization may lead to a different
interaction process between the VMM and the OS running on each VM, the
mechanism of redistributing the virtual resources is the same from a
hotplug point of view.
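
The dedicated-CPU assumption above can be pictured with a minimal
sketch of the VMM's bookkeeping. The CpuPool class and its methods are
hypothetical and only model ownership: a CPU hot-added to a virtual
machine leaves the free pool and belongs to that VM alone, and a CPU
hot-removed by the guest OS goes back to the VMM's pool rather than
down to firmware.

```python
class CpuPool:
    """Tracks which physical CPU is dedicated to which virtual machine."""

    def __init__(self, cpus):
        self.free = set(cpus)   # CPUs not assigned to any VM
        self.owner = {}         # cpu id -> name of the VM that owns it

    def hot_add(self, vm):
        """Dedicate one free CPU to the VM and return its id."""
        if not self.free:
            raise RuntimeError("no free CPU available")
        cpu = min(self.free)    # pick the lowest id for predictability
        self.free.remove(cpu)
        self.owner[cpu] = vm
        return cpu

    def hot_remove(self, vm, cpu):
        """The guest gives the CPU up; it returns to the VMM's free pool."""
        if self.owner.get(cpu) != vm:
            raise ValueError(f"cpu {cpu} is not dedicated to {vm}")
        del self.owner[cpu]
        self.free.add(cpu)

    def cpus_of(self, vm):
        return sorted(c for c, v in self.owner.items() if v == vm)
```

For example, after pool = CpuPool(range(4)), two calls to
pool.hot_add("vm1") dedicate CPUs 0 and 1 to vm1, and
pool.hot_remove("vm1", 0) makes CPU 0 available to another virtual
machine.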

---------------------------------
Dependencies
---------------------------------
** An event log analyzer needs to be present in the system to detect
when a component shows signs of failure.
** The guest OS must support logical hotplugging of all possible
components.
** Changes may be needed in the system firmware to support the
various hotplug operations.

---------------------------------
Implementation Notes
---------------------------------
In the three physical-layer scenarios (1 through 3) one has to take
into account the possibility that the replacement hardware is of a
different type than the original, for example a more recent version of
a device or a CPU of a different speed. The hot-add operation for each
component is responsible for handling such cases.
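
As a rough illustration of what a hot-add operation might check, the
sketch below compares a description of the original component with
that of its replacement and reports what changed. The attribute names
("kind", "mhz", "stepping") and the function itself are assumptions
made for this example; how a real hot-add path reacts to each
difference is outside the scope of the sketch.

```python
def describe_differences(original, replacement):
    """Return {attribute: (old, new)} for every attribute that differs."""
    keys = sorted(set(original) | set(replacement))
    return {
        key: (original.get(key), replacement.get(key))
        for key in keys
        if original.get(key) != replacement.get(key)
    }


# Hypothetical descriptions of a removed CPU and its faster replacement.
old_cpu = {"kind": "cpu", "mhz": 1300, "stepping": 5}
new_cpu = {"kind": "cpu", "mhz": 1600, "stepping": 5}

print(describe_differences(old_cpu, new_cpu))  # {'mhz': (1300, 1600)}
```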

---------------------------------
References
---------------------------------
[Goldberg74] "Survey of Virtual Machine Research", Robert P.
Goldberg, IEEE Computer, pp. 34-45, June 1974