[master] Document iscsi and multipath implementations.

---
 docs/iscsi.txt     |  169 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 docs/multipath.txt |  143 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 312 insertions(+), 0 deletions(-)
 create mode 100644 docs/iscsi.txt
 create mode 100644 docs/multipath.txt

diff --git a/docs/iscsi.txt b/docs/iscsi.txt
new file mode 100644
index 0000000..ee41582
--- /dev/null
+++ b/docs/iscsi.txt
@@ -0,0 +1,169 @@
+==================
+iSCSI and Anaconda
+==================
+
+
+Introduction
+------------
+
+An iSCSI device is a SCSI device connected to your computer via a TCP/IP
+network. The communication can be handled either in hardware or in software, or
+as a hybrid --- part software, part hardware.
+
+The terminology:
+
+- 'initiator', the client in the iSCSI connection. The computer we are running
+  Anaconda on is typically an initiator.
+- 'target', the storage device behind the network. This is where the data is
+  physically stored and read from. You can turn any Fedora/RHEL machine into a
+  target (or several) via scsi-target-utils.
+- 'HBA' or Host Bus Adapter. A device (PCI card typically) you connect to a
+  computer. It acts as a NIC and if you configure it properly it transparently
+  connects to the target when started and all you can see is a block device on
+  your system.
+- 'software initiator' is what you get when you emulate most of what an HBA
+  does and use a regular NIC for the iSCSI communication. The modern Linux
+  kernel has a software initiator. To use it, you need the Open-iSCSI software
+  stack [1, 2] installed. It is known as iscsi-initiator-utils in Fedora/RHEL.
+- 'partial offload card'. Similar to an HBA but needs some support from the
+  kernel and iscsi-initiator-utils. The least pleasant to work with, mainly
+  because the amount of manual setup required is not standardized (some
+  connect to the target just like HBAs, some need you to bring their NIC part
+  up manually etc.). Partial offload cards exist to get better performing I/O
+  with less processor load than a software initiator.
+- 'iBFT' as in 'iSCSI Boot Firmware Table'. A table in the card's BIOS that
+  contains its network and target settings. This allows the card to configure
+  itself, connect to a target and boot from it before any operating system or
+  bootloader gets the chance. We can also read this information from
+  /sys/firmware/ibft after the system starts and then use it to bring the card
+  up (again) in Linux.
+- 'CHAP' is the authentication used for iSCSI connections. The authentication
+  can happen during target discovery or target login or both. It can happen in
+  both directions too: the initiator authenticates itself to the target and the
+  target is sometimes required to authenticate itself to the initiator.
+
+
+What is expected from Anaconda
+------------------------------
+
+We are expected to:
+
+- use an HBA like an ordinary disk. It is usually smart enough to bring itself
+  up during boot, connect to the target and just act as an ordinary disk.
+- allow creating new software initiator connections in the UI, both IPv4 and IPv6.
+- facilitate bringing up iBFT connections for partial offload cards.
+- install the root and/or /boot filesystems on any iSCSI initiator known to us.
+- remember to install dracut-network if we are booting from an iSCSI initiator that
+  requires iscsi-initiator-utils in the ramdisk (most of them do).
+- boot from an iSCSI initiator using dracut; this requires generating an
+  appropriate set of kernel boot arguments for it [3].
+
+
+How Anaconda handles iSCSI
+--------------------------
+
+iSCSI comes into play several times while Anaconda does its thing:
+
+In loader, when deciding which NIC we should set up, we check if we have iBFT
+information from one of the cards. If we do, we set that card up with what we
+found in the table; it usually boils down to an IPv4 static or IPv4
+DHCP-obtained address. [4][5]
+
+Next, after the main UI startup during filtering (or storage scan, whichever
+comes first) we start up the iSCSI support code in Anaconda [6]. This currently
+involves:
+- manually modprobing related kernel modules
+- starting the iscsiuio daemon (required by some partial offload cards)
+- most importantly, starting the iscsid daemon
+
+All iBFT connections are brought up next by looking at the cards' iBFT data, if
+any. The filtering screen has a feature to add advanced storage devices,
+including iSCSI. Both connection types are handled by libiscsi (see below). The
+brought up iSCSI devices appear as /dev/sdX and are treated as ordinary block
+devices.
+
+When DeviceTree scans all the block devices it uses the udev data (particularly
+the ID_BUS and ID_PATH keys) to decide if the device is an iSCSI disk. If it is,
+it is represented with an iScsiDiskDevice class instance. This helps Anaconda
+remember that:
+
+- we need to install dracut-network so the generated dracut image is able to
+  bring up the underlying NIC and establish the iSCSI connection.
+- if we are booting from the device we need to pass dracut a proper set of
+  arguments that will allow it to do so.
+
+
+Libiscsi
+--------
+
+How are iSCSI targets found and logged into? Originally Anaconda just ran
+iscsiadm as an external program through execWithRedirect(). This ultimately
+proved awkward, especially due to the difficulty of handling the CHAP
+passphrases this way. That is why Hans de Goede <hdegoede@xxxxxxxxxx>, the
+previous maintainer of the Anaconda iSCSI subsystem, decided to write a better
+interface and created libiscsi (do not confuse this with the libiscsi.c in the
+kernel). Currently libiscsi lives as a couple of patches in the RHEL6
+iscsi-initiator-utils CVS (and in Fedora package git, in a somewhat outdated
+version). Since Anaconda is libiscsi's only client at the moment, it is
+maintained by the Anaconda team.
+
+The promise of libiscsi is to provide a simple C/Python API to handle iSCSI
+connections while being somewhat stable and independent of the changes in the
+underlying initiator-utils (while otherwise being tied to it on the
+implementation level).
+
+And at the moment libiscsi does just that. It has a set of functions to discover
+and log in to software targets. It supports making connections through partial
+offload interfaces, but the only discovery method supported at this moment is
+through firmware (iBFT). Its public data structures are independent of
+iscsi-initiator-utils. And there is some Python boilerplate that wraps the core
+functions so we can easily call those from Anaconda.
+
+To start nontrivial hacking on libiscsi, prepare to spend some time
+familiarizing yourself with the iscsi-initiator-utils internals (it is complex
+but quite nice).
+
+
+Debugging iSCSI bugs
+--------------------
+
+There is some information in anaconda.log and storage.log but libiscsi itself is
+quite bad at logging. Most of the time, useful information can be found by
+sshing into the machine and inspecting the output of various iscsiadm commands
+[2][7], especially querying the existing sessions and known interfaces.
+
+If for some reason the DeviceTree fails to recognize iSCSI devices as such, the
+output of 'udevadm info --export-db' is of interest.
+
+The booting problems are either due to incorrectly generated dracut boot
+arguments or they are simply dracut bugs.
+
+Note that many of the iscsi adapters are installed in different Red Hat machines
+and so the issues can often be reproduced and debugged.
+
+
+Future of iSCSI in Anaconda
+---------------------------
+
+- extend libiscsi to allow initializing arbitrary connections from a partial
+  offload card. Implement the Anaconda UI to utilize this. Difficulty hard.
+- extend libiscsi with device binding support. Difficulty hard.
+- work with iscsi-initiator-utils maintainer to get libiscsi.c upstream and then
+  to rawhide Fedora. Then the partial offload patches in the RHEL6 Anaconda can
+  be migrated there too and partial offload can be tested. This is something
+  that needs to be done before RHEL7. Difficulty medium.
+- improve libiscsi's logging capabilities. Difficulty easy.
+
+
+
+[1] http://www.open-iscsi.org/
+[2] /usr/share/doc/iscsi-initiator-utils-6.*/README
+[3] man 7 dracut.kernel
+[4] Anaconda git repository, anaconda/loader/ibft.c
+[5] Anaconda git repository, anaconda/loader/net.c, chooseNetworkInterface()
+[6] Anaconda git repository, anaconda/storage/iscsi.py
+[7] 'man 8 iscsiadm'
+
+
+---
+Red Hat Author(s): Ales Kozumplik <akozumpl@xxxxxxxxxx>
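
A note on the DeviceTree paragraph in docs/iscsi.txt: the ID_BUS/ID_PATH check
can be pictured with the minimal Python sketch below. It is not Anaconda's
code. The helper names (parse_udev_properties, looks_like_iscsi), the sample
property values and the rule that an iSCSI ID_PATH contains an '-iscsi-'
component are all assumptions made for illustration.

```python
def parse_udev_properties(text):
    """Parse KEY=VALUE lines into a dict; lines without '=' are skipped."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def looks_like_iscsi(props):
    """Heuristic (assumption): SCSI bus plus an '-iscsi-' part in ID_PATH."""
    if props.get("ID_BUS") != "scsi":
        return False
    return "-iscsi-" in props.get("ID_PATH", "")


# Hypothetical property dump for a disk reached through a software initiator.
SAMPLE = """\
ID_BUS=scsi
ID_PATH=ip-192.168.122.10:3260-iscsi-iqn.2011-10.com.example:target0-lun-1
"""

# Hypothetical property set for a local SATA disk, for contrast.
LOCAL_DISK = {"ID_BUS": "ata", "ID_PATH": "pci-0000:00:1f.2-scsi-0:0:0:0"}

is_iscsi = looks_like_iscsi(parse_udev_properties(SAMPLE))
is_local_iscsi = looks_like_iscsi(LOCAL_DISK)
```

On a live system the property text could come from
'udevadm info --query=property --name=/dev/sdX', which makes it easy to compare
what udev reports with what the installer decided.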
diff --git a/docs/multipath.txt b/docs/multipath.txt
new file mode 100644
index 0000000..e8af24e
--- /dev/null
+++ b/docs/multipath.txt
@@ -0,0 +1,143 @@
+======================
+Multipath and Anaconda
+======================
+
+
+Introduction
+------------
+
+If there are two block devices in your /dev for which udev reports the same
+'ID_SERIAL' then you can create a certain device mapper device which arbitrarily
+uses those devices to access the physical device. And that is Multipath [1].
+
+For instance, suppose there are
+
+/dev/sda, with ID_SERIAL of 20090ef12700001d2, and
+/dev/sdb, with the same ID_SERIAL.
+
+Those are probably some adapters in the system that just connect your box to a
+storage area network (SAN) somewhere. There are perhaps two cables, one for sda,
+one for sdb, and if one of the cables gets cut the other can still transmit
+data. Normally the system won't recognize that sda and sdb have this special
+relation to each other, but by creating a suitable device map using multipath
+tools [2] we can create a DM device /dev/mapper/mpatha and use it for storing
+and retrieving data.
+
+The device mapper then automatically routes I/O requests for /dev/mapper/mpatha
+to either sda or sdb depending on, for example, the load on each path or
+congestion on the particular network.
+
+The nomenclature I will use here is:
+
+- 'multipath device' for the smart /dev/mapper/mpathX device.
+- 'multipath member device' for the '/dev/sdX' devices. Also 'a path'.
+
+
+What is expected from Anaconda
+------------------------------
+
+Anaconda is expected to:
+- detect that there are multipath devices present.
+- coalesce all relevant (e.g. exclusiveDisks) multipath devices.
+- only let the user interact with the multipath devices in the filtering,
+  cleardiskssel and partition screens; that is, once we know 'sdc' and 'sdd'
+  are part of 'mpathb', show only 'mpathb' and never the paths.
+- install the bootloader on and boot from an mpath device.
+- make sure all the multipath devices we used for installation (whether or not
+  they carry the root filesystem) are correctly coalesced in the booted
+  system. This is achieved by generating a suitable /etc/multipath.conf and
+  writing it into sysroot.
+- be able to refer to mpath devices from kickstart, either by name like 'mpatha'
+  or by their id like 'disk/by-id/scsi-20090ef12700001d2'.
+
+
+How Anaconda handles multipath
+------------------------------
+
+To detect the presence of multipath devices we rely on multipath tools. We do
+the same for coalescing; see pyanaconda/storage/devicelibs/mpath.py, the file
+that provides some abstraction from mpath tools. During the device scan we use
+the 'multipath -d' output to find out which devices are going to end up as
+multipath members. The MultipathTopology object also enhances the multipath
+members' udev dictionaries with 'ID_FS_TYPE' set to 'multipath_member' (yes,
+this is a hack surviving from the original mpath implementation, and righteous
+is he who eradicates it). This information is picked up by DeviceTree when
+populating itself. That is, if 'sda' and 'sdb' are multipath member devices,
+DeviceTree gives them the MultipathMember format and creates one
+MultipathDevice for them (we know its name from 'multipath -d'). We end up with:
+
+DiskDevice 'sda', format 'MultipathMember'
+DiskDevice 'sdb', format 'MultipathMember'
+MultipathDevice 'mpatha', parents are 'sda' and 'sdb'.
+
+From then on, Anaconda only deals with the MultipathDevice and generally leaves
+anything with the 'MultipathMember' format alone (this is an inert format that
+is not really there; we use it just to mark the device as "useless beyond
+being a multipath member", kind of like MDRaidMember).
+
+Partitioning happens on the multipath device and during the preinstallconfig
+step /mnt/sysimage/etc/multipath.conf is created and filled with information
+about the coalesced devices. This is handled in the Storage.write() method. It
+is important that this file and /etc/multipath/wwids (autogenerated by mpath
+tools) make it to the sysimage before the dracut image is generated.
+
+
+Debugging multipath bugs
+------------------------
+
+Unlike with iSCSI, to reproduce a multipath bug one does not need the same
+specific hardware as the reporter. Just find any box connected to a multipathed
+SAN and you are fine (at the moment, connecting to the same iSCSI target through
+its IPv4 and IPv6 addresses also produces a multipathed device).
+
+On top of that, much of the necessary information is already included in the
+anaconda logs or can be easily extracted from the reporter. The things to
+particularly look at are:
+
+- storage.log, the output around 'devices to scan for multipath' and 'devices
+  post multipath scan'. The latter shows a triple with regular disks, disks
+  comprising multipath devices and partitions. This helps you quickly find out
+  what the target system is about.
+
+- this information is also in program.log's calls to 'multipath' [3]. If mpath
+  devices are mysteriously appearing/disappearing between the filtering and
+  partitioning screens, look at those. 'multipath -ll' is called to display
+  currently coalesced mpath devices; 'multipath -d' is called to show the mpath
+  devices that would be coalesced if we ran 'multipath' now. The latter is
+  exploited by the device filtering screen.
+
+
+Future of multipath in Anaconda
+-------------------------------
+
+Overall as of RHEL6.2, the shape of multipath in Anaconda is good and, more
+importantly, it is flexible enough to sustain new RFEs and bugs. These are,
+however, the bugs that I expect to appear sometime soon:
+
+- enable or disable mpath_friendly_names in kickstart. Disabling friendly names
+  just means the mpath devices are called by their wwid,
+  e.g. /dev/mapper/360334332345343234, not '/dev/mapper/mpathc'. This is
+  straightforward to implement.
+- extend support for mpath devices in kickstart in general. Currently mpath
+  devices should be accepted in most commands but I am sure there will be corner
+  cases. Difficulty medium.
+- [rawhide] stop extending the udev info dictionary with 'ID_FS_TYPE' and
+  'ID_MPATH_NAME'. Doing it this way is asking for trouble if the dictionary
+  of a particular mpath device is reloaded from udev without running it through
+  the MultipathTopology object, as it will miss those entries (and DeviceTree
+  depends on them a lot). Difficulty hard, but includes a lot of pleasant
+  refactoring.
+- improve support for multipathing iSCSI devices. Someone might ask for it one
+  day (in fact, with NIC bonding they already did), and it will make mpath
+  debugging possible on any virt machine with multiple virt NICs.
+
+
+
+[1] http://akozumpl.fedorapeople.org/archive/Multipass.jpg
+[2] http://christophe.varoqui.free.fr/
+[3] 'man 8 multipath'
+
+
+
+---
+Red Hat Author(s): Ales Kozumplik <akozumpl@xxxxxxxxxx>
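
A note on the introduction of docs/multipath.txt: the idea that two block
devices sharing one ID_SERIAL are paths to the same physical device can be
sketched in a few lines of Python. This is only an illustration of the idea;
Anaconda itself relies on the 'multipath -d' output as described in the
document. The function name group_paths_by_serial and the sample data for 'sdc'
are assumptions; the serial for sda/sdb is the one used in the document.

```python
from collections import defaultdict


def group_paths_by_serial(devices):
    """Map each ID_SERIAL seen on more than one device to its device names.

    'devices' maps a device name to its udev property dictionary.
    """
    by_serial = defaultdict(list)
    for name, props in devices.items():
        serial = props.get("ID_SERIAL")
        if serial:
            by_serial[serial].append(name)
    # Only serials shared by two or more devices are multipath candidates.
    return {
        serial: sorted(names)
        for serial, names in by_serial.items()
        if len(names) > 1
    }


# sda and sdb share a serial as in the document; sdc is a made-up local disk.
DEVICES = {
    "sda": {"ID_SERIAL": "20090ef12700001d2"},
    "sdb": {"ID_SERIAL": "20090ef12700001d2"},
    "sdc": {"ID_SERIAL": "hypothetical-local-disk"},
}

candidates = group_paths_by_serial(DEVICES)
```

With the sample data only sda and sdb are grouped, which matches the 'mpatha'
example in the document; a disk with a unique serial is left alone.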
-- 
1.7.6.4
