Re: Repeated setup / teardown of dmraid arrays as done in master has issues

Hans de Goede <hdegoede@xxxxxxxxxx> · Thu, 08 Oct 2009 17:17:37 +0200

Hi,

On 10/08/2009 05:07 PM, David Lehman wrote:
On Thu, 2009-10-08 at 12:31 +0200, Hans de Goede wrote:
Hi,

The master branch has this piece of code, causing dmraid arrays to potentially
be brought down and then up again when executing actions:

def processActions(self, dryRun=None):

<snip>

          for action in self._actions:
              log.info("executing action: %s" % action)
              if not dryRun:
                  try:
                      action.execute(intf=self.intf)
                  except DiskLabelCommitError:
                      # it's likely that a previous format destroy action
                      # triggered setup of an lvm or md device.
                      self.teardownAll()
                      action.execute(intf=self.intf)

                  udev_settle(timeout=10)

Right, but it only happens if a DiskLabelCommitError occurs -- it's not
as though we do it all the time.

This is a problem as pyblock does the following when bringing up a set:

1) Add a device map for the set, remember the set map
2) Read partition table from the map for the set
3) Create device maps for the partitions, remember them

And then when bringing down a set:

1) Remove device maps for the partitions as remembered
2) Remove device map for the set

Now when the set gets brought down after some actions have
been executed, 1) from the pyblock teardown may fail,
as the partition table may have been modified and
parted's commit_to_os will update the partition device
maps when this happens. This means some of the partition
maps pyblock tries to remove may no longer be there, causing
a backtrace. Or there might be more maps than it has remembered,
and then 2) will fail as the set map is still busy.
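
A toy model may make the two failure modes described above easier to see. This is only a sketch under stated assumptions: ToyKernel, ToyRaidSet, the toy commit_to_os function and the set name "isw_set" are all hypothetical stand-ins, not pyblock, parted or device-mapper APIs. It only mimics the bookkeeping: setup remembers the partition maps it created, a partition table commit changes the maps behind its back, and teardown then works from a stale list.

```python
class ToyKernel(object):
    """Hypothetical stand-in for the kernel's device-mapper state."""

    def __init__(self):
        # map name -> name of the map it sits on (None for the set itself)
        self.maps = {}

    def add(self, name, parent=None):
        self.maps[name] = parent

    def remove(self, name):
        if name not in self.maps:
            raise LookupError("no such map: %s" % name)
        # A map is busy while any other map still sits on top of it.
        if name in self.maps.values():
            raise OSError("map is busy: %s" % name)
        del self.maps[name]


class ToyRaidSet(object):
    """Hypothetical model of the setup/teardown bookkeeping."""

    def __init__(self, kernel, name, partitions):
        self.kernel = kernel
        self.name = name
        self.partitions = partitions  # partition numbers in the table
        self.remembered = []

    def setup(self):
        self.kernel.add(self.name)                  # 1) map for the set
        for num in self.partitions:                 # 2) read partition table
            part = "%sp%d" % (self.name, num)
            self.kernel.add(part, parent=self.name) # 3) partition maps
            self.remembered.append(part)

    def teardown(self):
        for part in self.remembered:                # 1) remembered partitions
            self.kernel.remove(part)
        self.remembered = []
        self.kernel.remove(self.name)               # 2) map for the set


def commit_to_os(kernel, set_name, new_partitions):
    """Toy partition table commit: resyncs the partition maps directly."""
    stale = [m for m, parent in kernel.maps.items() if parent == set_name]
    for part in stale:
        del kernel.maps[part]
    for num in new_partitions:
        kernel.add("%sp%d" % (set_name, num), parent=set_name)


def try_teardown(old_partitions, new_partitions):
    """Bring a set up, change its partition table, then bring it down."""
    kernel = ToyKernel()
    raid = ToyRaidSet(kernel, "isw_set", old_partitions)
    raid.setup()
    commit_to_os(kernel, "isw_set", new_partitions)
    try:
        raid.teardown()
    except (LookupError, OSError) as exc:
        return type(exc).__name__
    return "ok"
```

In this model, try_teardown([1, 2], [1, 2]) succeeds, removing a partition in between makes step 1) fail with a LookupError, and adding one makes step 2) fail with an OSError because the set map is still busy.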

There are 2 solutions to this:

1) Stop the ping pong up / down of BIOS RAID sets. There is no need
for this; we don't unload / reload sd_mod either (I know it ain't
modular anymore). BIOS RAID sets are really just plain disks,
and we already acknowledge this by hiding the real disks and only
showing / using the set.

Please let's just stop this ping pong. While testing dmraid with F-12
yesterday for the first time in a while I found and fixed 3 separate
bugs all triggered by this ping ponging, 2 of which were dmraid
specific and would not have been an issue if not for this ping ponging.

"bugs all triggered by this ping ponging" -- remember that the "ping
ponging" is triggered by a failure to commit the new disklabel. It's not
as though we're doing this for the hell of it.

Well, we sort of are. The devicetree populate ends with a teardown all,
then the first device use will bring it up again. And if we are doing
initlabel, a DestroyformatAction gets scheduled on the disk,
which does setup, zero out, teardown: there is your second ping pong.
None of these hit the above problem, because no partition table changes
are happening in between, but these are actually the ping pongs that caused
2 of the 3 bugs I hit yesterday.

In a way this ping pong-ing is good, as it is shaking out real bugs left and
right, but at the same time it is the trigger for a lot of issues; without it
we would simply never hit these issues.

Should dmraid and mpath use something other than setup/teardown so that
they aren't being turned on/off repeatedly during install? I can't think
of a reason for us to turn them off/on, so maybe we just activate them
when we find them and leave them on until the very end?

Yes, that is what I am proposing, at least for dmraid. Renaming their
setup and teardown sounds like a good plan (and checking that the inherited
ones work, or otherwise adding new ones).
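
One possible reading of that plan, as a rough sketch only: the real setup / teardown become activate / deactivate, and the setup / teardown that the generic code keeps calling turn into no-ops. ToyDevice, ToyRaidArray, run_install and the transitions counter are hypothetical names for illustration, not the actual anaconda classes.

```python
class ToyDevice(object):
    """Hypothetical base device: setup/teardown really toggle it."""

    def __init__(self, name):
        self.name = name
        self.active = False
        self.transitions = 0  # counts every real up/down change

    def setup(self):
        if not self.active:
            self.active = True
            self.transitions += 1

    def teardown(self):
        if self.active:
            self.active = False
            self.transitions += 1


class ToyRaidArray(ToyDevice):
    """Hypothetical dmraid-like device: activated once, left on."""

    def activate(self):
        # The old setup, under a new name; called once when the set is found.
        ToyDevice.setup(self)

    def deactivate(self):
        # The old teardown, under a new name; called once at the very end.
        ToyDevice.teardown(self)

    def setup(self):
        # No-op: the set is already up.
        pass

    def teardown(self):
        # No-op: a teardown of all devices leaves the set alone.
        pass


def run_install(dev):
    """Replay the ping pong sequence described earlier in this mail."""
    dev.setup()
    dev.teardown()  # populate ends with a teardown all
    dev.setup()     # first device use brings it up again
    dev.setup()
    dev.teardown()  # destroy format: setup, zero out, teardown
    return dev.transitions
```

With this sketch a plain ToyDevice goes through four real up/down transitions in run_install, while a ToyRaidArray that was activated once beforehand stays at one and remains active until deactivate is called.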

Regards,

Hans

_______________________________________________
Anaconda-devel-list mailing list
Anaconda-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/anaconda-devel-list