Powered by Linux
Re: Adding PCI hotplug safe checking to smatch — Semantic Matching Tool

Re: Adding PCI hotplug safe checking to smatch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[+ cc: gregkh and bottom post]

On Thu, 23 Jan 2014 00:53:35 +0300
Dan Carpenter <dan.carpenter@xxxxxxxxxx> wrote:

> Ok.  I've added the smatch mailing list.  And I'm top posting because
> I am a bad person.
> 
> It would be better if I could look at an existing patch for one of these
> things.
> 
> regards,
> dan carpenter
> 
> On Wed, Jan 22, 2014 at 01:28:29PM -0500, Joe Lawrence wrote:
> > Hi Dan,
> > 
> > (Feel free to CC the smatch list if you think it's appropriate.)
> > 
> > Part of the work we do at Stratus is hardening various device drivers
> > for our HW platform against surprise PCI hotplug removal.  Even though
> > it has been documented in Linux Device Drivers for years, we still see
> > GregKH occasionally explaining to folks that on surprise hotplug, PCI
> > drivers "will suddenly start reading all 0xff and will then need to
> > abort whatever it was doing."
> > 
> > For example, the code provided in section 14.7.1. Dynamic Devices of
> > LDD:
> > 
> >   result = readl(ptr);
> >   if (result =  = ~(u32)0)    /* card removed */
> >       return -ENODEV;
> > 
> > Currently we manually inspect code drops for unsafe reads of PCI
> > space.  We have essentially two cases of PCI reads:
> > 
> > Case 1 - Read data doesn't matter - driver reads from the device, then
> > writes something back.  Although the device has gone missing, we
> > usually consider this as benign as writing to missing HW has no effect. 
> > 
> > Case 2 - Read data affects control flow or kernel/driver data - driver
> > reads data and then bases execution on the result, or read data is
> > copied into a kernel/driver data structure.  Typically we check these
> > reads and if necessary, twiddle enough state to get the driver into its
> > error handler.
> > 
> > Currently we manually inspect new code drops for potentially
> > hotplug-unsafe reads.  I was wondering if you thought it would be worth
> > adding a check in smatch, even as an optional flag.  There is probably
> > already an existing pattern that we could mimic to add this.  As I
> > don't hack on the smatch code itself, I was wondering if you could
> > point that out.  (I can go off and try implementing / testing.)
> > 
> > A means of automating this code audit would be helpful to us here at
> > Stratus and perhaps the kernel community in general.
> > 
> > Thanks for any suggestions,
> > 
> > -- Joe


Hi Dan,

First, I should have mentioned that Stratus inspects PCI reads for two
reasons:

  1 - Handle hotplug removed devices, ensure data/program flow
      integrity.  (As mentioned in my first mail.)

  2 - Find missing devices as soon as possible so that we can failover
      to a devices' pair to provide availability.  (Think of ethernet
      interface bonding and MD RAID1 pairing of HBAs.)

To ensure reason (2), Stratus adds "hw_gone" flags and other checks to
expedite device removal, but I want to set that aside to keep it simple.
With that mind, our changes probably go farther than what the linux
kernel community at large would be interested in.  However, it may
still be beneficial for driver writers to have a static code analyzer
looking over their shoulder for reason (1).

In the ixgbe driver, we verify a reasonable return value from readl like
so (in pseudo-diff format):

#define IXGBE_READ_REG(a, reg) readl((a)->hw_addr + (reg))
#define IXGBE_STATUS        0x00008	  /* General Register */
#define IXGBE_EEC_FLUDONE   0x04000000    /* Flash update done */

static s32 ixgbe_poll_flash_update_done_X540(struct ixgbe_hw *hw)
{
	u32 i;
	u32 reg;
	s32 status = IXGBE_ERR_EEPROM;

	for (i = 0; i < IXGBE_FLUDONE_ATTEMPTS; i++) {
		reg = IXGBE_READ_REG(hw, IXGBE_EEC);
+		if (unlikely(reg == 0xffffffff) &&
+		    IXGBE_READ_REG(hw, IXGBE_STATUS) == 0xffffffff)
+			break;
		if (reg & IXGBE_EEC_FLUDONE) {
			status = 0;
			break;
		}
		udelay(5);
	}
	return status;
}

Without checking for a missing device read value of 0xffffffff, the
logic could follow the (reg & IXGBE_EEC_FLUDONE) branch and
ixgbe_poll_flash_update_done_X540's caller would assume success.

For some reads, a value of ~0 may be valid, so we couple the check with
a read of something like a general status register (where ~0 really
means no device) to be sure.

On the other hand, there are PCI reads that Stratus doesn't verify, for
their side effect doesn't affect reason (1) or reason (2).  Again in the
ixgbe driver:

static s32 ixgbe_blink_led_start_X540(struct ixgbe_hw *hw, u32 index)
{
        u32 macc_reg;
        u32 ledctl_reg;
        ixgbe_link_speed speed;
        bool link_up;

        /*
         * Link should be up in order for the blink bit in the LED control
         * register to work. Force link and speed in the MAC if link is down.
         * This will be reversed when we stop the blinking.
         */
        hw->mac.ops.check_link(hw, &speed, &link_up, false);
        if (!link_up) {
                macc_reg = IXGBE_READ_REG(hw, IXGBE_MACC);
                macc_reg |= IXGBE_MACC_FLU | IXGBE_MACC_FSV_10G | IXGBE_MACC_FS;
                IXGBE_WRITE_REG(hw, IXGBE_MACC, macc_reg);
        }
        /* Set the LED to LINK_UP + BLINK. */
        ledctl_reg = IXGBE_READ_REG(hw, IXGBE_LEDCTL);
        ledctl_reg &= ~IXGBE_LED_MODE_MASK(index);
        ledctl_reg |= IXGBE_LED_BLINK(index);
        IXGBE_WRITE_REG(hw, IXGBE_LEDCTL, ledctl_reg);
        IXGBE_WRITE_FLUSH(hw);

        return 0;
}

The reads into macc_reg and ledctl_reg we don't verify since they are
only used to calculate new values written back to the same device.  If
this device was hotplug removed, the writes would go nowhere (this is
okay).  Some other read or a device watchdog would catch the missing
device and handle accordingly.

In our modified drivers, Stratus annotates every "checked" read and
every "don't care" read.  The "checked" reads will try to as gracefully
as possible invoke an error status/handler to escape its call chain
without causing any collateral damage.  The "don't care" reads are
marked so that we know it's been audited.

While giving smatch a spin on our drivers, it occurred to me that it
might be employed to help us catch any new hotplug-unsafe PCI reads that
we may have missed.  Depending upon how aggressive the check is, it may
be useful for general upstream use as well.

Does this sound like something suitable for smatch?  Maybe there is an
existing pattern that we could base this on.

Regards,

-- Joe
--
To unsubscribe from this list: send the line "unsubscribe smatch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux