Re: mlx4_core probe error after applying Yinghai's patch

On Tue, 14 Jan 2014 17:15:25 +0800
Wei Yang <weiyang@xxxxxxxxxxxxxxxxxx> wrote:

> >> Error log:
> >>   mlx4_core 0003:05:00.0: Multiple PFs not yet supported.  Skipping PF.
> >>   mlx4_core: probe of 0003:05:00.0 failed with error -22
> >
> >1. Please include the full error log starting from host bootup and
> >driver start  
> 
> Will provide one later. 

From procedure __mlx4_init_one() in file
drivers/net/ethernet/mellanox/mlx4/main.c:

        /* We reset the device and enable SRIOV only for physical
         * devices.  Try to claim ownership on the device;
         * if already taken, skip -- do not allow multiple PFs
         */
==>     err = mlx4_get_ownership(dev);
        if (err) {
                if (err < 0)
                        goto err_free_dev;
                else {
                        mlx4_warn(dev, "Multiple PFs not yet supported."
                                  " Skipping PF.\n");
                        err = -EINVAL;
                        goto err_free_dev;
                }
        }
===================
#define MLX4_OWNER_BASE 0x8069c
#define MLX4_OWNER_SIZE 4

static int mlx4_get_ownership(struct mlx4_dev *dev)
{
        void __iomem *owner;
        u32 ret;

        if (pci_channel_offline(dev->pdev))
                return -EIO;

==>     owner = ioremap(pci_resource_start(dev->pdev, 0) + MLX4_OWNER_BASE,
                        MLX4_OWNER_SIZE);
        if (!owner) {
                mlx4_err(dev, "Failed to obtain ownership bit\n");
                return -ENOMEM;
        }

        ret = readl(owner);
        iounmap(owner);
==>     return (int) !!ret;
}


I suspect one of the following scenarios:

1. BAR 0 contains a "PPF Selection" (i.e., ownership) semaphore.  The
first PF probe that reads the semaphore "acquires" it: the first read
returns zero, and subsequent reads return non-zero.  When the PF driver
is unloaded, it calls mlx4_free_ownership(), which writes a zero into
the semaphore dword so that the next read will return zero.

In this scenario, initialization of the "PPF Selection" semaphore to
zero has been compromised somehow, so that even the first read attempt
returns a non-zero value.  Note that the ioremap DID succeed, or we
would see the "Failed to obtain ownership bit" message in the error
log.  Maybe prefetching has something to do with this?  (I.e., maybe if
the BAR is not prefetched, the initial value of the semaphore is
compromised.)

2. For some reason the same PF is being probed twice by the
kernel.  In this case the second probe attempt fails because the PF has
already been probed once.

This is why I want to see the entire log: to check whether the device
is indeed being "double-probed".
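
To illustrate why the two scenarios cannot be told apart from the
error message alone, here is a small userspace sketch.  It is only a
model, not driver code: struct fake_sem and the fake_* helpers are
hypothetical names, and the read-to-acquire behaviour is an assumption
taken from the description of the semaphore above.  Only the return
convention (0 = acquired, 1 = already taken, positive mapped to
-EINVAL by the caller) is copied from the code quoted earlier.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the ownership dword in BAR 0. */
struct fake_sem {
        uint32_t value;
};

/* Assumed hardware behaviour: a read returns the current value and,
 * if that value was zero, the reader now owns the semaphore. */
static uint32_t fake_sem_read(struct fake_sem *s)
{
        uint32_t old = s->value;

        if (old == 0)
                s->value = 1;
        return old;
}

/* Models what mlx4_free_ownership() is described as doing:
 * writing zero so that the next read returns zero again. */
static void fake_sem_release(struct fake_sem *s)
{
        s->value = 0;
}

/* Same return convention as the quoted mlx4_get_ownership():
 * 0 = ownership acquired, 1 = already taken. */
static int fake_get_ownership(struct fake_sem *s)
{
        return (int) !!fake_sem_read(s);
}

/* Mirrors the check in the probe path: a positive result is the
 * "Multiple PFs not yet supported" case and becomes -EINVAL. */
static int fake_probe(struct fake_sem *s)
{
        int err = fake_get_ownership(s);

        if (err > 0)
                return -EINVAL;
        return err;
}
```

With a semaphore that starts non-zero (scenario 1), the very first
fake_probe() returns -EINVAL.  With a semaphore that starts at zero
(scenario 2), the first fake_probe() returns 0 and a second one on the
same semaphore returns -EINVAL, until fake_sem_release() is called.
Both paths end in the same -22, so only the surrounding log can
distinguish them.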

-Jack