>> +/*
>> + * To reduce the system bootup time, the HBM training has
>> + * been moved out of the UEFI on the Grace-Blackwell systems.
>> + *
>> + * The onus of checking whether the HBM training has completed
>> + * thus falls on the module. The HBM training status can be
>> + * determined from a BAR0 register.
>> + *
>> + * Similarly, another BAR0 register exposes the status of the
>> + * CPU-GPU chip-to-chip (C2C) cache coherent interconnect.
>> + *
>> + * Poll these register and check for 30s. If the HBM training is
>> + * not complete or if the C2C link is not ready, fail the probe.
>> + *
>> + * While the wait is not required on Grace Hopper systems, it
>> + * is beneficial to make the check to ensure the device is in an
>> + * expected state.
>> + */
>> +static int nvgrace_gpu_wait_device_ready(struct pci_dev *pdev)
>> +{
>> +	unsigned long timeout = jiffies + msecs_to_jiffies(POLL_TIMEOUT_MS);
>> +	void __iomem *io;
>> +	int ret = -ETIME;
>> +
>> +	io = pci_iomap(pdev, 0, 0);
>> +	if (!io)
>> +		return -ENOMEM;
>> +
>> +	do {
>> +		if ((ioread32(io + C2C_LINK_BAR0_OFFSET) == STATUS_READY) &&
>> +		    (ioread32(io + HBM_TRAINING_BAR0_OFFSET) == STATUS_READY)) {
>> +			ret = 0;
>> +			goto reg_check_exit;
>> +		}
>> +		msleep(POLL_QUANTUM_MS);
>> +	} while (!time_after(jiffies, timeout));
>> +
>> +reg_check_exit:
>> +	pci_iounmap(pdev, io);
>> +	return ret;
>
> We're accessing device memory here but afaict the memory enable bit of
> the command register is in an indeterminate state.  What happens if you
> use setpci to clear the memory enable bit or 'echo 0 > enable' before
> binding the driver?  Thanks,
>
> Alex

Hi Alex, sorry, I don't understand how we are accessing device memory
here, given that C2C_LINK_BAR0_OFFSET and HBM_TRAINING_BAR0_OFFSET are
BAR0 registers.

In any case, I tried 'echo 0 > <sysfs_path>/enable' before binding the
driver. I did not observe any issue, and the bind went through.

Am I missing something?
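
For reference, the concern raised above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code. BAR0 registers are reached through memory-mapped reads, and a PCI device decodes those reads only while the Memory Space Enable bit (bit 1) of its command register is set. The names `struct fake_dev`, `fake_read32()` and `fake_wait_device_ready()` are hypothetical, the value used for `STATUS_READY` is assumed because the quoted hunk does not show it, and returning all-ones for a read with decoding disabled is a common platform behavior rather than a guarantee (some platforms may raise a fault instead).

```c
#include <assert.h>
#include <stdint.h>

/* Memory Space Enable: bit 1 of the PCI command register. */
#define PCI_COMMAND_MEMORY 0x2u

/* Assumed value; the quoted hunk does not show the real definition. */
#define STATUS_READY 0xFFu

/* Hypothetical device model: a command register and two status registers. */
struct fake_dev {
	uint16_t command;
	uint32_t c2c_status;
	uint32_t hbm_status;
};

/*
 * Model of an MMIO read. With memory decoding disabled the device does
 * not respond; this model assumes the read then returns all-ones.
 */
static uint32_t fake_read32(const struct fake_dev *d, const uint32_t *reg)
{
	if (!(d->command & PCI_COMMAND_MEMORY))
		return 0xFFFFFFFFu;
	return *reg;
}

/*
 * Poll both status registers up to max_polls times.
 * Returns 0 once both read STATUS_READY, -1 if the polls run out.
 */
static int fake_wait_device_ready(const struct fake_dev *d, int max_polls)
{
	assert(max_polls > 0);

	for (int i = 0; i < max_polls; i++) {
		if (fake_read32(d, &d->c2c_status) == STATUS_READY &&
		    fake_read32(d, &d->hbm_status) == STATUS_READY)
			return 0;
	}
	return -1;
}
```

In this model, a device whose status registers both hold `STATUS_READY` passes the check only while `PCI_COMMAND_MEMORY` is set in `command`. With the bit cleared, every read returns all-ones and the poll runs out. Whether the real hardware behaves this way with the bit cleared is exactly the open question in this thread.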