On Wed, Feb 12, 2020 at 02:29:42PM +0800, Chen Yu wrote: > On Tue, Feb 11, 2020 at 07:50:43AM -0600, Bjorn Helgaas wrote: > > On Tue, Feb 11, 2020 at 01:57:06PM +0800, Chen Yu wrote: > > > Hi, > > > We found two issues in the code during suspend: > > > > > > 1. Andy Shevchenko found that, the save restore of pci config space > > > might cause potential issue. Current code uses > > > pci_read_config_dword() to read pci config header. However > > > hardware is not obliged to react correctly when trying to read > > > two/three 'adjacent' pci config registers with one dword read. > > > > > > Q1: Should we save/restore the pci config space header according > > > to the PCI spec strictly(pci_read_config_dword() for 32bit, > > > while pci_read_config_word() for 16bits, etc)? > > > > I'm sure you know my first question will be for a spec reference > > for this requirement that we read registers with the correct size > > :) If there is such a requirement, then of course we should > > follow it. > > > There seems to be no explicit request in the spec that the config > space header should be read according to each register's size. And > after recheck the PCI Express Base Specification, Rev. 4.0 Version > 1.0 pg703, it mentions: "An implemented 64-bit Base Address register > consumes two consecutive DWORD locations." It looks like content > within a DWORD io space should be adjacent. So > pci_read_config_dword() should be applicable. But this is just my > understanding, since we have not encountered issues caused by dword > reading yet, we might let it be for now. > > > > 2. The pci config space of some problematic devices(or due to firmware > > > bug) might become inaccessible after resumed from S3(suspend to > > > mem) on VM. > > > > > > Q2: Should we do sanity check on pci config space before saving > > > them? Say, invoke pci_dev_is_present() before suspend, if the > > > pci config space is not sane, bypass the config space saving > > > process, because there's no need to save invalid pci config > > > space. > > > > I'm not in favor of a sanity check, at least not yet. This sounds > > like a problem that has not been debugged yet. If the device is > > broken in some way, maybe a quirk would be appropriate. Otherwise, > > maybe there's some Linux issue in the resume from S3 path that we > > should fix. > > Got it. OK. Obviously you're tripping over some problem, but you haven't given any details about what it is, so I don't think there's anything I can help with yet. When you have a potential workaround patch or more details, don't hesitate to post them! Bjorn