On 16.02.2016 18:18, Peter Jones wrote:
> On Tue, Feb 16, 2016 at 01:49:18PM +0000, Matt Fleming wrote:
>> [ Including Peter, the efifb maintainer. Original email is here,
>>
>>     http://marc.info/?l=linux-kernel&m=145552936131335&w=2
>>
>>   I've snipped some of the quoted text ]
>>
>> On Tue, 16 Feb, at 08:55:22AM, Ingo Molnar wrote:
>>>
>>> (I've Cc:-ed the EFI-FB and FB gents. Mail quoted below.)
>>>
>>> * Alexander Popov <alpopov@xxxxxxxxxxxxxx> wrote:
>>>
>>>> Currently the code in fb_is_primary_device() contains to_pci_dev() macro
>>>> which is applied to dev from struct fb_info. In some cases this causes
>>>> bad memory access when fb_is_primary_device() handles fb_info of efifb.
>>>> The reason is that fb dev of efifb is embedded into struct platform_device
>>>> but not into struct pci_dev.
>>>>
>>>> We can fix this by checking fb dev bus name in fb_is_primary_device().
>>>>
>>>> It seems that this bug reveals some bigger problem with to_pci_dev(),
>>>> to_platform_device() and others, which just do container_of() and
>>>> don't check whether struct device is a part of the appropriate structure.
>>>> Should we do something more about it?
>>>>
>>>> KASan report:
>>
>> [...]
>>
>>>>
>>>> Signed-off-by: Alexander Popov <alpopov@xxxxxxxxxxxxxx>
>>>> ---
>>>>  arch/x86/video/fbdev.c | 9 +++++----
>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c
>>>> index d5644bb..4999f78 100644
>>>> --- a/arch/x86/video/fbdev.c
>>>> +++ b/arch/x86/video/fbdev.c
>>>> @@ -18,11 +18,12 @@ int fb_is_primary_device(struct fb_info *info)
>>>>  	struct pci_dev *default_device = vga_default_device();
>>>>  	struct resource *res = NULL;
>>>>  
>>>> -	if (device)
>>>> -		pci_dev = to_pci_dev(device);
>>>> -
>>>> -	if (!pci_dev)
>>>> +	if (!device || !device->bus ||
>>>> +	    !device->bus->name || strcmp(device->bus->name, "pci")) {
>>>>  		return 0;
>>>> +	}
>>>> +
>>>> +	pci_dev = to_pci_dev(device);
>>>>  
>>>>  	if (default_device) {
>>>>  		if (pci_dev == default_device)
>>>> -- 
>>>> 1.9.1
>>>>
>>
>> I wonder if this issue could explain some of the efifb issues we've
>> seen reported on bugzilla.kernel.org in the past where switching from
>> efifb to some other framebuffer device caused hangs during boot. I'm
>> struggling to find the relevant bugzilla entries now, though.
>
> It's possible it could, but I don't have them handy either. I've also
> wondered if some of them were due to bad data from the firmware - at
> plugfests we've seen some cases where the actual video mode as measured
> with a ruler is clearly not what the firmware claims it to be, so it's
> entirely possible we're occasionally told a memory region that is not
> what's actually mapped, or that's mapped but is only partially backed
> by the actual frame buffer memory.
>
> But aside from that diversion, I think Alexander has a legitimate
> question about use of to_pci_dev(). If I ask the question: can we fix
> this in efifb by making it live on a pci_dev, I have a couple of
> fundamental problems:
>
> 1) technically it doesn't have to be a pci_dev at all (but, practically,
>    so far it always is on PCI...)
> 2) From EFI, we can't necessarily pin it down to a single PCI device
>    even if it is PCI. Before we do EFI's ExitBootServices() call, we
>    can try to find the PCI_IO handle our GOP instance is connected to,
>    but not all firmware GOP drivers use that, so it doesn't always work.
>    And even if it did, there can be more than one instance pointing to
>    the same memory with different PCI devices - lots of laptops have
>    this sort of thing.
> 3) Ignoring the EFI side and just focusing on PCI, if there's two
>    devices configured that could do scanout, it can be mapped to one
>    device's BAR but the other device be the actual device using it. In
>    this case either choice is probably wrong for something, and the
>    things that have the information to resolve which one don't include
>    efifb - they're the drivers we'll likely hand off to later.
>
> So it's most likely right for efifb to be embedded in a platform_device
> instead of a pci_dev. Which leads back to Alexander's question - if it
> isn't in a pci_dev, that means fb_is_primary_device() needs to not
> assume it is. So the patch appears correct, but so is the question -
> should to_pci_dev() be checking this and returning NULL here?

The discussion has stalled. May I revive it?

There are two ways to fix the bad memory access in fb_is_primary_device().
The first is the one proposed in my patch. Checking the bus name string
doesn't look good, but I haven't managed to come up with anything better.

The second is to change to_pci_dev() to do a similar check: it could
return NULL, or call BUG(), when the struct device is embedded in a
structure of a different type.

Which way is better? Do we need to do anything about the other similar
macros, such as to_platform_device()?

Thanks.

Best regards,
Alexander