On Fri, Nov 08, 2019 at 12:14:05PM -0800, Lucas De Marchi wrote: > On Fri, Nov 08, 2019 at 09:19:15PM +0200, Ville Syrjälä wrote: > >On Fri, Nov 08, 2019 at 10:18:52AM -0800, Lucas De Marchi wrote: > >> On Fri, Nov 08, 2019 at 01:14:03PM +0200, Jani Nikula wrote: > >> >On Thu, 07 Nov 2019, Lucas De Marchi <lucas.demarchi@xxxxxxxxx> wrote: > >> >> When we are mapping the VBT through pci_map_rom() we may not be allowed > >> >> to simply discard the address space and go on reading the memory. After > >> >> checking on my test system that dumping the rom via sysfs I could > >> >> actually get the correct vbt, I decided to change the implementation to > >> >> use the same approach, by calling memcpy_fromio(). > >> >> > >> >> In order to avoid copying the entire oprom this implements a simple > >> >> memmem() searching for "$VBT". Contrary to the previous implementation > >> >> this also takes care of not issuing unaligned PCI reads that would > >> >> otherwise get translated into more even more reads. I also vaguely > >> >> remember unaligned reads failing in the past with some devices. > >> >> > >> >> Also make sure we copy only the VBT and not the entire oprom that is > >> >> usually much larger. > >> > > >> >So you have > >> > > >> >1. a fix to unaligned reads > >> > >> unaligned io reads, yes > >> > >> > > >> >2. an optimization to avoid reading individual bytes four times > >> > >> it was by no means an optimization. Not reading the same byte 4 bytes is > >> there actually to stop doing the unaligned IO reads. You can't have (2) > >> without (1) unless you switch to ioreadb() and add a shift (which may > >> not be a bad idea. > >> > >> > > >> >3. respecting __iomem and copying (I guess these are tied together) > >> > > >> >Seems to me that really should be at least three patches. Not > >> >necessarily in the above order. > >> > >> (3) is actually the most important I think, so I will start by that. > >> > >> > > >> >Follow-up: store pointer to the oprom vbt somewhere under i915->vbt, and > >> >have debugfs i915_vbt() handle that properly. > >> > >> I don't think this is needed. The thing I'm doing here is the same as > >> what can be accomplished by reading the rom from sysfs: > >> > >> find /sys/bus/pci/devices/*/ -name rom > >> ... choose one > >> > >> echo 1 > rom # to allow reading the rom > >> hexdump -C rom > >> > >> > >> > > >> >> > >> >> Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> > >> >> --- > >> >> drivers/gpu/drm/i915/display/intel_bios.c | 95 +++++++++++++++++++---- > >> >> 1 file changed, 79 insertions(+), 16 deletions(-) > >> >> > >> >> diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c > >> >> index 671bbce6ba5b..c401e90b7cf1 100644 > >> >> --- a/drivers/gpu/drm/i915/display/intel_bios.c > >> >> +++ b/drivers/gpu/drm/i915/display/intel_bios.c > >> >> @@ -1806,31 +1806,88 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size) > >> >> return vbt; > >> >> } > >> >> > >> >> -static const struct vbt_header *find_vbt(void __iomem *oprom, size_t size) > >> >> +void __iomem *find_vbt(void __iomem *oprom, size_t size) > >> >> { > >> >> - size_t i; > >> >> + const u32 MAGIC = *((const u32 *)"$VBT"); > >> >> + size_t done = 0, cur = 0; > >> >> + void __iomem *p; > >> >> + u8 buf[128]; > >> >> + u32 val; > >> >> > >> >> - /* Scour memory looking for the VBT signature. */ > >> >> - for (i = 0; i + 4 < size; i++) { > >> >> - void *vbt; > >> >> + /* > >> >> + * poor's man memmem() with sizeof(buf) window to avoid frequent > >> >> + * wrap-arounds and using u32 for comparison. This gives us 4 > >> >> + * comparisons per ioread32() and avoids unaligned io reads (although it > >> >> + * still does unaligned cpu access). > >> >> + */ > >> > > >> >If we're really worried about performance here, and use a local buffer > >> >to optimize the wraparounds, would it actually be more efficient to use > >> >memcpy_fromio() which has an arch specific implementation in asm? > >> > >> Not really worried about performance. I actually did 3 implementations > >> that avoids the unaligned io reads. > >> > >> 1) this one > >> 2) memcpy_fromio() to the local buffer + strnstr() > >> 3) allocate a oprom buffer, memcpy_fromio() the entire rom and keep a > >> pointer to it. Then free the oprom after the vbt is used > >> > >> (2) and (1) had basically the same complexity involved of requiring a > >> wrap around local buffer, so I went with (1) > >> > >> I didn't feel confortable with (3) because it would allocate much more > >> memory than really needed. > >> > >> > > >> >In any case makes you think you should first have the patch that the > >> >patch subject claims, fix unaligned reads and add optimizations > >> >next. This one does too much. > >> > >> Again, it was not really meant to be an optimization. > >> > >> Ville told me that we may not really need to deal with the unaligned > >> access and change the implementation to expect the VBT to be aligned. > >> This I would be the simplest way to change it, but I'm not fond on > >> changing this and breaking old systems usin it... anyway, we can give it > >> a try and revert if it breaks. > > > >The current code already assumes 4 byte alignment. So nothing would > >change and so nothing can get broken. > > the code for reading the oprom via pci is not assuming 4-byte alignment. > See above, it's doing > > for (i = 0; i + 4 < size; i++) > > It might as well be using a ioread8() + a shift and it would be the > same, since it only advances 1 byte per loop. Hmm, indeed you are correct. I guess the +4 tricked me into thinking otherwise. > > Saying that from acpi it needs to be aligned so the oprom should be > aligned IMO is not valid because the pci method was there before the > acpi one. I don't know exactly what I may be breaking if I switch to > 4-byte alignment. > > Anyway, my new patch splits the changes, as requested by Jani. > Just enforcing we don't ignore the address space already fixes my > issue. So maybe we can leave the alignment alone and not touch it. I suppose we can stick to the i++ if the code can be kept simple enough. But I'm totally willing to put my name on a i+=4 patch if the complexity starts to rise alarmingly. -- Ville Syrjälä Intel _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx