On Fri, Nov 08, 2019 at 10:18:52AM -0800, Lucas De Marchi wrote: > On Fri, Nov 08, 2019 at 01:14:03PM +0200, Jani Nikula wrote: > >On Thu, 07 Nov 2019, Lucas De Marchi <lucas.demarchi@xxxxxxxxx> wrote: > >> When we are mapping the VBT through pci_map_rom() we may not be allowed > >> to simply discard the address space and go on reading the memory. After > >> checking on my test system that dumping the rom via sysfs I could > >> actually get the correct vbt, I decided to change the implementation to > >> use the same approach, by calling memcpy_fromio(). > >> > >> In order to avoid copying the entire oprom this implements a simple > >> memmem() searching for "$VBT". Contrary to the previous implementation > >> this also takes care of not issuing unaligned PCI reads that would > >> otherwise get translated into more even more reads. I also vaguely > >> remember unaligned reads failing in the past with some devices. > >> > >> Also make sure we copy only the VBT and not the entire oprom that is > >> usually much larger. > > > >So you have > > > >1. a fix to unaligned reads > > unaligned io reads, yes > > > > >2. an optimization to avoid reading individual bytes four times > > it was by no means an optimization. Not reading the same byte 4 bytes is > there actually to stop doing the unaligned IO reads. You can't have (2) > without (1) unless you switch to ioreadb() and add a shift (which may > not be a bad idea. > > > > >3. respecting __iomem and copying (I guess these are tied together) > > > >Seems to me that really should be at least three patches. Not > >necessarily in the above order. > > (3) is actually the most important I think, so I will start by that. > > > > >Follow-up: store pointer to the oprom vbt somewhere under i915->vbt, and > >have debugfs i915_vbt() handle that properly. > > I don't think this is needed. The thing I'm doing here is the same as > what can be accomplished by reading the rom from sysfs: > > find /sys/bus/pci/devices/*/ -name rom > ... choose one > > echo 1 > rom # to allow reading the rom > hexdump -C rom > > > > > >> > >> Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> > >> --- > >> drivers/gpu/drm/i915/display/intel_bios.c | 95 +++++++++++++++++++---- > >> 1 file changed, 79 insertions(+), 16 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c > >> index 671bbce6ba5b..c401e90b7cf1 100644 > >> --- a/drivers/gpu/drm/i915/display/intel_bios.c > >> +++ b/drivers/gpu/drm/i915/display/intel_bios.c > >> @@ -1806,31 +1806,88 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size) > >> return vbt; > >> } > >> > >> -static const struct vbt_header *find_vbt(void __iomem *oprom, size_t size) > >> +void __iomem *find_vbt(void __iomem *oprom, size_t size) > >> { > >> - size_t i; > >> + const u32 MAGIC = *((const u32 *)"$VBT"); > >> + size_t done = 0, cur = 0; > >> + void __iomem *p; > >> + u8 buf[128]; > >> + u32 val; > >> > >> - /* Scour memory looking for the VBT signature. */ > >> - for (i = 0; i + 4 < size; i++) { > >> - void *vbt; > >> + /* > >> + * poor's man memmem() with sizeof(buf) window to avoid frequent > >> + * wrap-arounds and using u32 for comparison. This gives us 4 > >> + * comparisons per ioread32() and avoids unaligned io reads (although it > >> + * still does unaligned cpu access). > >> + */ > > > >If we're really worried about performance here, and use a local buffer > >to optimize the wraparounds, would it actually be more efficient to use > >memcpy_fromio() which has an arch specific implementation in asm? > > Not really worried about performance. I actually did 3 implementations > that avoids the unaligned io reads. > > 1) this one > 2) memcpy_fromio() to the local buffer + strnstr() > 3) allocate a oprom buffer, memcpy_fromio() the entire rom and keep a > pointer to it. Then free the oprom after the vbt is used > > (2) and (1) had basically the same complexity involved of requiring a > wrap around local buffer, so I went with (1) > > I didn't feel confortable with (3) because it would allocate much more > memory than really needed. > > > > >In any case makes you think you should first have the patch that the > >patch subject claims, fix unaligned reads and add optimizations > >next. This one does too much. > > Again, it was not really meant to be an optimization. > > Ville told me that we may not really need to deal with the unaligned > access and change the implementation to expect the VBT to be aligned. > This I would be the simplest way to change it, but I'm not fond on > changing this and breaking old systems usin it... anyway, we can give it > a try and revert if it breaks. The current code already assumes 4 byte alignment. So nothing would change and so nothing can get broken. -- Ville Syrjälä Intel _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx