Re: [PATCH 4/4] drm/i915/bios: do not discard address space

Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> · Fri, 8 Nov 2019 21:19:15 +0200

On Fri, Nov 08, 2019 at 10:18:52AM -0800, Lucas De Marchi wrote:
> On Fri, Nov 08, 2019 at 01:14:03PM +0200, Jani Nikula wrote:
> >On Thu, 07 Nov 2019, Lucas De Marchi <lucas.demarchi@xxxxxxxxx> wrote:
> >> When we are mapping the VBT through pci_map_rom() we may not be allowed
> >> to simply discard the address space and go on reading the memory. After
> >> checking on my test system that dumping the rom via sysfs I could
> >> actually get the correct vbt, I decided to change the implementation to
> >> use the same approach, by calling memcpy_fromio().
> >>
> >> In order to avoid copying the entire oprom this implements a simple
> >> memmem() searching for "$VBT". Contrary to the previous implementation
> >> this also takes care of not issuing unaligned PCI reads that would
> >> otherwise get translated into more even more reads. I also vaguely
> >> remember unaligned reads failing in the past with some devices.
> >>
> >> Also make sure we copy only the VBT and not the entire oprom that is
> >> usually much larger.
> >
> >So you have
> >
> >1. a fix to unaligned reads
> 
> unaligned io reads, yes
> 
> >
> >2. an optimization to avoid reading individual bytes four times
> 
> it was by no means an optimization. Not reading the same byte 4 bytes is
> there actually to stop doing the unaligned IO reads. You can't have (2)
> without (1) unless you switch to ioreadb() and add a shift (which may
> not be a bad idea.
> 
> >
> >3. respecting __iomem and copying (I guess these are tied together)
> >
> >Seems to me that really should be at least three patches. Not
> >necessarily in the above order.
> 
> (3) is actually the most important I think, so I will start by that.
> 
> >
> >Follow-up: store pointer to the oprom vbt somewhere under i915->vbt, and
> >have debugfs i915_vbt() handle that properly.
> 
> I don't think this is needed. The thing I'm doing here is the same as
> what can be accomplished by reading the rom from sysfs:
> 
> find /sys/bus/pci/devices/*/ -name rom
> ... choose one
> 
> echo 1 > rom # to allow reading the rom
> hexdump -C rom
> 
> 
> >
> >>
> >> Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
> >> ---
> >>  drivers/gpu/drm/i915/display/intel_bios.c | 95 +++++++++++++++++++----
> >>  1 file changed, 79 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c
> >> index 671bbce6ba5b..c401e90b7cf1 100644
> >> --- a/drivers/gpu/drm/i915/display/intel_bios.c
> >> +++ b/drivers/gpu/drm/i915/display/intel_bios.c
> >> @@ -1806,31 +1806,88 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size)
> >>  	return vbt;
> >>  }
> >>
> >> -static const struct vbt_header *find_vbt(void __iomem *oprom, size_t size)
> >> +void __iomem *find_vbt(void __iomem *oprom, size_t size)
> >>  {
> >> -	size_t i;
> >> +	const u32 MAGIC = *((const u32 *)"$VBT");
> >> +	size_t done = 0, cur = 0;
> >> +	void __iomem *p;
> >> +	u8 buf[128];
> >> +	u32 val;
> >>
> >> -	/* Scour memory looking for the VBT signature. */
> >> -	for (i = 0; i + 4 < size; i++) {
> >> -		void *vbt;
> >> +	/*
> >> +	 * poor's man memmem() with sizeof(buf) window to avoid frequent
> >> +	 * wrap-arounds and using u32 for comparison. This gives us 4
> >> +	 * comparisons per ioread32() and avoids unaligned io reads (although it
> >> +	 * still does unaligned cpu access).
> >> +	 */
> >
> >If we're really worried about performance here, and use a local buffer
> >to optimize the wraparounds, would it actually be more efficient to use
> >memcpy_fromio() which has an arch specific implementation in asm?
> 
> Not really worried about performance. I actually did 3 implementations
> that avoids the unaligned io reads.
> 
> 1) this one
> 2) memcpy_fromio() to the local buffer + strnstr()
> 3) allocate a oprom buffer, memcpy_fromio() the entire rom and keep a
> pointer to it. Then free the oprom after the vbt is used
> 
> (2) and (1) had basically the same complexity involved of requiring a
> wrap around local buffer, so I went with (1)
> 
> I didn't feel confortable with (3) because it would allocate much more
> memory than really needed.
> 
> >
> >In any case makes you think you should first have the patch that the
> >patch subject claims, fix unaligned reads and add optimizations
> >next. This one does too much.
> 
> Again, it was not really meant to be an optimization.
> 
> Ville told me that we may not really need to deal with the unaligned
> access and change the implementation to expect the VBT to be aligned.
> This I would be the simplest way to change it, but I'm not fond on
> changing this and breaking old systems usin it... anyway, we can give it
> a try and revert if it breaks.

The current code already assumes 4 byte alignment. So nothing would
change and so nothing can get broken.

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx