Re: [PATCH 4/4] drm/i915/bios: do not discard address space

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 08, 2019 at 11:02:46PM +0200, Ville Syrjälä wrote:
On Fri, Nov 08, 2019 at 12:14:05PM -0800, Lucas De Marchi wrote:
On Fri, Nov 08, 2019 at 09:19:15PM +0200, Ville Syrjälä wrote:
>On Fri, Nov 08, 2019 at 10:18:52AM -0800, Lucas De Marchi wrote:
>> On Fri, Nov 08, 2019 at 01:14:03PM +0200, Jani Nikula wrote:
>> >On Thu, 07 Nov 2019, Lucas De Marchi <lucas.demarchi@xxxxxxxxx> wrote:
>> >> When we are mapping the VBT through pci_map_rom() we may not be allowed
>> >> to simply discard the address space and go on reading the memory. After
>> >> checking on my test system that dumping the rom via sysfs I could
>> >> actually get the correct vbt, I decided to change the implementation to
>> >> use the same approach, by calling memcpy_fromio().
>> >>
>> >> In order to avoid copying the entire oprom this implements a simple
>> >> memmem() searching for "$VBT". Contrary to the previous implementation
>> >> this also takes care of not issuing unaligned PCI reads that would
>> >> otherwise get translated into more even more reads. I also vaguely
>> >> remember unaligned reads failing in the past with some devices.
>> >>
>> >> Also make sure we copy only the VBT and not the entire oprom that is
>> >> usually much larger.
>> >
>> >So you have
>> >
>> >1. a fix to unaligned reads
>>
>> unaligned io reads, yes
>>
>> >
>> >2. an optimization to avoid reading individual bytes four times
>>
>> it was by no means an optimization. Not reading the same byte 4 bytes is
>> there actually to stop doing the unaligned IO reads. You can't have (2)
>> without (1) unless you switch to ioreadb() and add a shift (which may
>> not be a bad idea.
>>
>> >
>> >3. respecting __iomem and copying (I guess these are tied together)
>> >
>> >Seems to me that really should be at least three patches. Not
>> >necessarily in the above order.
>>
>> (3) is actually the most important I think, so I will start by that.
>>
>> >
>> >Follow-up: store pointer to the oprom vbt somewhere under i915->vbt, and
>> >have debugfs i915_vbt() handle that properly.
>>
>> I don't think this is needed. The thing I'm doing here is the same as
>> what can be accomplished by reading the rom from sysfs:
>>
>> find /sys/bus/pci/devices/*/ -name rom
>> ... choose one
>>
>> echo 1 > rom # to allow reading the rom
>> hexdump -C rom
>>
>>
>> >
>> >>
>> >> Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>> >> ---
>> >>  drivers/gpu/drm/i915/display/intel_bios.c | 95 +++++++++++++++++++----
>> >>  1 file changed, 79 insertions(+), 16 deletions(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c
>> >> index 671bbce6ba5b..c401e90b7cf1 100644
>> >> --- a/drivers/gpu/drm/i915/display/intel_bios.c
>> >> +++ b/drivers/gpu/drm/i915/display/intel_bios.c
>> >> @@ -1806,31 +1806,88 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size)
>> >>  	return vbt;
>> >>  }
>> >>
>> >> -static const struct vbt_header *find_vbt(void __iomem *oprom, size_t size)
>> >> +void __iomem *find_vbt(void __iomem *oprom, size_t size)
>> >>  {
>> >> -	size_t i;
>> >> +	const u32 MAGIC = *((const u32 *)"$VBT");
>> >> +	size_t done = 0, cur = 0;
>> >> +	void __iomem *p;
>> >> +	u8 buf[128];
>> >> +	u32 val;
>> >>
>> >> -	/* Scour memory looking for the VBT signature. */
>> >> -	for (i = 0; i + 4 < size; i++) {
>> >> -		void *vbt;
>> >> +	/*
>> >> +	 * poor's man memmem() with sizeof(buf) window to avoid frequent
>> >> +	 * wrap-arounds and using u32 for comparison. This gives us 4
>> >> +	 * comparisons per ioread32() and avoids unaligned io reads (although it
>> >> +	 * still does unaligned cpu access).
>> >> +	 */
>> >
>> >If we're really worried about performance here, and use a local buffer
>> >to optimize the wraparounds, would it actually be more efficient to use
>> >memcpy_fromio() which has an arch specific implementation in asm?
>>
>> Not really worried about performance. I actually did 3 implementations
>> that avoids the unaligned io reads.
>>
>> 1) this one
>> 2) memcpy_fromio() to the local buffer + strnstr()
>> 3) allocate a oprom buffer, memcpy_fromio() the entire rom and keep a
>> pointer to it. Then free the oprom after the vbt is used
>>
>> (2) and (1) had basically the same complexity involved of requiring a
>> wrap around local buffer, so I went with (1)
>>
>> I didn't feel confortable with (3) because it would allocate much more
>> memory than really needed.
>>
>> >
>> >In any case makes you think you should first have the patch that the
>> >patch subject claims, fix unaligned reads and add optimizations
>> >next. This one does too much.
>>
>> Again, it was not really meant to be an optimization.
>>
>> Ville told me that we may not really need to deal with the unaligned
>> access and change the implementation to expect the VBT to be aligned.
>> This I would be the simplest way to change it, but I'm not fond on
>> changing this and breaking old systems usin it... anyway, we can give it
>> a try and revert if it breaks.
>
>The current code already assumes 4 byte alignment. So nothing would
>change and so nothing can get broken.

the code for reading the oprom via pci is not assuming 4-byte alignment.
See above, it's doing

	for (i = 0; i + 4 < size; i++)

It might as well be using a ioread8() + a shift and it would be the
same, since it only advances 1 byte per loop.

Hmm, indeed you are correct. I guess the +4 tricked me into thinking
otherwise.

it tricked me too, it took some time to realize that it was advancing
just one byte... maybe that was when I decided: ok, let me reimplement
this rather than focusing on the minimum viable fix.



Saying that from acpi it needs to be aligned so the oprom should be
aligned IMO is not valid because the pci method was there before the
acpi one.  I don't know exactly what I may be breaking if I switch to
4-byte alignment.

Anyway, my new patch splits the changes, as requested by Jani.
Just enforcing we don't ignore the address space already fixes my
issue. So maybe we can leave the alignment alone and not touch it.

I suppose we can stick to the i++ if the code can be kept simple
enough. But I'm totally willing to put my name on a i+=4 patch
if the complexity starts to rise alarmingly.

yep, we can try that on top.

thanks
Lucas De Marchi


--
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux