Hi Steve, thinks for your reply. I set the pte in arm_lpae_prot_to_pte(), *********************************************************************** /* * Also Mali has its own notions of shareability wherein its Inner * domain covers the cores within the GPU, and its Outer domain is * "outside the GPU" (i.e. either the Inner or System domain in CPU * terms, depending on coherency). */ if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE) pte |= ARM_LPAE_PTE_SH_IS; else pte |= ARM_LPAE_PTE_SH_OS; *********************************************************************** I set pte |= ARM_LPAE_PTE_SH_NS. If I set pte to ARM_LPAE_PTE_SH_OS or ARM_LPAE_PTE_SH_IS,whether I use singel core GPU or multi core GPU,it will occur GPU Fault. if I set pte to ARM_LPAE_PTE_SH_NS,whether I use singel core GPU or multi core GPU,it will not occur GPU Fault. Thinks Chunyou 于 Mon, 28 Jun 2021 11:48:59 +0100 Steven Price <steven.price@xxxxxxx> 写道: > On 25/06/2021 10:49, Chunyou Tang wrote: > > Hi Steve, > > Thinks for your reply. > > When I only set the pte |= ARM_LPAE_PTE_SH_NS;there have no > > "GPU Fault",When I set the pte |= ARM_LPAE_PTE_SH_IS(or > > ARM_LPAE_PTE_SH_OS);there have "GPU Fault".I don't know how the pte > > effect this issue? > > Can you give me some suggestions again? > > > > Thinks. > > > > Chunyou > > Hi Chunyou, > > You haven't given me much context so I'm not entirely sure which PTE > you are talking about (GPU or CPU), or indeed where you are changing > the PTE values. > > The PTEs control whether a page is shareable or not, the GPU requires > that accesses are consistent (i.e. either all accesses to a page are > shareable or all are non-shareable) and will race a fault if it > detects this isn't the case. Mali also has a quirk for its version of > 'LPAE' where inner shareable actually means only within the GPU and > outer shareable means outside the GPU (which I think usually means > Inner Shareable on the external bus). > > Steve > > > 于 Thu, 24 Jun 2021 14:22:04 +0100 > > Steven Price <steven.price@xxxxxxx> 写道: > > > >> On 22/06/2021 02:40, Chunyou Tang wrote: > >>> Hi Steve, > >>> I will send a new patch with suitable subject/commit > >>> message. But I send a V3 or a new patch? > >> > >> Send a V3 - it is a new version of this patch. > >> > >>> I met a bug about the GPU,I have no idea about how to fix > >>> it, If you can give me some suggestion,it is perfect. > >>> > >>> You can see such kernel log: > >>> > >>> Jun 20 10:20:13 icube kernel: [ 774.566760] mvp_gpu 0000:05:00.0: > >>> GPU Fault 0x00000088 (SHAREABILITY_FAULT) at 0x000000000310fd00 > >>> Jun 20 10:20:13 icube kernel: [ 774.566764] mvp_gpu 0000:05:00.0: > >>> There were multiple GPU faults - some have not been reported Jun > >>> 20 10:20:13 icube kernel: [ 774.667542] mvp_gpu 0000:05:00.0: > >>> AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [ 774.767900] > >>> mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube > >>> kernel: [ 774.868546] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck > >>> Jun 20 10:20:13 icube kernel: [ 774.968910] mvp_gpu 0000:05:00.0: > >>> AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [ 775.069251] > >>> mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:22 icube > >>> kernel: [ 783.693971] mvp_gpu 0000:05:00.0: gpu sched timeout, > >>> js=1, config=0x7300, status=0x8, head=0x362c900, tail=0x362c100, > >>> sched_job=000000003252fb84 > >>> > >>> In > >>> https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.clem@xxxxxxxxx/ > >>> there had a same bug like mine,and I found you at the mail list,I > >>> don't know how it fixed? > >> > >> The GPU_SHAREABILITY_FAULT error means that a cache line has been > >> accessed both as shareable and non-shareable and therefore > >> coherency cannot be guaranteed. Although the "multiple GPU faults" > >> means that this may not be the underlying cause. > >> > >> The fact that your dmesg log has PCI style identifiers > >> ("0000:05:00.0") suggests this is an unusual platform - I've not > >> previously been aware of a Mali device behind PCI. Is this device > >> working with the kbase/DDK proprietary driver? It would be worth > >> looking at the kbase kernel code for the platform to see if there > >> is anything special done for the platform. > >> > >> From the dmesg logs all I can really tell is that the GPU seems > >> unhappy about the memory system. > >> > >> Steve > >> > >>> I need your help! > >>> > >>> thinks very much! > >>> > >>> Chunyou > >>> > >>> 于 Mon, 21 Jun 2021 11:45:20 +0100 > >>> Steven Price <steven.price@xxxxxxx> 写道: > >>> > >>>> On 19/06/2021 04:18, Chunyou Tang wrote: > >>>>> Hi Steve, > >>>>> 1,Now I know how to write the subject > >>>>> 2,the low 8 bits is the exception type in spec. > >>>>> > >>>>> and you can see prnfrost_exception_name() > >>>>> > >>>>> switch (exception_code) { > >>>>> /* Non-Fault Status code */ > >>>>> case 0x00: return "NOT_STARTED/IDLE/OK"; > >>>>> case 0x01: return "DONE"; > >>>>> case 0x02: return "INTERRUPTED"; > >>>>> case 0x03: return "STOPPED"; > >>>>> case 0x04: return "TERMINATED"; > >>>>> case 0x08: return "ACTIVE"; > >>>>> ........ > >>>>> ........ > >>>>> case 0xD8: return "ACCESS_FLAG"; > >>>>> case 0xD9 ... 0xDF: return "ACCESS_FLAG"; > >>>>> case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT"; > >>>>> case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT"; > >>>>> } > >>>>> return "UNKNOWN"; > >>>>> } > >>>>> > >>>>> the exception_code in case is only 8 bits,so if fault_status > >>>>> in panfrost_gpu_irq_handler() don't & 0xFF,it can't get correct > >>>>> exception reason,it will be always UNKNOWN. > >>>> > >>>> Yes, I'm happy with the change - I just need a patch that I can > >>>> apply. At the moment this patch only changes the first '0x%08x' > >>>> output rather than the call to panfrost_exception_name() as well. > >>>> So we just need a patch which does: > >>>> > >>>> - fault_status & 0xFF, panfrost_exception_name(pfdev, > >>>> fault_status), > >>>> + fault_status, panfrost_exception_name(pfdev, fault_status & > >>>> 0xFF), > >>>> > >>>> along with a suitable subject/commit message describing the > >>>> change. If you can send me that I can apply it. > >>>> > >>>> Thanks, > >>>> > >>>> Steve > >>>> > >>>> PS. Sorry for going round in circles here - I'm trying to help > >>>> you get setup so you'll be able to contribute patches easily in > >>>> future. An important part of that is ensuring you can send a > >>>> properly formatted patch to the list. > >>>> > >>>> PPS. I'm still not receiving your emails directly. I don't think > >>>> it's a problem at my end because I'm receiving other emails, but > >>>> if you can somehow fix the problem you're likely to receive a > >>>> faster response. > >>>> > >>>>> 于 Fri, 18 Jun 2021 13:43:24 +0100 > >>>>> Steven Price <steven.price@xxxxxxx> 写道: > >>>>> > >>>>>> On 17/06/2021 07:20, ChunyouTang wrote: > >>>>>>> From: ChunyouTang <tangchunyou@xxxxxxxxxxxx> > >>>>>>> > >>>>>>> of the low 8 bits. > >>>>>> > >>>>>> Please don't split the subject like this. The first line of the > >>>>>> commit should be a (very short) summary of the patch. Then a > >>>>>> blank line and then a longer description of what the purpose of > >>>>>> the patch is and why it's needed. > >>>>>> > >>>>>> Also you previously had this as part of a series (the first > >>>>>> part adding the "& 0xFF" in the panfrost_exception_name() > >>>>>> call). I'm not sure we need two patches for the single line, > >>>>>> but as it stands this patch doesn't apply. > >>>>>> > >>>>>> Also I'm still not receiving any emails from you directly (only > >>>>>> via the list), so it's possible I might have missed something > >>>>>> you sent. > >>>>>> > >>>>>> Steve > >>>>>> > >>>>>>> > >>>>>>> Signed-off-by: ChunyouTang <tangchunyou@xxxxxxxxxxxx> > >>>>>>> --- > >>>>>>> drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +- > >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>>>> > >>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c > >>>>>>> b/drivers/gpu/drm/panfrost/panfrost_gpu.c index > >>>>>>> 1fffb6a0b24f..d2d287bbf4e7 100644 --- > >>>>>>> a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++ > >>>>>>> b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -33,7 +33,7 @@ > >>>>>>> static irqreturn_t panfrost_gpu_irq_handler(int irq, void > >>>>>>> *data) address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO); > >>>>>>> dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) > >>>>>>> at 0x%016llx\n", > >>>>>>> - fault_status & 0xFF, > >>>>>>> panfrost_exception_name(pfdev, fault_status & 0xFF), > >>>>>>> + fault_status, > >>>>>>> panfrost_exception_name(pfdev, fault_status & 0xFF), address); > >>>>>>> > >>>>>>> if (state & GPU_IRQ_MULTIPLE_FAULT) > >>>>>>> > >>>>> > >>>>> > >>> > >>> > > > >