On Fri, 2016-06-17 at 17:57 +0200, Matthias Brugger wrote: > > On 17/06/16 10:28, Horng-Shyang Liao wrote: > > Hi Matthias, > > > > On Tue, 2016-06-14 at 20:07 +0800, Horng-Shyang Liao wrote: > >> Hi Matthias, > >> > >> On Tue, 2016-06-14 at 12:17 +0200, Matthias Brugger wrote: > >>> > >>> On 14/06/16 09:44, Horng-Shyang Liao wrote: > >>>> Hi Matthias, > >>>> > >>>> On Wed, 2016-06-08 at 17:35 +0200, Matthias Brugger wrote: > >>>>> > >>>>> On 08/06/16 14:25, Horng-Shyang Liao wrote: > >>>>>> Hi Matthias, > >>>>>> > >>>>>> On Wed, 2016-06-08 at 12:45 +0200, Matthias Brugger wrote: > >>>>>>> > >>>>>>> On 08/06/16 07:40, Horng-Shyang Liao wrote: > >>>>>>>> Hi Matthias, > >>>>>>>> > >>>>>>>> On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote: > >>>>>>>>> > >>>>>>>>> On 03/06/16 15:11, Matthias Brugger wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> [...] > >>>>>>>>> > >>>>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>>>> + smp_mb(); /* modify jump before enable thread */ > >>>>>>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>>>> + cmdq_thread_writel(thread, task->pa_base + > >>>>>>>>>>>>>>>>>>> task->command_size, > >>>>>>>>>>>>>>>>>>> + CMDQ_THR_END_ADDR); > >>>>>>>>>>>>>>>>>>> + cmdq_thread_resume(thread); > >>>>>>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>>>>>> + list_move_tail(&task->list_entry, &thread->task_busy_list); > >>>>>>>>>>>>>>>>>>> + spin_unlock_irqrestore(&cmdq->exec_lock, flags); > >>>>>>>>>>>>>>>>>>> +} > >>>>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq, > >>>>>>>>>>>>>>>>>>> + struct cmdq_thread *thread, u32 irq_flag) > >>>>>>>>>>>>>>>>>>> +{ > >>>>>>>>>>>>>>>>>>> + struct cmdq_task *task, *tmp, *curr_task = NULL; > >>>>>>>>>>>>>>>>>>> + u32 curr_pa; > >>>>>>>>>>>>>>>>>>> + struct cmdq_cb_data cmdq_cb_data; > >>>>>>>>>>>>>>>>>>> + bool err; > >>>>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>>>> + if (irq_flag & CMDQ_THR_IRQ_ERROR) > >>>>>>>>>>>>>>>>>>> + err = true; > >>>>>>>>>>>>>>>>>>> + else if (irq_flag & CMDQ_THR_IRQ_DONE) > >>>>>>>>>>>>>>>>>>> + err = false; > >>>>>>>>>>>>>>>>>>> + else > >>>>>>>>>>>>>>>>>>> + return; > >>>>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>>>> + curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR); > >>>>>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>>>>> + list_for_each_entry_safe(task, tmp, &thread->task_busy_list, > >>>>>>>>>>>>>>>>>>> + list_entry) { > >>>>>>>>>>>>>>>>>>> + if (curr_pa >= task->pa_base && > >>>>>>>>>>>>>>>>>>> + curr_pa < (task->pa_base + task->command_size)) > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> What are you checking here? It seems as if you make some implcit > >>>>>>>>>>>>>>>>>> assumptions about pa_base and the order of execution of > >>>>>>>>>>>>>>>>>> commands in the > >>>>>>>>>>>>>>>>>> thread. Is it save to do so? Does dma_alloc_coherent give any > >>>>>>>>>>>>>>>>>> guarantees > >>>>>>>>>>>>>>>>>> about dma_handle? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 1. Check what is the current running task in this GCE thread. > >>>>>>>>>>>>>>>>> 2. Yes. > >>>>>>>>>>>>>>>>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Yes, physical addresses might be continous, but AFAIK there is no > >>>>>>>>>>>>>>>> guarantee that the dma_handle address is steadily growing, when > >>>>>>>>>>>>>>>> calling > >>>>>>>>>>>>>>>> dma_alloc_coherent. And if I understand the code correctly, you > >>>>>>>>>>>>>>>> use this > >>>>>>>>>>>>>>>> assumption to decide if the task picked from task_busy_list is > >>>>>>>>>>>>>>>> currently > >>>>>>>>>>>>>>>> executing. So I think this mecanism is not working. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I don't use dma_handle address, and just use physical addresses. > >>>>>>>>>>>>>>> From CPU's point of view, tasks are linked by the busy list. > >>>>>>>>>>>>>>> From GCE's point of view, tasks are linked by the JUMP command. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> In which cases does the HW thread raise an interrupt. > >>>>>>>>>>>>>>>> In case of error. When does CMDQ_THR_IRQ_DONE get raised? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> GCE will raise interrupt if any task is done or error. > >>>>>>>>>>>>>>> However, GCE is fast, so CPU may get multiple done tasks > >>>>>>>>>>>>>>> when it is running ISR. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> In case of error, that GCE thread will pause and raise interrupt. > >>>>>>>>>>>>>>> So, CPU may get multiple done tasks and one error task. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think we should reimplement the ISR mechanism. Can't we just read > >>>>>>>>>>>>>> CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave > >>>>>>>>>>>>>> cmdq_handle_error_done to the thread_fn? You will need to pass > >>>>>>>>>>>>>> information from the handler to thread_fn, but that shouldn't be an > >>>>>>>>>>>>>> issue. AFAIK interrupts are disabled in the handler, so we should stay > >>>>>>>>>>>>>> there as short as possible. Traversing task_busy_list is expensive, so > >>>>>>>>>>>>>> we need to do it in a thread context. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Actually, our initial implementation is similar to your suggestion, > >>>>>>>>>>>>> but display needs CMDQ to return callback function very precisely, > >>>>>>>>>>>>> else display will drop frame. > >>>>>>>>>>>>> For display, CMDQ interrupt will be raised every 16 ~ 17 ms, > >>>>>>>>>>>>> and CMDQ needs to call callback function in ISR. > >>>>>>>>>>>>> If we defer callback to workqueue, the time interval may be larger than > >>>>>>>>>>>>> 32 ms.sometimes. > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> I think the problem is, that you implemented the workqueue as a ordered > >>>>>>>>>>>> workqueue, so there is no parallel processing. I'm still not sure why > >>>>>>>>>>>> you need the workqueue to be ordered. Can you please explain. > >>>>>>>>>>> > >>>>>>>>>>> The order should be kept. > >>>>>>>>>>> Let me use mouse cursor as an example. > >>>>>>>>>>> If task 1 means move mouse cursor to point A, task 2 means point B, > >>>>>>>>>>> and task 3 means point C, our expected result is A -> B -> C. > >>>>>>>>>>> If the order is not kept, the result could become A -> C -> B. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Got it, thanks for the clarification. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> I think a way to get rid of the workqueue is to use a timer, which gets > >>>>>>>>> programmed to the time a timeout in the first task in the busy list > >>>>>>>>> would happen. Everytime we update the busy list (e.g. because of task > >>>>>>>>> got finished by the thread), we update the timer. When the timer > >>>>>>>>> triggers, which hopefully won't happen too often, we return timeout on > >>>>>>>>> the busy list elements, until the time is lower then the actual time. > >>>>>>>>> > >>>>>>>>> At least with this we can reduce the data structures in this driver and > >>>>>>>>> make it more lightweight. > >>>>>>>> > >>>>>>>> From my understanding, your proposed method can handle timeout case. > >>>>>>>> > >>>>>>>> However, the workqueue is also in charge of releasing tasks. > >>>>>>>> Do you take releasing tasks into consideration by using the proposed > >>>>>>>> timer method? > >>>>>>>> Furthermore, I think the code will become more complex if we also use > >>>>>>>> timer to implement releasing tasks. > >>>>>>>> > >>>>>>> > >>>>>>> Can't we call > >>>>>>> clk_disable_unprepare(cmdq->clock); > >>>>>>> cmdq_task_release(task); > >>>>>>> after invoking the callback? > > > > After I put clk_disable_unprepare(cmdq->clock) into ISR, I encounter > > another BUG. > > > > (Quote some Linux 4.7 source code.) > > > > 605 void clk_unprepare(struct clk *clk) > > 606 { > > 607 if (IS_ERR_OR_NULL(clk)) > > 608 return; > > 609 > > 610 clk_prepare_lock(); // <-- Here > > 611 clk_core_unprepare(clk->core); > > 612 clk_prepare_unlock(); > > 613 } > > 614 EXPORT_SYMBOL_GPL(clk_unprepare); > > > > 91 static void clk_prepare_lock(void) > > 92 { > > 93 if (!mutex_trylock(&prepare_lock)) { // <-- Here > > 94 if (prepare_owner == current) { > > 95 prepare_refcnt++; > > 96 return; > > 97 } > > 98 mutex_lock(&prepare_lock); > > 99 } > > 100 WARN_ON_ONCE(prepare_owner != NULL); > > 101 WARN_ON_ONCE(prepare_refcnt != 0); > > 102 prepare_owner = current; > > 103 prepare_refcnt = 1; > > 104 } > > > > So, 'unprepare' can sleep and cannot be put into ISR. > > I also try to put it into a timer, but the error is the same > > since timer callback is executed by softirq. > > > > We need clk_disable_unprepare() since it can save power consumption > > in idle. > > We can call clk_prepare in probe and then use clk_enable/clk_disable, > which don't sleep. > > Regards, > Matthias Hi Matthias, Because clock gate and MUX are controlled by clk_enable/clk_disable, and PLL is controlled by clk_prepare/clk_unprepare, I still need to call clk_unprepare. After I remove releasing buffer, releasing task, and timeout task from work, the work can be detached from task. Therefore, I can use the following flow to reduce the number of works. if task_busy_list from empty to non-empty clk_prepare_enable if task_busy_list from non-empty to empty in ISR, add work for clk_disable_unprepare What do you think of this solution? Thanks, HS > > Therefore, I plan to > > (1) move releasing buffer and task into ISR, > > (2) move timeout into timer, and > > (3) keep workqueue for clk_disable_unprepare(). > > > > What do you think? > > > > Thanks, > > HS > > > >>>>>> > >>>>>> Do you mean just call these two functions in ISR? > >>>>>> My major concern is dma_free_coherent() and kfree() in > >>>>>> cmdq_task_release(task). > >>>>> > >>>>> Why do we need the dma calls at all? Can't we just calculate the > >>>>> physical address using __pa(x)? > >>>> > >>>> I prefer to use dma_map_single/dma_unmap_single. > >>>> > >>> > >>> Can you please elaborate why you need this. We don't do dma, so we > >>> should not use dma memory for this. > >> > >> We need a buffer to share between CPU and GCE, so we do need DMA. > >> CPU is in charge of writing GCE commands into this buffer. > >> GCE is in charge of reading and running GCE commands from this buffer. > >> When we chain CMDQ tasks, we also need to modify GCE JUMP command. > >> Therefore, I prefer to use dma_alloc_coherent and dma_free_coherent. > >> > >> However, if we want to use timer to handle timeout, we need to release > >> memory in ISR. > >> In this case, using kmalloc/kfree + dma_map_single/dma_unmap_single > >> instead of dma_alloc_coherent/dma_free_coherent is an alternative > >> solution, but taking care the synchronization between cache and memory > >> is the expected overhead. > >> > >>>>>> Therefore, your suggestion is to use GFP_ATOMIC for both > >>>>>> dma_alloc_coherent() and kzalloc(). Right? > >>>>> > >>>>> I don't think we need GFP_ATOMIC, the critical path will just free the > >>>>> memory. > >>>> > >>>> I tested these two functions, and kfree was safe. > >>>> However, dma_free_coherent raised BUG. > >>>> BUG: failure at > >>>> /mnt/host/source/src/third_party/kernel/v3.18/mm/vmalloc.c:1514/vunmap()! > >>> > >>> Just a general hint. Please try to evaluate on a recent kernel. It looks > >>> like as if you tried this on a v3.18 based one. > >> > >> This driver should be backward compatible to v3.18 for a MTK project. > >> > >>> Best regards, > >>> Matthias > >> > >> Thanks, > >> HS > >> > >>>> 1512 void vunmap(const void *addr) > >>>> 1513 { > >>>> 1514 BUG_ON(in_interrupt()); // <-- here > >>>> 1515 might_sleep(); > >>>> 1516 if (addr) > >>>> 1517 __vunmap(addr, 0); > >>>> 1518 } > >>>> 1519 EXPORT_SYMBOL(vunmap); > >>>> > >>>> Therefore, I plan to use kmalloc + dma_map_single instead of > >>>> dma_alloc_coherent, and dma_unmap_single + kfree instead of > >>>> dma_free_coherent. > >>>> > >>>> What do you think about the function replacement? > >>>> > >>>>>> If so, I can try to implement timeout by timer, and discuss with you > >>>>>> if I have further questions. > >>>>>> > >>>>> > >>>>> Sounds good :) > >>>>> > >>>>> Thanks, > >>>>> Matthias > >>>> > >>>> Thanks, > >>>> HS > >>>> > >>>>>>> Regrading the clock, wouldn't it be easier to handle the clock > >>>>>>> enable/disable depending on the state of task_busy_list? I suppose we > >>>>>>> can't as we would need to check the task_busy_list of all threads, right? > >>>>>>> > >>>>>>> Regards, > >>>>>>> Matthias > >>>>>> > >>>>>> Thanks, > >>>>>> HS -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html