On 2022-05-09 02:06, Christian König wrote: > Am 07.05.22 um 02:03 schrieb Luben Tuikov: >> [SNIP] >> + expect_val = LSDMA_PIO_STATUS__PIO_IDLE_MASK | LSDMA_PIO_STATUS__PIO_FIFO_EMPTY_MASK; >> + for (i = 0; i < usec_timeout; i++) { >> + tmp = RREG32_SOC15(LSDMA, 0, regLSDMA_PIO_STATUS); >> + if ((tmp & expect_val) == expect_val) >> + break; >> + udelay(1); >> + } >>>> Does the hardware specify a minimum delay after the write to the LSDMA_PIO_COMMAND, >>>> and before we check the LSDMA_PIO_STATUS? >>> I'm not sure, but it should be pretty minimal. >> My point is that there should be a delay at least as large as the polling delay, >> after writing to the command register and before reading the status register. >> So that should be at least a 1 us. >> >> There should be a udelay(1) after writing the command to the LSDMA_PIO_COMMAND >> register, before the for () loop begins. > > I can't count how often I had to reject that approach. This is exactly > what we should *not* do. > > The SRBM (or whatever that's called on newer hardware) is the one who > inserts the delay here. The driver should not explicitly do that > additionally. > > The background is that a) the read is often used to commit the write > operations, both for the PCIe as well as internally in the GPU and b) > the read only comes back when the SRBM has synced up between the bus > interface and the hardware block in question. > > So when the SRBM has synced between the two clock domains the operation > has usually been completed or at least started. The minimal delay we > enter between reads is just to avoid hammering on the register bus when > the hardware block in question is (for example) power gated or otherwise > not reachable. Yeah, that's what I wanted to avoid, but if you say that we need it to make sure that the write has committed, then it's okay. Regards, Luben > > Regards, > Christian. > >> >> Regards, >> Luben >> >> >>>> At the moment the code above checks the status immediately after writing to >>>> LSDMA_PIO_COMMAND, and the copy wouldn't be completed. >>>> >>>> If the poll timeout is 1 us, as the loop shows us, maybe the command completion >>>> is larger than that (the time after writing to LSDMA_PIO_COMMAND and before >>>> checking LSDMA_PIO_STATUS)? >>> The time it takes for the copy will be dependent on the amount of data >>> there is to copy. While the command is probably not complete on the >>> first read of the status register, it may be required to make sure the >>> previous write goes through. >>> >>> Alex >>> >>>>> + >>>>> + if (i >= usec_timeout) { >>>>> + dev_err(adev->dev, "LSDMA PIO failed to copy memory!\n"); >>>>> + return -ETIMEDOUT; >>>>> + } >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> const struct amdgpu_lsdma_funcs lsdma_v6_0_funcs = { >>>>> + .copy_mem = lsdma_v6_0_copy_mem >>>>> }; >>>> Regards, >>>> -- >>>> Luben >> Regards, > Regards, -- Luben