On Tue, Nov 10, 2020 at 04:40:23PM +1100, Brad Campbell wrote: > On 10/11/20 3:55 pm, Guenter Roeck wrote: > > On Tue, Nov 10, 2020 at 01:04:04PM +1100, Brad Campbell wrote: > >> On 9/11/20 3:06 am, Guenter Roeck wrote: > >>> On 11/8/20 2:14 AM, Henrik Rydberg wrote: > >>>> On Sun, Nov 08, 2020 at 09:35:28AM +0100, Henrik Rydberg wrote: > >>>>> Hi Brad, > >>>>> > >>>>> On 2020-11-08 02:00, Brad Campbell wrote: > >>>>>> G'day Henrik, > >>>>>> > >>>>>> I noticed you'd also loosened up the requirement for SMC_STATUS_BUSY in read_smc(). I assume > >>>>>> that causes problems on the early Macbook. This is revised on the one sent earlier. > >>>>>> If you could test this on your Air1,1 it'd be appreciated. > >>>>> > >>>>> No, I managed to screw up the patch; you can see that I carefully added the > >>>>> same treatment for the read argument, being unsure if the BUSY state would > >>>>> remain during the AVAILABLE data phase. I can check that again, but > >>>>> unfortunately the patch in this email shows the same problem. > >>>>> > >>>>> I think it may be worthwhile to rethink the behavior of wait_status() here. > >>>>> If one machine shows no change after a certain status bit change, then > >>>>> perhaps the others share that behavior, and we are waiting in vain. Just > >>>>> imagine how many years of cpu that is, combined. ;-) > >>>> > >>>> Here is a modification along that line. > >>>> > >>> > >>> Please resend this patch as stand-alone v4 patch. If sent like it was sent here, > >>> it doesn't make it into patchwork, and is thus not only difficult to apply but > >>> may get lost, and it is all but impossible to find and apply all tags. > >>> Also, prior to Henrik's Signed=off-by: there should be a one-line explanation > >>> of the changes made. > >>> > >>> Thanks, > >>> Guenter > >>> > >>>> Compared to your latest version, this one has wait_status() return the > >>>> actual status on success. Instead of waiting for BUSY, it waits for > >>>> the other status bits, and checks BUSY afterwards. So as not to wait > >>>> unneccesarily, the udelay() is placed together with the single > >>>> outb(). The return value of send_byte_data() is augmented with > >>>> -EAGAIN, which is then used in send_command() to create the resend > >>>> loop. > >>>> > >>>> I reach 41 reads per second on the MBA1,1 with this version, which is > >>>> getting close to the performance prior to the problems. > >>>> > >> > >> Can I get an opinion on this wait statement please? > >> > >> The apple driver uses a loop with a million (1,000,000) retries spaced with a 10uS delay. > >> > >> In my testing on 2 machines, we don't busy wait more than about 2 loops. > >> Replacing a small udelay with the usleep_range kills performance. > >> With the following (do 10 fast checks before we start sleeping) I nearly triple the performance > >> of the driver on my laptop, and double it on my iMac. This is on an otherwise unmodified version of > >> Henriks v4 submission. > >> > >> Yes, given the timeouts I know it's a ridiculous loop condition. > >> > >> static int wait_status(u8 val, u8 mask) > >> { > >> unsigned long end = jiffies + (APPLESMC_MAX_WAIT * HZ) / USEC_PER_SEC; > >> u8 status; > >> int i; > >> > >> for (i=1; i < 1000000; i++) { > > > > The minimum wait time is 10 us, or 16uS after the first 10 > > attempts. 1000000 * 10 = 10 seconds. I mean, it would make > > some sense to limit the loop to APPLESMC_MAX_WAIT / > > APPLESMC_MIN_WAIT iterations, but why 1,000,000 ? > > I had to pick a big number and it was easy to punch in 6 zeros as that is what is in > the OSX driver. In this instance it's more a proof of concept rather than sane example > because it'll either abort on timeout or return the correct status anyway. > Could also have been a while (true) {} but then I'd need to manually increment i. > > >> status = inb(APPLESMC_CMD_PORT); > >> if ((status & mask) == val) > >> return status; > >> /* timeout: give up */ > >> if (time_after(jiffies, end)) > >> break; > >> if (i < 10) > >> udelay(10); > >> else > >> usleep_range(APPLESMC_MIN_WAIT, APPLESMC_MIN_WAIT * 16); > > > > The original code had the exponential wait time increase. > > I don't really see the point of changing that. I'd suggest > > to keep the exponential increase but change the code to > > something like > > if (us < APPLESMC_MIN_WAIT * 4) > > udelay(us) > > else > > usleep_range(us, us * 16); > > The main reason I dropped the exponential was best case on this laptop the modified code with exponential > wait as described above increase increases the performance from ~40 -> 62 reads/sec, whereas the version > I posted with the first 10 loops at 10uS goes from ~40 -> 100 reads/sec. > > About 95% of waits never get past a few of iterations of that loop (so ~30-60uS). With a > modified exponential starting at 8uS a 30uS requirement ends up at 56uS. If it needed 60us with > the original we end up waiting 120uS. > > If it needs longer than 120uS it's rare enough that a bigger sleep isn't going to cause much > of a performance hit. > > Getting completely pathological and busy waiting with a fixed 10uS delay like the OSX driver > does give about 125 reads/sec but I was trying to find a balance and 10 loops seemed to hit that mark. > > > Effectively that means the first wait would be 16 uS, > > followed by 32 uS, followed by increasingly larger sleep > > times. I don't know the relevance of APPLESMC_MIN_WAIT > > being set to 16, but if you'd want to start with smaller > > wait times you could reduce it to 8. If you are concerned > > about excessively large sleep times you could reduce > > the span from us..us*16 to, say, us..us*4 or us..us*2. > > I was tossing up here between the overhead required to manage a tighter usleep_range > vs some extra udelays. > > It's not exactly a performance sensitive driver, but I thought there might be a balance to be > struck between performance and simplicity. The exponential delay always struck me as odd. > > If the preference is to leave it as is or modify as suggested I'm ok with that too. > Appreciate the input. Ok, not worth arguing about. Guenter > > Regards, > Brad