Re: [RFC] m68k: Fix dead code in bus_error030()

Michael Schmitz <schmitzmic@xxxxxxxxx> · Sat, 10 Mar 2018 15:50:08 +1300

Hi Finn,

On 10.03.2018 at 11:51, Finn Thain wrote:
On Fri, 9 Mar 2018, Michael Schmitz wrote:

How does the PDMA logic raise the exception? If we find none of the 
usual MMU status register bits are set, we could take that as an 
indication that the exception wasn't raised by the MMU, so no page or 
protection fault. Pretty much leaves only the PDMA logic (if present).

It's a hardware exception, not a software exception. The bus error is 
generated by signals from one of Apple's ASICs. This logic circuit 
effectively interfaces the SCSI bus with the system bus, via the SCSI 
controller, for performance. But that's hardly relevant. I'm more 
interested in the bug in mainline, not the bug in my RFC patch.

Well, the bug in mainline is what allows PDMA to work on 030, by a happy
coincidence. I was simply wondering what MMU status bits we'd likely see
if the PDMA ASIC generates an exception.
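
A minimal sketch of the test suggested above: if none of the usual MMU status bits are set, treat the bus error as coming from outside the MMU. The bit values are placeholders for illustration and should be checked against the kernel source. The helper name bus_error_is_external() is hypothetical, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration; take the real ones from the kernel. */
#define MMU_B	0x8000u	/* bus error */
#define MMU_L	0x4000u	/* limit violation */
#define MMU_S	0x2000u	/* supervisor only */
#define MMU_WP	0x0800u	/* write protected */
#define MMU_I	0x0400u	/* invalid descriptor */

#define MMU_FAULT_BITS	(MMU_B | MMU_L | MMU_S | MMU_WP | MMU_I)

/*
 * Hypothetical helper: true when the MMU status word reports no fault
 * condition at all, so the exception was probably not raised by the MMU.
 */
bool bus_error_is_external(unsigned int mmusr)
{
	return (mmusr & MMU_FAULT_BITS) == 0;
}

/* Usage example: a clean status word vs. one with the invalid bit set. */
void check_external_examples(void)
{
	assert(bus_error_is_external(0));
	assert(!bus_error_is_external(MMU_I));
}
```

Whether the PDMA logic really leaves all of these bits clear is exactly the open question; the sketch only shows what the check would look like.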

- What are the implications of the existing logic error?

We might miss handling (MMU_B|MMU_L|MMU_S) && (ssw & RM) (should be 
harmless),

I think that leads to a recursive fault which would kill the machine 
instead of just the user process that caused the fault, but I don't have 
code to confirm this.

Too dangerous to consider, then.

and might log (MMU_B|MMU_L|MMU_S) && !(ssw & RM) as unexpected bus error 
if there wasn't a user process to signal or an exception vector to fix 
up, or even panic. Doesn't seem to happen though.
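
To make the logic error concrete, here is a sketch under a stated assumption: the chain first tests (MMU_I|MMU_WP) and then tests !MMU_I, before the (MMU_B|MMU_L|MMU_S) and default branches. That shape is assumed for illustration and is not quoted from traps.c; the bit values and all names below are placeholders too.

```c
#include <assert.h>

/* Placeholder bit values for illustration only. */
#define MMU_B	0x8000u
#define MMU_L	0x4000u
#define MMU_S	0x2000u
#define MMU_WP	0x0800u
#define MMU_I	0x0400u

enum branch { BR_PAGE_FAULT, BR_CAS_GUESS, BR_INVALID_ACCESS, BR_DEFAULT };

/* Assumed shape of the chain as it behaves today. */
enum branch chain_as_is(unsigned int mmusr)
{
	if (mmusr & (MMU_I | MMU_WP))
		return BR_PAGE_FAULT;
	if (!(mmusr & MMU_I))	/* always true here: MMU_I was ruled out above */
		return BR_CAS_GUESS;
	if (mmusr & (MMU_B | MMU_L | MMU_S))
		return BR_INVALID_ACCESS;	/* never reached */
	return BR_DEFAULT;			/* never reached */
}

/* Count the 16-bit status values that reach either of the last two branches. */
unsigned int count_dead_branch_hits(void)
{
	unsigned int mmusr, hits = 0;

	for (mmusr = 0; mmusr <= 0xffffu; mmusr++) {
		enum branch b = chain_as_is(mmusr);

		if (b == BR_INVALID_ACCESS || b == BR_DEFAULT)
			hits++;
	}
	return hits;
}

/* Usage example: MMU_B alone lands in the "CAS guess" branch. */
void check_chain_examples(void)
{
	assert(chain_as_is(MMU_B) == BR_CAS_GUESS);
	assert(count_dead_branch_hits() == 0);
}
```

Under that assumption, every status value with (MMU_B|MMU_L|MMU_S) set and MMU_I and MMU_WP clear ends up in the second branch, and what happens there depends only on (ssw & RM).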

- What was the author trying to achieve? Why a special case for RMW 
  faults?

Because the 020 can't do RMW instructions,

I don't think that's correct.

'probably a 020 CAS instruction' suggests that these might be the only
ones having (ssw & RM) true.

From Kconfig.cpu:

config RMW_INSNS
        bool "Use read-modify-write instructions"
        depends on ADVANCED
        ---help---
          This allows to use certain instructions that work with indivisible
          read-modify-write bus cycles. While this is faster than the
          workaround of disabling interrupts, it can conflict with DMA
          ( = direct memory access) on many Amiga systems, and it is also said
          to destabilize other machines. It is very likely that this will
          cause serious problems on any Amiga or Atari Medusa if set. The only
          configuration where it should work are 68030-based Ataris, where it
          apparently improves performance. But you've been warned! Unless you
          really know what you are doing, say N. Try Y only if you're quite
          adventurous.

The comment about Atari Medusa (040 IIRC) might no longer be correct
after Roman fixed the recursive fault on 040. But it still might be true
for 030 Amiga, where (as I understand the statement) DMA operations may
interfere with RMW bus cycles which might cause such exceptions.

- Should the dead code be deleted because the live algorithm cannot be 
  improved upon? (The present algorithm works fine for PDMA for example.)

The end result might be the same with the current code (except for PDMA 
working): signal to user process (maybe the wrong one; weird access 
forces SEGV), or panic. The main difference is we don't fix up the 
exception from process exception tables. I believe that is what makes 
PDMA work, i.e. fixes up the PDMA bus fault?

I don't follow. Deleting dead code means no difference. Everything works 
the same (and so PDMA keeps working, but that's a red herring).

What I meant is that the end result of the code as-is would be the same
as after fixing the logic to take the default branch, as we all agree
should have happened.

The end result of deleting the dead code is, of course, current
behaviour, which appears to be just fine, and changing that would require
a lot of testing.

So perhaps try send_fault_signal() in the default branch, which will 
also run die_if_kernel() if need be.

I tried that (Stan tested it). It turns out that usermode instruction 
faults can also traverse that branch so /sbin/init just crashed. I think I 

If they happen together with a data fault, that branch will be used. If
exception fixup was successful, i.e. send_fault_signal() == -1, you
should not need to signal but can continue to instruction fault handling?
Or does the fault just happen again?

can resolve that. But there's not much point in pursuing that until the 
architecture experts agree that there's a bug to be fixed.

Anyone?

Cheers,

	Michael