On Sat, 10 Mar 2018, Michael Schmitz wrote:
It's a hardware exception, not a software exception. The bus error is
generated by signals from one of Apple's ASICs. This logic circuit
effectively interfaces the SCSI bus with the system bus, via the SCSI
controller, for performance. But that's hardly relevant. I'm more
interested in the bug in mainline, not the bug in my RFC patch.
Well, the bug in mainline is what allows PDMA to work on 030,
No it isn't. I proposed an RFC patch that made PDMA fail. To me that means
that the patch has a problem. It doesn't say anything about the existing
code.
by a happy coincidence. I was simply wondering what MMU status bits we'd
likely see if the PDMA ASIC generates an exception.
Well, we know that 0 == (mmusr & (MMU_I|MMU_WP|MMU_B|MMU_L|MMU_S)). Do we
need to know the state of MMU_M and MMU_T also? I assume that the MMU
status is exactly the same for a SCSI MMIO access regardless of whether or
not it happens to fault: the exception (when it happens) tells you
something about the SCSI bus, not the MMU.
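
Spelled out, the mask test above looks roughly like the sketch below. The bit values are assumptions based on the 68030 MMUSR layout and should be checked against arch/m68k/kernel/traps.c; mmusr_reports_fault() is a hypothetical helper, not an existing kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 68030 MMUSR bits; verify against traps.c before relying on them. */
#define MMU_B  0x8000 /* bus error */
#define MMU_L  0x4000 /* limit violation */
#define MMU_S  0x2000 /* supervisor violation */
#define MMU_WP 0x0800 /* write-protected */
#define MMU_I  0x0400 /* invalid descriptor */
#define MMU_M  0x0200 /* ATC entry modified */
#define MMU_T  0x0040 /* transparent translation */

/* Hypothetical helper: true if the MMU itself flagged a problem. */
static bool mmusr_reports_fault(uint16_t mmusr)
{
	return (mmusr & (MMU_I | MMU_WP | MMU_B | MMU_L | MMU_S)) != 0;
}
```

With these definitions, MMU_M and MMU_T do not influence the result, which is the reason for asking whether their state matters at all.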
- What are the implications of the existing logic error?
We might miss handling (MMU_B|MMU_L|MMU_S) && (ssw & RM) (should be
harmless),
I think that leads to a recursive fault which would kill the machine
instead of just the user process that caused the fault, but I don't
have code to confirm this.
Too dangerous to consider, then.
Dangerous enough to want verification with some exception handler tests, I
think.
config RMW_INSNS
bool "Use read-modify-write instructions"
depends on ADVANCED
---help---
This allows to use certain instructions that work with
indivisible read-modify-write bus cycles. While this is faster
than the workaround of disabling interrupts, it can conflict
with DMA ( = direct memory access)
Makes sense. For RMW accesses the CPU has to lock out other bus masters
like the DMA controller (or any device that might modify memory somehow).
on many Amiga systems, and it is also said to destabilize
other machines. It is very likely that this will cause serious
problems on any Amiga or Atari Medusa if set. The only
configuration where it should work are 68030-based Ataris,
where it apparently improves performance. But you've been
warned! Unless you really know what you are doing, say N. Try
Y only if you're quite adventurous.
The comment about Atari Medusa (040 IIRC) might no longer be correct
after Roman fixed the recursive fault on 040.
I've always played it safe and followed this advice. But I agree that
there are probably systems besides 68030-based Ataris where RMW is just
fine. But who knows.
But it still might be true for 030 Amiga, where (as I understand the
statement) DMA operations may interfere with RMW bus cycles which might
cause such exceptions.
We could use #ifdef CONFIG_RMW_INSNS in the bus_error030() implementation
but it won't help against user mode RMW faults, so all of this seems to
beg the same question: "Why a special case for RMW faults?".
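
A minimal sketch of why the #ifdef does not settle the matter, assuming RM is bit 7 of the 68030 special status word (check traps.c). needs_rmw_handling() is a hypothetical helper for illustration only, not code from bus_error030().

```c
#include <stdbool.h>
#include <stdint.h>

#define RM 0x0080 /* assumed 68030 SSW read-modify-write bit */

/* Hypothetical helper: must a faulting RMW cycle still be considered? */
static bool needs_rmw_handling(uint16_t ssw, bool user_mode)
{
	if (!(ssw & RM))
		return false;	/* not a read-modify-write cycle */
#ifdef CONFIG_RMW_INSNS
	return true;		/* kernel or user code may have issued it */
#else
	return user_mode;	/* the kernel never issues one, user code still can */
#endif
}
```

Even with CONFIG_RMW_INSNS unset, the user mode case survives the preprocessor, so the special case cannot simply be compiled out.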
- Should the dead code be deleted because the live algorithm cannot
be improved upon? (The present algorithm works fine for PDMA for
example.)
The end result might be the same with the current code (except for
PDMA working): signal to user process (maybe the wrong one; weird
access forces SEGV), or panic. The main difference is we don't fix up
the exception from process exception tables. I believe that is what
makes PDMA work, i.e. fixes up the PDMA bus fault?
I don't follow. Deleting dead code means no difference. Everything
works the same (and so PDMA keeps working, but that's a red herring).
What I meant is the end result of the code as-is would be the same as
fixing the logic to take the default branch, which we all agree should
have happened.
I don't see much agreement.
You appear to be in agreement with some part of the RFC patch (?) If so,
let's respond to that message.
The end result of deleting the dead code is, of course, current
behaviour, which appears to be just fine and changing that would require
a lot of testing.
Another valid way of looking at it is that the end result of deleting the
dead code is either 1) the loss of the only signpost in the source code
that points to the bug or 2) a claim that there is no bug, just dead code.
Faced with those two possibilities, I made code live and sent an RFC
instead.
So perhaps try send_fault_signal() in the default branch, which will
also run die_if_kernel() if need be.
I tried that (Stan tested it). It turns out that usermode instruction
faults can also traverse that branch so /sbin/init just crashed.
If they happen together with a data fault, that branch will be used. If
exception fixup was successful, i.e. send_fault_signal() == -1, you
should not need to signal but continue to instruction fault handling?
AIUI, send_fault_signal() == -1 can only happen in supervisor mode, which
is not relevant here. (It is relevant to the PDMA crash but that remains a
red herring.) I still feel that there's little point in pursuing this
unless people (maintainers) agree that there's a bug to be fixed.
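
To illustrate that understanding, here is a toy model of the return value. It is only a sketch: send_fault_signal_model() is a hypothetical stand-in, the real routine takes a struct pt_regs, and every return value other than -1 is a placeholder.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the behaviour described above, not kernel code.
 * Only the -1 case comes from this thread; the other value is a placeholder.
 */
static int send_fault_signal_model(bool user_mode, bool fixup_found)
{
	if (user_mode)
		return 1;	/* signal sent to the process, no fixup attempted */
	if (fixup_found)
		return -1;	/* exception table fixup applied */
	return 1;		/* placeholder: the real code would oops here */
}
```

In this model a user mode fault never yields -1, whatever the exception tables contain.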