On 11/17/2014 08:56 PM, Maciej W. Rozycki wrote:
On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
That doesn't appear to have anything to do with ptrace(2) and
cache-coherency issues seen around software breakpoints, and the buggy
74K is the only system of all that we have that shows that problem. And
the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
actually needed here, or maybe a more lightweight alternative fix, but
an item that addresses HIGHMEM support is clearly irrelevant.
Again, the 74K erratum applies only if there are two mappings and one of those
mappings is switched to the same physaddr. In other words, a TLB change is needed
(some other conditions are needed as well). It has nothing to do with generic
cache aliasing. Switching on cache aliasing support just masks the issue behind
massive cache flushes.
For the erratum to trigger do the two mappings absolutely have to go
through the TLB or can one of them be via the TLB and the other one via
a fixed mapping by the means of KSEG0?
One of the mappings can be a fixed KSEG0 mapping.
This will be the case here.
That case was considered and ruled out - the KSEG0 mapping is not changed and
the user mapping is actually stable and is not changed during ptrace(2).
But a change of one TLB entry is required (with accesses via both mappings and
a race condition with the other mapping).
It is a very specific case and there are only two places which may be
affected - CoW and page clearing before the page is handed to the user.
And I understand what the consequence of setting MIPS_CACHE_ALIASES is,
but I also insist that correctness is more important than performance,
so we need to make sure that the kernel performs reliably even if that
comes at a cost. And the cost may be higher than necessary at the
beginning, but that will be the right starting point for further
improvements.
I suspect you may have a problem with something else, not the 74K erratum,
and switching on cache aliasing flushes just hides the problem. Why do you
see it on the 74K only? Because it is an out-of-order CPU. Did you test on
proAptiv or the newest P5600? They have a similar out-of-order design.
If you have a problem with ptrace(2) then it points to an incorrect result of
copy_to_user_page(), most probably in the kmap() handling. The HIGHMEM patch
takes care of address aliasing, so I assumed I had taken that case into
account too, but something may have changed. I think it makes sense to put a
check of cpu_has_vtag_dcache in copy_to_user_page() - it will definitely enforce
cache flushing after a ptrace() write, i.e. __access_remote_vm(..., true), and it
doesn't harm the rest of the system. Then retest.
But that needs to be done on top of the
http://patchwork.linux-mips.org/patch/8459/ fix; without it, it is futile.
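
The suggested check can be sketched in user space roughly as follows. This
is only a model of the decision, not kernel code: struct cpu_model,
struct page_model and copy_to_user_page_model() are hypothetical names made
up for illustration, the has_vtag_dcache field merely stands in for
cpu_has_vtag_dcache, and the counter stands in for whatever cache flush the
real copy_to_user_page() would issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CPU feature test; not a kernel type. */
struct cpu_model {
    bool has_vtag_dcache;   /* models cpu_has_vtag_dcache */
};

/* Hypothetical stand-in for a page written through a kernel mapping. */
struct page_model {
    unsigned char data[64];
    unsigned int flush_count;   /* counts simulated cache flushes */
};

/*
 * Model of the proposed behaviour: copy the data, then always flush
 * when the D-cache is virtually tagged, so that the user mapping sees
 * what was written (for example by a ptrace() write).
 */
static void copy_to_user_page_model(const struct cpu_model *cpu,
                                    struct page_model *page, size_t off,
                                    const void *src, size_t len)
{
    assert(off <= sizeof(page->data) && len <= sizeof(page->data) - off);
    memcpy(page->data + off, src, len);
    if (cpu->has_vtag_dcache)
        page->flush_count++;    /* a real cache flush would go here */
}
```

With such a check, a CPU without a virtually tagged D-cache takes exactly
the same path as before, which is why the change should not affect the rest
of the system.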
I think burying fixes for ordinary accesses among HIGHMEM pieces does
not really help, would you be able to split off pieces relevant for
non-HIGHMEM configurations, such as those for the 74K workaround, from
your big change? I think it would make it easier to get this part
accepted, and the remaining pieces would then shrink too, also making
them easier to review and accept.
It is related to HIGHMEM because (at least 3 years ago) HIGHMEM
had a high probability of hitting it.
However, it was a series of changes which was sequentially ported
from 2.6.32.15 to 2.6.32.9 and so on but never went upstream, and
because of that it has changed considerably over time.
- Leonid.