Re: threads and fork on machine with VIPT-WB cache

> On Mon, Apr 19, 2010 at 6:26 PM, John David Anglin
> <dave@xxxxxxxxxxxxxxxxxx> wrote:
> > Hi Helge,
> >
> > On Tue, 13 Apr 2010, Helge Deller wrote:
> >
> >> Still crashes.
> >
> > Can you try the patch below?  The change to cacheflush.h is the same
> > as before.
> 
> For the record, while setting up the wiki's TestCases page, I
> noticed that the initial large patch that you sent (see
> https://patchwork.kernel.org/patch/91525/ ) contained bits that
> weren't part of the split chunks you sent afterwards.
> 
> This patch (pte.d.2) seems to update some of those chunks and also
> contains bits that weren't part of them either.

The split chunks were mainly cleanups.  As far as I know, they are
obvious and provide no significant change in functionality.  I didn't
intentionally change any of the split hunks in patch4 (pte.d.2), although
this patch does touch some of the same files.  Possibly the LWS fixes
should be split into two patches (obvious fixes and UP locking).

Both the original patch and pte.d.2 were experimental.  Since sending it,
I have continued to experiment and arrived at a change that appears to fix
the minifail bug in a somewhat different manner than the one proposed by
James.  However, I'm still seeing some issues that appear to be PTE related
(mainly segmentation faults in sh).

At this point, I don't know why I still see problems.  I have one idea left
to try.  I would also like to implement copy_user_page with equivalent
aliasing.  My first attempt, which simply enabled code in pacache.S,
didn't work.

I have more or less reached the conclusion that our PTE/TLB management
is quite broken on SMP.  I tried James' patch but had trouble with
segmentation faults on my rp3440, and a GCC build (make -j8 bootstrap)
died early in stage 1.  I need to try it with a clean build.

I may be wrong, but I think a flush in kmap(_atomic) won't work on SMP
because another user may simply redirty the page when it is shared.

> I'm pointing this out so that we do not lose track of potentially useful
> code.  Though maybe Kyle has all of this sorted out already and I'm
> just unable to figure it out myself ;-)

I don't think there's a clear path.  I've come to realize that I don't
understand what's required of the higher-level code.  The documentation
doesn't help much.  Looking at other archs provides some clues.  I've
looked at ia64 a bit (see, for example, the TLB shootdown support and the
retry in the TLB miss handler).

Regarding the wiki, it's a useful summary.  However, #561203 (minifail
bug) is not a "Futex wait failure".  We may have futex bugs, but I'm not
aware of a testcase.  The minifail bug is a "Threads and fork" problem
arising from cache corruption.  Mainly, copy_user_page is broken when
copying memory shared by more than one process.  There are also issues
in PTE/TLB management on SMP systems.  Probably, the vfork/execve bug
is caused by the same problem.

Dave
-- 
J. David Anglin                                  dave.anglin@xxxxxxxxxxxxxx
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
