Re: NFS corruption, fixed by echo 1 > /proc/sys/vm/drop_caches -- next debugging steps?


 



On Wed, Mar 15, 2017 at 7:00 AM, Manuel Lauss <manuel.lauss@xxxxxxxxx> wrote:
>
> On Wed, Mar 15, 2017 at 10:25 AM, Ralf Baechle <ralf@xxxxxxxxxxxxxx> wrote:
>>
>> On Mon, Mar 13, 2017 at 09:47:57AM +0000, James Hogan wrote:
>>
>> > >
>> > > Note that the corruption is different across reboots, both in the size
>> > > of the corruption and the location. I saw 1900~ and 1400~ byte
>> > > sequences corrupted on separate occasions, which don't correspond to
>> > > the system's 16kB page size.
>> > >
>> > > I've tested kernels from v3.19 to 4.11-rc1+ (master branch from
>> > > today). All exhibit this behavior with differing frequencies. Earlier
>> > > kernels seem to reproduce the issue less often, while more recent
>> > > kernels reliably exhibit the problem every boot.
>> > >
>> > > How can I further debug this?
>> >
>> > It smells a bit like a DMA / caching issue.
>> >
>> > Can you provide a full kernel log? That might provide some information
>> > about caching that might be relevant (e.g. does dcache have aliases?).
>>
>> The architecture of the BCM1250 SOC used for the BCM91250 boards is
>> fully coherent; S-cache and D-cache are physically indexed and tagged.
>> Only the VIVT (plus the usual ASID tagging) I-cache leaves space for
>> software to screw up cache management but that shouldn't matter for this
>> case, so I suggest to start looking into this from the NFS side.
>
>
> I did Matt's tests on Alchemy (VIPT caches) with kernels 3.18 to 4.11-rc
> against an x86 4.9.15 host, and did not see any problems. Given Ralf's
> comment about the BCM1250 caches, maybe you have bad hardware (BCM board
> or network)?

I certainly cannot rule that possibility out. If that is the case, I
would like to confirm it, for instance by seeing a failure in memtester
or a similar tool. Any suggestions? (I have run memtester and it never
found anything.)
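
Since the corrupted runs reported above (roughly 1900 and 1400 bytes)
do not match the 16kB page size, recording the exact offset and length
of each run may show whether they line up with some other boundary. A
minimal sketch, assuming a known-good local copy of the file is
available for comparison; the function name and paths are hypothetical
and not from this thread:

```python
PAGE_SIZE = 16 * 1024  # 16kB page size mentioned in the thread


def corrupted_ranges(good_path, suspect_path, chunk=1 << 20):
    """Return a list of (offset, length) runs where the two files differ."""
    ranges = []
    start = None  # offset where the current differing run began, if any
    pos = 0       # file offset of the chunk being compared
    with open(good_path, "rb") as g, open(suspect_path, "rb") as s:
        while True:
            a = g.read(chunk)
            b = s.read(chunk)
            if not a and not b:
                break
            if a == b:
                # Fast path: identical chunk closes any open run.
                if start is not None:
                    ranges.append((start, pos - start))
                    start = None
                pos += len(a)
                continue
            n = max(len(a), len(b))
            for i in range(n):
                # Bytes past the end of the shorter file count as differing.
                same = i < len(a) and i < len(b) and a[i] == b[i]
                if not same and start is None:
                    start = pos + i
                elif same and start is not None:
                    ranges.append((start, pos + i - start))
                    start = None
            pos += n
    if start is not None:
        ranges.append((start, pos - start))
    return ranges


def describe(ranges):
    """Print each run and whether it is aligned to PAGE_SIZE."""
    for off, length in ranges:
        aligned = off % PAGE_SIZE == 0 and length % PAGE_SIZE == 0
        print(f"offset {off} length {length} page-aligned: {aligned}")
```

Running this against the same file before and after
echo 1 > /proc/sys/vm/drop_caches would show whether the differing
runs disappear once the cached copy is discarded.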

For what it's worth, did you determine the cause of the NFS corruption
you reported [1]?

[1] https://www.spinics.net/lists/mips/msg44006.html



