Re: 'bad page state' error

Got to work today and reproduced the crash on a second VM in a different cluster, running RHEL 6.4 with the 373 kernel patch set.

Full (non-canceled) file copies are generally okay and don't seem to crash anything. Canceling a copy still produces a 'bad page state' error.

I ran a torrent session, which downloaded to 99% before the VM crashed. There were no 'bad page state' errors during the transfer, but once the client hit the final few bits, the VM crashed with the error dump shown in the attached console screenshot.

It also looks like caching isn't working properly with torrented data. 'cp' works fine retrieving the data from the cache, but torrent chunks seem to be getting pulled from the CIFS share, not the cache. Eventually the disk fills up and the VM crashes instantly after mounting the share.

So there's probably some problem with identifying whether a bit of data is in the cache or not.
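
One way to check where reads are actually being served from, sketched here in Python 3: snapshot the FS-Cache counters before and after a read and look at which ones moved. This assumes the kernel was built with CONFIG_FSCACHE_STATS so that /proc/fs/fscache/stats exists. The helper names are made up for illustration, and the counter names in that file differ between kernel versions, so the meaning of each one should be checked against the kernel's FS-Cache documentation.

```python
FSCACHE_STATS = "/proc/fs/fscache/stats"  # present only with CONFIG_FSCACHE_STATS


def parse_fscache_stats(text):
    """Parse 'Label : key=value key=value' lines into {(label, key): int}."""
    counters = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header line such as 'FS-Cache statistics'
        for field in rest.split():
            key, eq, value = field.partition("=")
            if eq and value.isdigit():
                counters[(label.strip(), key)] = int(value)
    return counters


def diff_stats(before, after):
    """Return only the counters that changed, with the amount of change."""
    return {
        k: after[k] - before.get(k, 0)
        for k in after
        if after[k] != before.get(k, 0)
    }


def snapshot(path=FSCACHE_STATS):
    """Read and parse the current counters from the stats file."""
    with open(path) as f:
        return parse_fscache_stats(f.read())
```

Taking a `snapshot()` before and after a 'cp' of cached data, and again around a torrent client reading the same data, and comparing the two `diff_stats()` results should show whether the cache is consulted in both cases.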

I should probably just shelve this project for now.

On 6/19/2013 11:56 AM, Rob Bos wrote:
On 6/19/2013 3:03 AM, David Howells wrote:
Rob Bos <rbos@xxxxxx> wrote:

I applied the 373 patchset and compiled a version with CIFS_FSCACHE enabled.

Same problem. Got ~2GiB into a cp before it started generating 'bad page
state' errors.
Okay, thanks.  Looks like there's still another bug in there:-(

When you say you got 2GiB into a cp, were you actually copying a file of that
size?  Or was this cumulative?

Bunch of small files. Matlab installer, specifically.

I was working on duplicating the error messages this morning by copying files of known size in a controlled fashion, and crashed the VM (another process was reading data from it at the time). Had the VMware guy capture a screenshot of the oops, attached, but alas, no scrollback.

When I get some time I'll write up a quick script to repeatedly write/read files of a fixed size and stop when an error is found in dmesg, under more controlled circumstances. That might tell us if it's a certain file access doing it, or a certain amount of writes, or reads.
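
A minimal sketch of such a script, assuming Python 3.7+ and permission to run dmesg. The function names, the file naming scheme and the eight-file rotation are arbitrary choices for illustration. A read that immediately follows a write is likely to be served from the page cache rather than from the share or FS-Cache, so a real test would probably need to drop caches or remount between the write and the read.

```python
import hashlib
import os
import subprocess

ERROR_MARKER = "bad page state"  # matched case-insensitively against the kernel log


def read_dmesg():
    """Return the kernel log as text (may require root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


def write_then_read(path, size, chunk=1 << 20):
    """Write `size` random bytes to `path`, read them back, compare checksums."""
    written = hashlib.sha256()
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            f.write(block)
            written.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            read_back.update(block)
    return written.digest() == read_back.digest()


def stress(directory, size, max_iterations, log_reader=read_dmesg):
    """Write/read fixed-size files until the marker shows up in the kernel log.

    Returns (iterations_completed, reason), where reason is one of
    'data mismatch', 'kernel error' or 'clean'.
    """
    baseline = log_reader().lower().count(ERROR_MARKER)
    for i in range(1, max_iterations + 1):
        # Rotate over a few file names so the target disk does not fill up.
        path = os.path.join(directory, f"stress_{i % 8}.bin")
        if not write_then_read(path, size):
            return i, "data mismatch"
        if log_reader().lower().count(ERROR_MARKER) > baseline:
            return i, "kernel error"
    return max_iterations, "clean"
```

For example, `stress("/mnt/share", 64 * 1024 * 1024, 1000)` would loop over 64 MiB files on a hypothetical mount point at /mnt/share and report how many iterations it took before an error appeared. Comparing marker counts is a simplification: if the kernel ring buffer wraps, older messages drop out and the count can go down.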


If I could work out how to reproduce this reliably, there's a good chance I'll
be able to fix it.

David


Attachment: 2013-06-23_115403.png
Description: PNG image

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs
