On Thu, Dec 11, 2008 at 08:59:40AM +0000, Russell King wrote:

> > > Hacky patch that mlock()s rpmdb's environment mmap(2)s, in order to
> > > attempt to avoid spurious rpmdb corruption issues on Linux that seem
> > > to be somehow related to pagein/pageout occurring.
> >
> > Ick.
> >
> > No.
>
> The relevant questions are:
>
> 1. which kernel version is this occurring with?
>
> 2. what device is the swap on?
>
> 3. which drivers are being used?

This issue goes back to May 2007 or so, when I noticed db4 corruption
when using rpm. I started digging into it, and ran into an issue with
fsx-linux, which you reported to linux-arch@ here:

	http://marc.info/?l=linux-arch&m=118026300719763&w=2

Unfortunately, the issue seen with fsx-linux turned out to be unrelated
to the rpm db4 corruption issue.

I applied the hacky rpm db4 database mlock() patch (which was never
meant to go upstream!) to see if that would make the corruption go away,
and it seems to have done so: I haven't managed to reproduce it since,
and haven't had any reports about it either.

Without the mlock patch, the corruption would happen even in
qemu-system-arm, an environment in which cache aliasing effects don't
exist. So I abandoned the theory of it being a cache aliasing issue at
the time, and theorised that somehow a dirty page was having its dirty
data discarded and an older stale copy swapped back in, although I've
never been able to prove this. After spending a week unsuccessfully
trying to hunt it down at the time, I haven't spent any more time on it
since. (And everyone I mentioned this to seemed to agree that shared
writeable mmap() is icky and yuck and booh and "hard to get right", and
that didn't increase my motivation to look into it further either.)

I don't even know if it's an issue anymore in recent kernels.
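
For anyone unfamiliar with the approach, the general idea of pinning a
shared writable mapping so that it is never paged out can be sketched
roughly as below. This is not the rpm patch itself: the helper name
map_and_lock(), the file handling and the sizes are all made up for
illustration, and a failing mlock() (for example because RLIMIT_MEMLOCK
is too low) is simply reported back to the caller.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from rpm or db4: map `len` bytes of
 * `path` as a shared writable mapping and try to pin it in RAM.
 *
 * Returns the mapping, or NULL on failure. *locked is set to 1 if
 * mlock() succeeded and to 0 if it did not; in the latter case the
 * mapping is still usable, it is just not protected from pageout.
 */
static void *map_and_lock(const char *path, size_t len, int *locked)
{
    assert(len > 0 && locked != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    /* Demo only: size the file so that the whole mapping is backed. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    /* Pin the pages; they are faulted in and kept resident from here on. */
    *locked = (mlock(p, len) == 0);
    return p;
}
```

A caller would use the returned pointer like any other mapping, and
munmap() it when done, which also drops the lock. Note that this only
keeps the pages resident; it says nothing about why unlocked pages
would come back with stale contents.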

I also don't know whether (assuming that it _is_ indeed a kernel issue)
it's an arch/arm issue, or a kernel-wide issue that simply occurs more
often on ARM because ARM systems generally have less memory and are
therefore more often under memory pressure. (There are certainly enough
reports of rpm database corruption on x86 as well, but in almost every
report there are more factors involved, such as people Ctrl-C'ing and
killing rpm processes as they are manipulating the database, etc.)


thanks,
Lennert