On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> On Sat, Dec 04 2010 at 2:18pm -0500,
> Matt <jackdachef@xxxxxxxxx> wrote:
>
>> On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>> > Matt and Jon,
>> >
>> > If you'd be up to it: could you try testing your dm-crypt+ext4
>> > corruption reproducers against the following two 2.6.37-rc commits:
>> >
>> > 1) 1de3e3df917459422cb2aecac440febc8879d410
>> > then
>> > 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
>> >
>> > Then, depending on results of no corruption for those commits, bonus
>> > points for testing the same commits but with Andi and Milan's latest
>> > dm-crypt cpu scalability patch applied too:
>> > https://patchwork.kernel.org/patch/365542/
>> >
>> > Thanks!
>> > Mike
>> >
>>
>> Hi Mike,
>>
>> it seems like there isn't even much testing to do:
>>
>> I tested all 3 commits / checkouts by re-compiling gcc which was/is
>> the 2nd easy way to trigger this "corruption", compiling google's
>> chromium (v9) and looking at the output/existance of gcc, g++ and
>> eselect opengl list
>
> Can you be a bit more precise about what you're doing to reproduce?
> What sequence? What (if any) builds are going in parallel? Etc.
>
>> so far everything went fine
>>
>> After that I used the new patch (v6 or pre-v6), before that I had to
>>
>> replace WQ_MEM_RECLAIM with WQ_RESCUER
>>
>> and, re-compiled the kernels
>>
>> shortly after I had booted up the system with the first kernel
>> (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179)
>> the output of 'eselect opengl list' did show no opengl backend
>> selected
>>
>> so it seems to manifest itself even earlier (ext4: call
>> mpage_da_submit_io() from mpage_da_map_blocks()) even if only subtly
>> and over time -
>> I'm still currently running that kernel and posting from it & having tests run
>
> OK.
>
>> I'm not sure if it's even a problem with ext4 - I haven't had the time
>> to test with XFS yet - maybe it's also happening with that so it more
>> likely would be dm-core, like Milan suspected
>> (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :(
>
> It'd be interesting to try to reproduce with that same kernel but using
> XFS. I'll check with Milan on what he thinks would be the best next
> steps. Ideally we'll be able to reproduce your results to aid in
> pinpointing the issue. I think Milan will be trying to do so shortly
> (if he hasn't started already -- using gentoo emerge, etc).
>
>> even though most of the time it's compiling I don't need to do much -
>> I need the box for work so if my time allows next tests would be next
>> weekend and I'm back to my other partition
>>
>> I really do hope that this bugger can be nailed down ASAP - I like the
>> improvements made in 2.6.37 but without the dm-crypt multi-cpu patch
>> it's only half the "fun" ;)
>
> Sure, we'll need to get to the bottom of this before we can have
> confidence sending the dm-crypt cpu scalability patch upstream.
>
> Thanks for your testing,
> Mike
>

I should have made it clear that the results I reported were observed
when using the kernels/checkouts *with* the dm-crypt multi-cpu patch.
Without the patch I didn't see these kinds of problems (hardlocks,
files missing, etc.)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel