On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> On Sat, Dec 04 2010 at 2:18pm -0500,
> Matt <jackdachef@xxxxxxxxx> wrote:
>
>> On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>> > Matt and Jon,
>> >
>> > If you'd be up to it: could you try testing your dm-crypt+ext4
>> > corruption reproducers against the following two 2.6.37-rc commits:
>> >
>> > 1) 1de3e3df917459422cb2aecac440febc8879d410
>> > then
>> > 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
>> >
>> > Then, depending on results of no corruption for those commits, bonus
>> > points for testing the same commits but with Andi and Milan's latest
>> > dm-crypt cpu scalability patch applied too:
>> > https://patchwork.kernel.org/patch/365542/
>> >
>> > Thanks!
>> > Mike
>> >
>>
>> Hi Mike,
>>
>> it seems like there isn't even much testing to do:
>>
>> I tested all 3 commits / checkouts by re-compiling gcc which was/is
>> the 2nd easy way to trigger this "corruption", compiling google's
>> chromium (v9) and looking at the output/existance of gcc, g++ and
>> eselect opengl list
>
> Can you be a bit more precise about what you're doing to reproduce?
> What sequence? What (if any) builds are going in parallel? Etc.
>
>> so far everything went fine
>>
>> After that I used the new patch (v6 or pre-v6), before that I had to
>>
>> replace WQ_MEM_RECLAIM with WQ_RESCUER
>>
>> and, re-compiled the kernels
>>
>> shortly after I had booted up the system with the first kernel
>> (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179)
>> the output of 'eselect opengl list' did show no opengl backend
>> selected
>>
>> so it seems to manifest itself even earlier (ext4: call
>> mpage_da_submit_io() from mpage_da_map_blocks()) even if only subtly
>> and over time -
>> I'm still currently running that kernel and posting from it & having tests run
>
> OK.
>
>> I'm not sure if it's even a problem with ext4 - I haven't had the time
>> to test with XFS yet - maybe it's also happening with that so it more
>> likely would be dm-core, like Milan suspected
>> (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :(
>
> It'd be interesting to try to reproduce with that same kernel but using
> XFS. I'll check with Milan on what he thinks would be the best next
> steps. Ideally we'll be able to reproduce your results to aid in
> pinpointing the issue. I think Milan will be trying to do so shortly
> (if he hasn't started already -- using gentoo emerge, etc).
>
>> even though most of the time it's compiling I don't need to do much -
>> I need the box for work so if my time allows next tests would be next
>> weekend and I'm back to my other partition
>>
>> I really do hope that this bugger can be nailed down ASAP - I like the
>> improvements made in 2.6.37 but without the dm-crypt multi-cpu patch
>> it's only half the "fun" ;)
>
> Sure, we'll need to get to the bottom of this before we can have
> confidence sending the dm-crypt cpu scalability patch upstream.
>
> Thanks for your testing,
> Mike
>

OK, before bed time I found some kind of corruption:

running kernel is from commit: bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc

the messages are easy to overlook:

steps:
1) bootup
2) (might need to re-install graphics driver due to driver switch, in this case magic properties [or what's its name] didn't change so the kernel module still worked)
3) firing up 2 xterms, xload, xclock, gksu -> terminal -> firefox, nautilus --no-desktop, gnome-mplayer (playing mp3)
4) emerge -1 sys-devel/gcc (from one of the xterms)

after emerge -1 sys-devel/gcc finished it displayed:

>>> Auto-cleaning packages...
portage: COUNTER for sys-devel/patch-2.6.1 was corrupted; resetting to value of 0
portage: COUNTER for sys-devel/patch-2.6.1 was corrupted; resetting to value of 0

(the COUNTER file should normally contain a value, e.g.:

cat /var/db/pkg/sys-devel/gcc-4.5.1-r1/COUNTER
20560)

in this case it's empty:

cat /var/db/pkg/sys-devel/patch-2.6.1/COUNTER
(shows nothing)

reference thread:
http://forums.gentoo.org/viewtopic-t-836605-start-0.html

it's solvable by re-installing the package, but for unrecoverable files (e.g. personal files) it would be critical

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
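
The manual `cat` check on a single COUNTER file above could be widened to every installed package with something like the following. This is only a rough sketch: it assumes the /var/db/pkg/<category>/<package>/COUNTER layout seen in the two paths above, the function name `find_bad_counters` is made up for illustration, and it only flags COUNTER files that are empty or not a plain integer. It says nothing about damage in other files.

```python
import os


def find_bad_counters(pkg_db="/var/db/pkg"):
    """Return sorted 'category/package' names whose COUNTER file is
    empty or does not contain a plain non-negative integer."""
    bad = []
    for root, _dirs, files in os.walk(pkg_db):
        if "COUNTER" not in files:
            continue
        # Read as bytes so a file full of NULs or garbage cannot
        # raise a decoding error; it is simply reported as bad.
        with open(os.path.join(root, "COUNTER"), "rb") as fh:
            content = fh.read().strip()
        # bytes.isdigit() is False for empty input and for anything
        # other than ASCII digits.
        if not content.isdigit():
            bad.append(os.path.relpath(root, pkg_db))
    return sorted(bad)


if __name__ == "__main__":
    for name in find_bad_counters():
        print("suspicious COUNTER:", name)
```

Running it right after the emerge step would list every package in the same state as sys-devel/patch-2.6.1, which might help to tell whether only one file was hit.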