-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

This is going to be a bit long, but it may be interesting to many Nokia 770 users, as I suspect that this problem is present on all 770s.

A few weeks ago my 770 crashed spontaneously and badly enough to make it unbootable: the progress bar never showed up. I could reflash it, but I got suspicious about what might have caused the crash. I started searching for a memory checker and found:

http://pyropus.ca/software/memtester/

It compiles fine under scratchbox. Here are my binaries:

http://freenet-homepage.de/tvogel/memtester-4.0.7-bin.tar.bz2

When I tested my device by running "./memtester 24" as root, I observed memory corruption whenever there was WLAN activity, both while scanning for networks and while data was actually being transmitted. The problem does not always show up, as it depends on which memory blocks the kernel assigns to memtester. The addresses where the errors occur also vary, but since user-space processes live in virtual memory, their addresses have no fixed mapping to physical memory anyway.

After that, I wanted to find out where the problems actually come from and how much memory is affected. I found the Running Unix Memory Tester (rumt-0.2) at:

http://www.normalesup.org/~george/comp/rumt/

It does compile under scratchbox but needs an additional patch in order to work correctly on the 770.
Find the patch here:

http://freenet-homepage.de/tvogel/rumt-n770.patch

and my binaries here:

http://freenet-homepage.de/tvogel/rumt-bin.tar.bz2

Before I describe how to reproduce the problem, here are my results: every time there is WLAN activity, exactly two consecutive bytes at fixed physical locations in memory get overwritten with zeroes. Which locations these are depends on the memory location at which the modules umac.ko and cx3110x.ko get loaded.

On a vanilla NOKIA770_2006SE_3.2006.49-2_PR_MR0, the modules get loaded at (cat /proc/modules):

  cx3110x 51420 0 - Live 0xbf03f000
  umac 253316 1 cx3110x, Live 0xbf000000

In this case, the two bytes are at physical locations 0x1304b8b4 and 0x1304b8b5 (these addresses include an offset of 0x10000000, see /proc/iomem).

When booting the same OS from an ext2-formatted MMC, the modules are:

  cx3110x 51420 0 - Live 0xbf04e000
  umac 253316 1 cx3110x, Live 0xbf00f000
  ext2 43524 1 - Live 0xbf003000
  mbcache 7716 0 - Live 0xbf000000

I.e., due to the two extra modules, umac.ko and cx3110x.ko are shifted by 0xf000. And surprise, surprise, the corrupted bytes also get shifted by 0xf000, to 0x1305a8b4 and 0x1305a8b5.

Of course, I'd be very interested to know whether this only occurs on my device or whether it is a common problem, so I'd be happy if some of you could try to reproduce it.

This procedure can be used:

- - open two root shells on your 770
- - start WLAN on the 770 and flood ping ("ping -f") your 770 in order to create network traffic
- - in the first shell, start memtester with a size that shows the corruption (the first argument is the size in MB)
- - successively reduce the size until you no longer see corruption; this makes it likely that the next allocation of 1 MB will get the block with the bad bytes
- - leave memtester running and, in the second shell, start "urumt -p 256": this will allocate 1 MB of memory, locate its physical addresses in /dev/mem and start testing
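
The address arithmetic in the results above (module load addresses versus corrupted byte addresses) can be double-checked with a few lines of Python. This is only a sketch for readers who want to repeat the comparison with their own numbers: all addresses are copied from this message, the helper names module_shifts and split_page are made up for illustration, and the 4 KiB page size is an assumption that is not stated anywhere in the post.

```python
# Sanity check of the address arithmetic quoted in this message.
# PAGE_SIZE = 4096 is an assumption about the 770's kernel, not a
# value taken from the post; the helper names are hypothetical.

PAGE_SIZE = 0x1000  # assumed 4 KiB pages

# Load addresses from the two /proc/modules listings
vanilla = {"cx3110x": 0xBF03F000, "umac": 0xBF000000}
mmc_boot = {"cx3110x": 0xBF04E000, "umac": 0xBF00F000}

# Physical address of the first corrupted byte in each setup
bad_vanilla = 0x1304B8B4
bad_mmc = 0x1305A8B4


def module_shifts(before, after):
    """Return how far each module moved between the two boots."""
    return {name: after[name] - before[name] for name in before}


def split_page(addr, page_size=PAGE_SIZE):
    """Split an address into (start of its page, offset within the page)."""
    offset = addr % page_size
    return addr - offset, offset


shifts = module_shifts(vanilla, mmc_boot)
byte_shift = bad_mmc - bad_vanilla

for name, shift in shifts.items():
    print(f"{name}: moved by {shift:#x}")
print(f"corrupted byte: moved by {byte_shift:#x}")
print(f"offset in page: {split_page(bad_vanilla)[1]:#x} "
      f"-> {split_page(bad_mmc)[1]:#x}")
```

Under the 4 KiB assumption, the shift of 0xf000 is a whole number of pages, so the corrupted bytes keep the same offset within their page in both setups.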

urumt will then give you bit-precise location information on which bits get corrupted and in which direction: + (1->0) or - (0->1). (I used this procedure because memtester is much faster than urumt.)

I'd be interested to hear whether you find this problem too. If so, you can try my workaround: my idea was to write a program that tries to allocate the bad memory block, locks it and then just sleeps forever. This keeps other processes from stepping into the trap. You can find my source code at:

http://freenet-homepage.de/tvogel/blockbad.c

or the binary at:

http://freenet-homepage.de/tvogel/blockbad

The program takes the memory page to block as its argument. If urumt reported 123ef:8bc, strip off the leading 1 and the last 3 digits, i.e. use 0x23ef in this example. The program always allocates 32 MB of RAM in order to search for the block; this is currently hardcoded. After the block is found, the other blocks are freed again. Of course, you should stop memtester and urumt before that.

If this works for you, you might consider starting "blockbad 0x23ef" at the end of /etc/init.d/minircS. You can then check with "ps" whether blockbad is running. If so, it found and allocated the suspicious memory block. If not, it was out of luck and didn't get that block assigned by the kernel.

Very interested in any feedback,
Tilman

PS. I had problems with some applications on my 770 (file manager and bookmarks crashed), and it turned out the reason was a corrupted library file (libhildonfm.so.1.0.0) which had erroneous zeroes at exactly the suspicious offsets 0x8b4 and 0x8b5!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFG5xFL9ZPu6Yae8lkRAqD5AJ9UF5Q4Qk5lHU76hZxX33/X3HHEbwCdHhk6
o0HGe4YcKFjhhV0CMSOUHLo=
=88Zi
-----END PGP SIGNATURE-----