This is a multi-part message in MIME format.

------=_NextPartTM-000-161c9ae9-473b-4794-9c91-9ae3d17f282c
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hello, Stephen,

Thanks for responding so quickly.

|Hi,
|
|On Wed, Jun 05, 2002 at 03:07:16PM +0530, BALBIR SINGH wrote:
|
|> I am running Linux 2.4.7-10, 2.4.18-4 and 2.4.19-pre9, I
|> see the following oops quite often (mainly on 2.4.19-pre9 with
|> kdb). All the kernels I use have the kdb patch installed.
|
|Well, 2.4.7-10 and 2.4.18-4 both look like Red Hat kernel
|releases, and those do _not_ have kdb installed.
|

Yes! I downloaded 2.4.7-10 and found a suitable kdb patch for it, and
I have kdb for 2.4.19-pre9. I am still looking for a kdb patch, or any
other debugger patch, that applies cleanly to 2.4.18-4. Red Hat used to
include ikd with their debug kernels; I am not sure whether that is
still the case.

|> kmem_cache_alloc (offset 0x125)
|> get_unused_buffer_head
|> journal_write_metadata_buffer
|> journal_commit_transaction
|> kjournald
|> kernel_thread
|
|This is not an oops. It might be a kdb backtrace, though.
|Could you post the actual oops? That contains much more
|information than I've got here. However, the backtrace you
|showed is inside the VM, not in ext3, so it doesn't appear to
|be necessarily an ext3 problem.

Absolutely. It is difficult to capture the oops with kdb running:
sometimes typing go at the kdb> prompt causes more panics, and the real
panic never appears in dmesg or /var/log/messages. I will try to
capture all the information on the next oops and send it to you. I do
not have it now; I lost it on rebooting, sorry!

|
|> kmem_cache_alloc dis
|>
|> xchg %eax, (%ebx)
|> cmp $0x5a2cf071, %eax (where did a value like that come from?)
|> je ....
|> ud2a - (looks like assembly for BUG())
|
|Where in this sequence is the EIP that is failing? What are
|the register contents?
|

I turned on CONFIG_DEBUG_SLAB and, using gcc -S -g, found that this
code is in the slab checking (the POISON part).

|> One more thing I have seen is a problem with the following code
|>
|> do {
|>         new_bh = get_unused_buffer_head(0);
|>         if (!new_bh) {
|>                 printk (KERN_NOTICE __FUNCTION__
|>                         ": ENOMEM at get_unused_buffer_head, "
|>                         "trying again.\n");
|>                 current->policy |= SCHED_YIELD;
|>                 schedule();
|>         }
|>
|> when get_unused_buffer_head fails, the call to printk would eventually
|> want to flush the contents to /var/log/messages and if
|> /var/log/messages happens to be on a journalled file system, well it
|> kind of gets recursive.
|
|No, the printk buffer will wrap (discarding data if necessary)
|if klogd can't dump the information from it fast enough, so
|there's no deadlock.
|

Pardon me, but this might be just my impression. Ext3 has been very
unstable on my system lately. (I ran h/w tests to make sure it is not a
h/w problem: I checked my SCSI disk using the BIOS and the memory using
memtest.) The code mentioned above causes my system to hang: any file
system operation hangs. On entering kdb I find the following:

1. The system has called bdflush - shrink_cache, et al. -
   kmem_cache_reap
2. The system is doing a printk (the code mentioned above)

The system seems to spin in the code path mentioned above.

As for the printk buffer wrapping, this is roughly what I think
happens:

1. Multiple printks are queued for writing out by klogd.
2. klogd tries to write out the data, which calls file system specific
   code.
3. Ext3 runs out of buffer heads and calls printk. Even if the buffer
   has wrapped around, there is now another printk in the queue, so we
   go back to step 2.

Like you said, it might be the VM that is broken. Have you tried
running some tests on ext3 with DEBUG_EXT3 and CONFIG_DEBUG_SLAB turned
on? I will try IOZONE with these options and see if I can find
something.
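To make the wrapping behaviour under discussion concrete, here is a
minimal userspace sketch in C of a log buffer whose writer never waits
for the reader. It is only a toy model and not the kernel's printk
code: the names toy_log, toy_log_write, toy_log_read and TOY_LOG_LEN
are made up for illustration, and the 16-byte size is chosen only so
that the wrap is easy to see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a wrapping log buffer; NOT the kernel's printk code. */
#define TOY_LOG_LEN 16  /* deliberately tiny so wrapping is easy to see */

struct toy_log {
    char buf[TOY_LOG_LEN];
    size_t start;    /* index of the oldest unread byte */
    size_t count;    /* number of unread bytes currently held */
    size_t dropped;  /* bytes overwritten before the reader saw them */
};

/* Producer side (plays the role of printk): never waits for the reader. */
static void toy_log_write(struct toy_log *log, const char *msg)
{
    assert(log != NULL && msg != NULL);
    for (; *msg != '\0'; msg++) {
        size_t end = (log->start + log->count) % TOY_LOG_LEN;
        log->buf[end] = *msg;
        if (log->count < TOY_LOG_LEN) {
            log->count++;
        } else {
            /* Buffer full: overwrite the oldest byte instead of blocking. */
            log->start = (log->start + 1) % TOY_LOG_LEN;
            log->dropped++;
        }
    }
}

/* Consumer side (plays the role of klogd): drains up to len bytes. */
static size_t toy_log_read(struct toy_log *log, char *out, size_t len)
{
    size_t n = 0;
    assert(log != NULL && out != NULL);
    while (n < len && log->count > 0) {
        out[n++] = log->buf[log->start];
        log->start = (log->start + 1) % TOY_LOG_LEN;
        log->count--;
    }
    return n;
}
```

Writing 20 bytes into this 16-byte buffer before any read leaves the
newest 16 bytes in place and counts 4 as dropped; the writer returns
immediately either way. The sketch does not model the retry loop around
get_unused_buffer_head, so it says nothing about whether that loop can
make progress under memory pressure.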
|Cheers,
| Stephen
|

------=_NextPartTM-000-161c9ae9-473b-4794-9c91-9ae3d17f282c--