Hi,

The following patch restricts the TREE_RCU implementation to !PREEMPT SMP kernels:

http://git.linux-mips.org/?p=linux.git;a=commit;h=687d7a960aea46e016182c7ce346d62c4dbd0366

The CONFIG_TREE_PREEMPT_RCU option does not seem to work for the SMTC kernel
(and it will be the only RCU implementation available for the SMTC kernel
from 2.6.37 onwards). With no forced preemption and TREE_RCU selected, I am
able to boot as far as the hang that I reported earlier.

Thanks
Anoop

On Sat, 2011-01-01 at 00:42 -0800, Kevin D. Kissell wrote:
> At this point the logical thing to do would seem to be to look at your kernel
> image and disassemble smtc_ipi_replay(), which is where the EPC of VPE 0
> shows the last exception to have been taken. That's a critical SMTC
> routine that gets called whenever an xxx_irq_restore() enables
> interrupts, so that virtual per-TC IPI interrupts that were posted while
> the TC had interrupts disabled can be handled deterministically. As I
> mentioned in an earlier message, there was some cleanup work from David
> Howells that changed a number of irq management-related function names
> and prototypes across all architectures, which went into linux-mips.org
> at very roughly the time of the breakage. The SMTC overlay over the irq
> implementation has been pretty robust, but it's written in a perhaps
> doomed attempt to be both efficient and using a maximum amount of common
> code with the general case. A mechanical or semi-mechanical change
> could conceivably have broken things.
>
> Regards,
>
> Kevin K.
>
>
> On 12/31/2010 4:27 AM, Anoop P A wrote:
> > Hi,
> >
> > The kernel hangs in the stop_machine call. Please find the MT register
> > dump below. Another important observation: even though the 2.6.33 kernel
> > + stackframe patch gets well past the calibration hang, I am still unable
> > to boot into an initramfs root (I verified that the ramfs works with
> > VSMP). So it looks like there is still some issue to fix between 2.6.32
> > and 2.6.33.
> > ######################## Log ###########################
> >
> > === MIPS MT State Dump ===
> > -- Global State --
> >    MVPControl Passed: 00000005
> >    MVPControl Read: 00000004
> >    MVPConf0 : a8008406
> > -- per-VPE State --
> >   VPE 0
> >    VPEControl : 00008000
> >    VPEConf0 : 800f0003
> >    VPE0.Status : 11004201
> >    VPE0.EPC : 8010dc54 smtc_ipi_replay+0xcc/0x108
> >    VPE0.Cause : 50804000
> >    VPE0.Config7 : 00010000
> >   VPE 1
> >    VPEControl : 00068006
> >    VPEConf0 : 80cf0003
> >    VPE1.Status : 11008301
> >    VPE1.EPC : 801022a0 r4k_wait+0x20/0x40
> >    VPE1.Cause : 50800000
> >    VPE1.Config7 : 00010000
> > -- per-TC State --
> >   TC 0 (current TC with VPE EPC above)
> >    TCStatus : 18102000
> >    TCBind : 00000000
> >    TCRestart : 803fa19c printk+0xc/0x30
> >    TCHalt : 00000000
> >    TCContext : 00000000
> >   TC 1
> >    TCStatus : 18902000
> >    TCBind : 00200000
> >    TCRestart : 801022a0 r4k_wait+0x20/0x40
> >    TCHalt : 00000000
> >    TCContext : 00140000
> >   TC 2
> >    TCStatus : 18902000
> >    TCBind : 00400000
> >    TCRestart : 801022a0 r4k_wait+0x20/0x40
> >    TCHalt : 00000000
> >    TCContext : 00280000
> >   TC 3
> >    TCStatus : 18902000
> >    TCBind : 00600000
> >    TCRestart : 801022a0 r4k_wait+0x20/0x40
> >    TCHalt : 00000000
> >    TCContext : 003c0000
> >   TC 4
> >    TCStatus : 18902000
> >    TCBind : 00800001
> >    TCRestart : 8010229c r4k_wait+0x1c/0x40
> >    TCHalt : 00000000
> >    TCContext : 00500000
> >   TC 5
> >    TCStatus : 18902000
> >    TCBind : 00a00001
> >    TCRestart : 8010229c r4k_wait+0x1c/0x40
> >    TCHalt : 00000000
> >    TCContext : 00640000
> >   TC 6
> >    TCStatus : 18902000
> >    TCBind : 00c00001
> >    TCRestart : 8010229c r4k_wait+0x1c/0x40
> >    TCHalt : 00000000
> >    TCContext : 00780000
> > Counter Interrupts taken per CPU (TC)
> >    0: 0
> >    1: 0
> >    2: 0
> >    3: 0
> >    4: 0
> >    5: 0
> >    6: 0
> >    7: 0
> > Self-IPI invocations:
> >    0: 12
> >    1: 0
> >    2: 0
> >    3: 0
> >    4: 0
> >    5: 5
> >    6: 4
> >    7: 0
> > IPIQ[0]: head = 0x0, tail = 0x0, depth = 0
> > IPIQ[1]: head = 0x0, tail = 0x0, depth = 0
> > IPIQ[2]: head = 0x0, tail = 0x0, depth = 0
> > IPIQ[3]: head = 0x0, tail = 0x0, depth = 0
> > IPIQ[4]: head = 0x0, tail = 0x0, depth = 0
> > IPIQ[5]: head = 0x0, tail = 0x0, depth = 0
> > IPIQ[6]: head = 0x0, tail = 0x0, depth = 0
> > IPIQ[7]: head = 0x0, tail = 0x0, depth = 0
> > 0 Recoveries of "stolen" FPU
> > ===========================
> >
> > ################################################################
> >
> > Thanks
> > Anoop
> >
> > On Tue, 2010-12-28 at 00:43 -0800, Kevin D. Kissell wrote:
> >> I took a quick look last night, and the only thing that looked vaguely
> >> dangerous in changes since the timer changes I alluded to earlier was
> >> the global naming cleanup of irq-related function names that David
> >> Howells submitted. The diff didn't look dangerous in itself, but some of
> >> the definitions are nested subtly for SMTC to maximize the amount of
> >> common code, and I could imagine something getting lost in translation
> >> there. If that were really the problem, it would of course affect much
> >> more than just the timer subsystem, but early in the boot process,
> >> timers are pretty much the only interrupts that have to be handled
> >> correctly.
> >>
> >> I'm travelling today, but will take a look at timekeeping_notify()
> >> tomorrow or the next day...
> >>
> >> /K.
> >>
> >> On 12/28/10 12:19 AM, Anoop P A wrote:
> >>> Hi,
> >>>
> >>> I glanced through the code diff without noticing any suspicious
> >>> code.
> >>> Tracing the hang showed that it hangs in the timekeeping_notify
> >>> function.
> >>>
> >>> Thanks,
> >>> Anoop
> >>>
> >>> PS: I may not be available until Thursday
> >>>
> >>> On Mon, 2010-12-27 at 22:49 +0530, Anoop P A wrote:
> >>>> Hi Kevin,
> >>>>
> >>>> It is very unlikely that the patch you pointed to has any impact on the
> >>>> hang I am seeing. The patch you mentioned got into the kernel around
> >>>> the 2.6.32 timeframe. I am able to boot both the 2.6.32 and 2.6.33
> >>>> kernels (+ stackframe patch).
> >>>>
> >>>> Hi Stuart,
> >>>>
> >>>> I haven't had much time to spend on this today.
> >>>>
> >>>> I got 2.6.36-stable (+ stackframe patch) booting yesterday, and I have
> >>>> observed the hang with 2.6.37-rc1 (same as rc6 and the current git head).
> >>>>
> >>>> So some patch in the 2.6.37 branch probably introduced this hang.
> >>>>
> >>>> Hopefully I will get some free time tomorrow so that I can look into
> >>>> the code diff.
> >>>>
> >>>> Thanks
> >>>> Anoop
> >>>>
> >>>> On Mon, 2010-12-27 at 09:49 -0600, STUART VENTERS wrote:
> >>>>> Kevin,
> >>>>>
> >>>>> Outstanding, sometimes it's better to be lucky than good.
> >>>>>
> >>>>>
> >>>>> Anoop,
> >>>>>
> >>>>> Maybe we can get lucky again.
> >>>>>
> >>>>> If you can isolate the .33 works/.37 works_not bug to a specific pair of versions,
> >>>>> I'll be happy to do another diff.
> >>>>>
> >>>>>
> >>>>> Hope you'll have had a good Christmas as well.
> >>>>> We've had snow in Alabama since Christmas eve!
> >>>>>
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Stuart
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Kevin D. Kissell [mailto:kevink@xxxxxxxxxxxxx]
> >>>>> Sent: Friday, December 24, 2010 5:34 PM
> >>>>> To: Anoop P A
> >>>>> Cc: STUART VENTERS; Anoop P.A.; linux-mips@xxxxxxxxxxxxxx
> >>>>> Subject: Re: SMTC support status in latest git head.
> >>>>>
> >>>>>
> >>>>> Ah, well, at least we have a stackframe.h fix that preserves David's
> >>>>> performance tweak for the deeper pipelined processors. In looking for
> >>>>> this, I did notice that someone did some modification to the SMTC clock
> >>>>> tick logic that I was skeptical had ever been tested. If you've still
> >>>>> got that kernel binary handy, you might check to see if it boots with
> >>>>> maxtcs=1 maxvpes=1, maxtcs=2 maxvpes=1, and/or maxtcs=2 maxvpes=2.
> >>>>>
> >>>>> Oh, yes, and Merry Christmas one and all!
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Kevin K.
> >>>>>
> >>>>> On 12/24/10 8:02 AM, Anoop P A wrote:
> >>>>>> On Fri, 2010-12-24 at 06:53 -0800, Kevin D. Kissell wrote:
> >>>>>>> Excellent! Now, does the attached patch (relative to 2.6.37.11) also
> >>>>>>> fix things, while preserving the other fixes and performance enhancements?
> >>>>>>>
> >>>>>> I have tested that patch with the 2.6.37 branch. It gets well past the
> >>>>>> calibration loop but hangs after switching to the MIPS clocksource:
> >>>>>>
> >>>>>> TC 6 going on-line as CPU 6
> >>>>>> Brought up 7 CPUs
> >>>>>> bio: create slab<bio-0> at 0
> >>>>>> SCSI subsystem initialized
> >>>>>> Switching to clocksource MIPS
> >>>>>>
> >>>>>> I presume this is a different issue, as restoring the older file didn't
> >>>>>> help get rid of this hang.
> >>>>>>
> >>>>>> diff --git a/arch/mips/include/asm/stackframe.h b/arch/mips/include/asm/stackframe.h
> >>>>>> index 58730c5..7fc9f10 100644
> >>>>>> --- a/arch/mips/include/asm/stackframe.h
> >>>>>> +++ b/arch/mips/include/asm/stackframe.h
> >>>>>> @@ -195,9 +195,9 @@
> >>>>>>   * to cover the pipeline delay.
> >>>>>>   */
> >>>>>>   .set mips32
> >>>>>> - mfc0 v1, CP0_TCSTATUS
> >>>>>> + mfc0 v0, CP0_TCSTATUS
> >>>>>>   .set mips0
> >>>>>> - LONG_S v1, PT_TCSTATUS(sp)
> >>>>>> + LONG_S v0, PT_TCSTATUS(sp)
> >>>>>>  #endif /* CONFIG_MIPS_MT_SMTC */
> >>>>>>   LONG_S $4, PT_R4(sp)
> >>>>>>   LONG_S $5, PT_R5(sp)
> >>>>>>
> >>>>>>
> >>>>>>> /K.
> >>>>>>>
> >>>>>>> On 12/24/10 6:39 AM, Anoop P A wrote:
> >>>>>>>> Hi Kevin, Stuart,
> >>>>>>>>
> >>>>>>>> Woohooo! You guys spotted it!
> >>>>>>>>
> >>>>>>>> http://git.linux-mips.org/?p=linux.git;a=commit;h=d5ec6e3c seems to be
> >>>>>>>> the culprit.
> >>>>>>>>
> >>>>>>>> Once I restored the previous version of stackframe.h, 2.6.33-stable
> >>>>>>>> started booting!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Anoop
> >>>>>>>>
> >>>>>>>> On Fri, 2010-12-24 at 04:32 -0800, Kevin D. Kissell wrote:
> >>>>>>>>> Thank you, Stuart! I've spotted some definite breakage to SMTC between
> >>>>>>>>> those versions. In arch/mips/include/asm/stackframe.h, someone moved
> >>>>>>>>> the store of the Status register value in SAVE_SOME (line 169 or 204,
> >>>>>>>>> depending on the version) from two instructions after the mfc0 to a
> >>>>>>>>> point after the #ifdef for SMTC, presumably to get better pipelining of
> >>>>>>>>> the register access. Unfortunately, the v1 register is also used in the
> >>>>>>>>> SMTC-specific fragment to save TCStatus, so the Status value gets
> >>>>>>>>> clobbered before it gets stored. This will eventually result in the
> >>>>>>>>> Status register getting a TCStatus value, which has some bits in common,
> >>>>>>>>> but isn't identical, and sooner or later Bad Things will happen.
> >>>>>>>>>
> >>>>>>>>> I'm a little surprised this wasn't caught by visual inspection of the patch.
> >>>>>>>>>
> >>>>>>>>> Possible solutions would include reverting the store of the CP0_STATUS
> >>>>>>>>> value to the block above the #ifdef, or, to retain whatever performance
> >>>>>>>>> advantage was obtained by moving the store downward, to use v0/$2
> >>>>>>>>> instead of v1/$3 as the staging register for the TCStatus value. I'd
> >>>>>>>>> lean toward the second option, but I'm not in a position to test and
> >>>>>>>>> submit a patch just now.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>>
> >>>>>>>>> Kevin K.
> >>>>>>>>>
> >>>>>>>>> On 12/23/10 1:09 PM, STUART VENTERS wrote:
> >>>>>>>>>> Kevin,
> >>>>>>>>>>
> >>>>>>>>>> I'm not sure if it's useful,
> >>>>>>>>>> but finally I got the time to look at the two kernel versions Anoop pointed out.
> >>>>>>>>>> works 2.6.32-stable with patch 804
> >>>>>>>>>> works_not 2.6.33-stable
> >>>>>>>>>>
> >>>>>>>>>> grepping for files with CONFIG_MIPS_MT_SMTC
> >>>>>>>>>> and looking for timer interrupt related stuff found the following differences:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> arch/mips/include/asm/irq.h
> >>>>>>>>>> arch/mips/kernel/irq.c
> >>>>>>>>>>     do_IRQ
> >>>>>>>>>>
> >>>>>>>>>> arch/mips/include/asm/stackframe.h
> >>>>>>>>>>     SAVE_SOME SAVE_TEMP get/set_saved_sp
> >>>>>>>>>>
> >>>>>>>>>> arch/mips/include/asm/time.h
> >>>>>>>>>>     clocksource_set_clock
> >>>>>>>>>>
> >>>>>>>>>> arch/mips/kernel/process.c
> >>>>>>>>>>     cpu_idle
> >>>>>>>>>>
> >>>>>>>>>> arch/mips/kernel/smtc.c
> >>>>>>>>>>     __irq_entry
> >>>>>>>>>>     ipi_decode
> >>>>>>>>>>     SMTC_CLOCK_TICK
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Enclosed are the two subsets of files for a more expert look.
> >>>>>>>>>>
> >>>>>>>>>> I'll try to look in more detail after Christmas.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>>
> >>>>>>>>>> Stuart
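
For anyone reading the stackframe.h part of this thread later: the register clobber Kevin describes in SAVE_SOME can be modelled in a few lines of plain C. This is only a rough sketch under stated assumptions, not the kernel code. The struct and function names (saved_frame, save_some_clobbered, save_some_fixed) are made up for illustration, each assignment merely stands in for the instruction named in its comment, and the real macro saves many more registers.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the two pt_regs slots discussed above. */
struct saved_frame {
    uint32_t status;    /* stands in for PT_STATUS(sp) */
    uint32_t tcstatus;  /* stands in for PT_TCSTATUS(sp) */
};

/*
 * Ordering after the Status store was moved below the SMTC block:
 * v1 is reloaded with TCStatus before the Status value has been stored.
 */
struct saved_frame save_some_clobbered(uint32_t cp0_status, uint32_t cp0_tcstatus)
{
    struct saved_frame frame;
    uint32_t v1;

    v1 = cp0_status;        /* mfc0   v1, CP0_STATUS */
    v1 = cp0_tcstatus;      /* mfc0   v1, CP0_TCSTATUS: Status is lost here */
    frame.tcstatus = v1;    /* LONG_S v1, PT_TCSTATUS(sp) */
    frame.status = v1;      /* LONG_S v1, PT_STATUS(sp): stores TCStatus */
    return frame;
}

/*
 * Ordering with the change from the quoted diff: TCStatus is staged
 * in v0, so v1 still holds Status when it is finally stored.
 */
struct saved_frame save_some_fixed(uint32_t cp0_status, uint32_t cp0_tcstatus)
{
    struct saved_frame frame;
    uint32_t v0, v1;

    v1 = cp0_status;        /* mfc0   v1, CP0_STATUS */
    v0 = cp0_tcstatus;      /* mfc0   v0, CP0_TCSTATUS */
    frame.tcstatus = v0;    /* LONG_S v0, PT_TCSTATUS(sp) */
    frame.status = v1;      /* LONG_S v1, PT_STATUS(sp): stores Status */
    return frame;
}
```

Feeding in the VPE0.Status and TC 0 TCStatus values from the dump (0x11004201 and 0x18102000) as sample inputs, the clobbered variant leaves the TCStatus value in the status slot, while the fixed variant keeps the two slots distinct. That is the "Status register getting a TCStatus value" failure described above.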