Hi Vimal,

I sense this would be a nice discussion for everybody @ kernelnewbies, so let's keep it moving :)

On Sat, Sep 4, 2010 at 14:32, Vimal <j.vimal@xxxxxxxxx> wrote:
> Sure. In fact, we removed our modifications and narrowed down the
> crash to the following patches:
> http://thread.gmane.org/gmane.linux.kernel/979066.

Hmmm, "CFS bandwidth control"... further reading reveals it tries to
implement a hard limit on CPU time. Please CMIIW...

> More specifically, the bug is in patches 3 and 4, since those are the
> only ones that deal with enqueuing/dequeuing of tasks.

IMO you already did a good job of finding the needle in the haystack.
OK, so let's assume (temporarily) that it's due to {en,de}queueing. The
first thing that crosses my mind: it must be something that isn't done
atomically or isn't protected by locks... or something that wasn't
designed with large-scale hardware in mind (read: multi-processor or
multi-core). Or maybe it's about task migration between CPUs.

> The patches (not written by us) provide a bandwidth mechanism for the
> CFS scheduler wherein a task group can be restricted to some
> percentage of CPU time (i.e., rate limited).

Nice summary; I reached the same conclusion.

> We have several observations:
>
> * The patched kernel crashes on an Intel Core i7 (8 threads, 12GB
>   RAM), at random times, when saturating all 8 cores with
>   CPU-intensive but rate-limited processes.
> * The patched kernel hasn't crashed, yet, on an Intel Xeon (2 threads,
>   2GB RAM, 64-bit).

Likely the patch's bug is a corner case, something nobody thought to
anticipate. But it could be the other way around: the patch merely
exposes a bug in the kernel itself. BFS (Con Kolivas' scheduler)
sometimes shows more or less the same thing.

> * The time to crash is longer when we hot-unplugged 6 out of 8 threads
>   on the Core i7 machine.

w00t? OK, so we can conclude that fewer threads means a better
situation, am I right?
> * The crash happens (within 10 hours) only if we compile the kernel
>   with HZ=1000.

I wonder why a higher tick frequency contributes to this issue...
something is fishy with the time-slice accounting, or with the way the
HPET/PIT is (re)programmed for the next timer shot.

> A tickless kernel gives rise to other problems, wherein a "throttled"
> task took a long time to be dequeued. htop showed that the task's
> status was R(unning), but the process's CPU exec time didn't change
> and it also didn't respond immediately to SIGKILL. It did respond,
> after a "long" (variable) time.

AFAIK, tickless relies on the HPET for high-precision timer shots, so
that might confirm my suspicion above. It responds after a "long" time?
Aha... a signal is handled when context switches back from interrupt to
kernel or user mode; IIRC delivery is fastest when full preemption is
enabled. So far, then: a time-slicing bug + buggy HPET reprogramming +
buggy enqueueing(?).

> I could explain in detail what tests we conducted, if that's useful.

Personally, I think it would be nice (and I welcome it) if you shared
that.

> It was mainly starting and stopping a lot of CPU-intensive (while(1);)
> tasks that were rate limited.
>
> (A throttled task is one that has been dequeued since it has consumed
> more CPU time than it was allotted.)
>
> Our hunch is that it's a race condition / deadlock somewhere. We fear
> that the race condition might not occur / might take longer to surface
> if we run it on an emulator, given our observations.

Sure, an emulator serializes things; it hardly does true
multiprocessing, so running under an emulator might yield a very
different result. Still, with Qemu-KVM it might be worth a shot. BTW,
could these scheduler patches be adapted to the User Mode Linux
architecture? If so, IMHO that could be a more promising platform for
debugging in this case.
> We don't mind hitting the reset button every time it hangs, but if
> you're suggesting that there's no way to debug the scheduler on a live
> machine, then I guess qemu might be the only option. :(

My knowledge is limited, so you're free to give your own point of view
here. IMHO, we're really dealing with a corner-case logic flaw,
something that can be very hard to reproduce. I suggest running very
heavy multithreaded stress tests over and over again and trying to find
the pattern. I am sure it will eventually be found; it just takes time.

--
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com