----- "Bob Montgomery" <bob.montgomery@xxxxxx> wrote:

> Well, I've been picking at this some more. PID 1 is in the system, but
> crash misses it when it's building its table of tasks in
> refresh_hlist_task_table_v2(). In fact, on my particular dump, it loses
> track of at least 3 processes.
>
> The attached patch changes that behavior. It has to do with collisions
> on the pid_hash table where an early item on the chain has a NULL task
> pointer which causes the code to ignore subsequent items on that
> collision chain. I'm not sure what it means when the tasks[0].first
> pointer in the struct pid is NULL, but that's what triggers the problem
> and keeps crash from following the pid_chain pointer to the next struct
> pid. I am not confident that this whole area is correct yet, just
> closer to correct than it was.
>
> These now appear in the ps output:
>
> crash-5.0.6-fix2> ps 1 8144 998
>    PID    PPID  CPU        TASK        ST  %MEM     VSZ    RSS  COMM
>      1       0    1  ffff81012bd3c780  IN   0.0    6124    688  init
>   8144    6257    0  ffff81011996e140  RU   0.7  108876  35016  mirrorclient
>    998      11    0  ffff81012a9cd780  IN   0.0       0      0  [fc_dl_1]
>
> where before:
>
> crash-5.0.6-fix> ps 1 8144 998
> ps: invalid task or pid value: 1
>
> ps: invalid task or pid value: 8144
>
> ps: invalid task or pid value: 998
>
> This might have been some transition behavior of the pid hash design in
> the kernel, because I've got two dumps based on 2.6.18 kernels that show
> missing processes (this one had 3 out of 532, the other had 1 out of
> 146), but my new patched crash doesn't reveal any missing processes in
> 2.6.29 and newer dumps (I checked 4 dumps, with process counts ranging
> from 362 to 926). Only my recent 2.6.18 dump was lucky enough to be
> missing PID 1, with me being lucky enough to try crash's mount command,
> or we'd still not know about it :-)

Yeah, I agree that it must be catching a kernel transition.
And it's probably not being seen in your 2.6.29-and-newer dumps because
2.6.24-and-later kernels use refresh_hlist_task_table_v3().

> The patch is simple, but has lots of lines because I moved the indent.

The patch looks reasonable and safe. I'll run it against my stable of
sample dumpfiles to see if I can find one...

Anyway, nice catch Bob -- and thanks again for tracking down yet
another gnarly issue,

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
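
For readers following the thread, here is a minimal sketch in C of the
collision-chain behavior Bob describes. It is not the attached patch and
not crash's actual code: struct pid_entry, walk_stop_on_null() and
walk_skip_null() are hypothetical names, and the fields are simplified
stand-ins for the tasks[0].first and pid_chain pointers mentioned above.
The first walker gives up on the whole chain at the first NULL task
pointer, while the second skips that entry and keeps following the chain.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one entry on a pid hash chain.
 * These are NOT the kernel's or crash's real definitions. */
struct pid_entry {
    int nr;                   /* PID number */
    const void *task;         /* stand-in for tasks[0].first; may be NULL */
    struct pid_entry *chain;  /* stand-in for pid_chain: next entry in bucket */
};

/* Behavior as described in the report: a NULL task pointer ends the
 * walk, so every later entry on the same chain is lost. */
static size_t walk_stop_on_null(const struct pid_entry *e, int *pids, size_t max)
{
    size_t n = 0;

    for (; e != NULL && n < max; e = e->chain) {
        if (e->task == NULL)
            break;            /* abandons the rest of the chain */
        pids[n++] = e->nr;
    }
    return n;
}

/* Behavior after the change, as described: skip the entry that has no
 * task, but keep following the chain to the next entry. */
static size_t walk_skip_null(const struct pid_entry *e, int *pids, size_t max)
{
    size_t n = 0;

    for (; e != NULL && n < max; e = e->chain) {
        if (e->task == NULL)
            continue;         /* the for-step still advances to e->chain */
        pids[n++] = e->nr;
    }
    return n;
}
```

With a two-entry chain whose first entry has a NULL task pointer and
whose second entry is PID 1, the first walker reports nothing and the
second reports PID 1, which matches the "invalid task or pid value"
symptom in the ps output above.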