On 09/21/2014 02:57 AM, Dave Anderson wrote:
Oh, Ok.----- Original Message -----On 09/20/2014 03:15 AM, Dave Anderson wrote:----- Original Message -----Hello Pan, I've updated the patch I attached yesterday with a change that caches the most-recent tgid search result. From ~70% to ~90% of the time, either the last tgid entry or the very next one in the tgid_array is the one being searched for, so it's not necessary to call bsearch() every time. "help -t" will show the cache-hit statistics. Thanks, DaveHello Pan, This patch as written needs to be made less restrictive for use on a live system. When running on a live system that has many tasks constantly forking/exec'ing, the "ps" command may occasionally fail like so: crash> ps PID PPID CPU TASK ST %MEM VSZ RSS COMM 0 0 0 ffffffff81c13440 RU 0.0 0 0 [swapper/0] 0 0 1 ffff88021282d330 RU 0.0 0 0 [swapper/1] > 0 0 2 ffff88021282dac0 RU 0.0 0 0 > [swapper/2] 0 0 3 ffff88021282e250 RU 0.0 0 0 [swapper/3] 1 0 1 ffff880212828000 IN 0.0 50140 3120 systemd 2 0 3 ffff880212828790 IN 0.0 0 0 [kthreadd] ... [ cut ] ... 7578 27670 0 ffff8801f45e3c80 DE 0.0 0 0 cc 7622 27668 1 ffff880210ee3c80 ZO 0.0 0 0 info 7629 27667 1 ffff8801075bd330 DE 0.0 0 0 rev 7631 27680 0 ffff8801075bf170 ZO 0.0 0 0 printenv 7635 27685 3 ffff880108bbe9e0 ZO 0.0 0 0 ypwhich ps: bsearch for tgid failed: task: ffff880210ee6250 tgid: 7654 crash> Without this patch, the search for the matching tgid would not generate an error at all, but just quietly continue. The problem is due to the task.tgid may change on a live system, or more likely, the task itself may have been re-used. I would like to fix it simply ignoring tgid bsearch failures on live systems, and just use the RSS stats stored in the per-tgid mm_struct. Does that work for you? Dave .ok! But I don't understand the meaning of " fix it simply ignoring tgid bsearch failures on live systems, and just use the RSS stats stored in the per-tgid mm_struct. ", if tgid may be changed, the tgid_array is useless on live systems.Well, in this case, it may be true for a particular task if the task struct had been re-used in between the time the arrays were created and the time that the "ps" command gets around to reading and displaying its various statistics. And so the command may read invalid data w/respect to that task. But let's be clear -- that kind of behavior is, and always has been, an unavoidable circumstance when running the crash utility on live systems, or when looking at a "live" dump. It's not just the "ps" command, but any command that displays data that is subject to the "shifting sands" syndrome, where the kernel data is constantly being modified while the crash command is running. So the idea is to not just cancel the whole command with an error(FATAL...) if such an anomoly occurs on a live system.And what is the "RSS stats stored in the per-tgid mm_struct" used for?Sorry -- I meant to quietly skip the checking of the other tasks in the task group, and simply use whatever is stored in the mm_struct pointed to by the original task. Without your patch, if the tgid was not found, the command would just continue. With your patch applied, it would be OK do the error(FATAL) in the case of a static dumpfile. But in the case of a live system (or live dump), it's not worth killing the command at that point. Clear? DaveMore clearly, please. thanks, Pan. Can I do it like this patch which just ignore tgid besarch failures? |
From 42d81bfc739a8e1cf6d3b5a2c55b79e0a766d72d Mon Sep 17 00:00:00 2001 From: panfengyun <panfy.fnst@xxxxxxxxxxxxxx> Date: Sun, 21 Sep 2014 15:46:17 +0800 Subject: [PATCH] ignore tgid bsearch failures it's not worth canceling the whole command with an error(FATAL...) when such an anomoly occurs on a live system. So it should be imply ignored tgid bsearch failures on live systems, and just use the RSS stats stored in the per-tgid mm_struct. --- crash-7.0.8/memory.c | 91 ++++++++++++++++++++++++------------------------- 1 files changed, 45 insertions(+), 46 deletions(-) diff --git a/crash-7.0.8/memory.c b/crash-7.0.8/memory.c index 518c917..c046b10 100755 --- a/crash-7.0.8/memory.c +++ b/crash-7.0.8/memory.c @@ -4187,54 +4187,53 @@ get_task_mem_usage(ulong task, struct task_mem_usage *tm) tg = (struct tgid_context *)bsearch(&tgid, tgid_array, RUNNING_TASKS(), sizeof(struct tgid_context), sort_by_tgid); - if (tg == NULL) - error(FATAL, "bsearch for tgid failed: task: %lx tgid: %ld\n", - task, tgid.tgid); - - /* find the first element which has the same tgid */ - first = tg; - while ((first > tgid_array) && ((first - 1)->tgid == first->tgid)) - first--; - - /* find the last element which have same tgid */ - last = tg; - while ((last < (tgid_array + (RUNNING_TASKS() - 1))) && - (last->tgid == (last + 1)->tgid)) - last++; - - while (first <= last) + if (tg != NULL) { - /* count 0 -> filepages */ - if (!readmem(first->task + - OFFSET(task_struct_rss_stat) + - OFFSET(task_rss_stat_count), KVADDR, - &sync_rss, - sizeof(int), - "task_struct rss_stat MM_FILEPAGES", - RETURN_ON_ERROR)) - continue; + /* find the first element which has the same tgid */ + first = tg; + while ((first > tgid_array) && ((first - 1)->tgid == first->tgid)) + first--; + + /* find the last element which have same tgid */ + last = tg; + while ((last < (tgid_array + (RUNNING_TASKS() - 1))) && + (last->tgid == (last + 1)->tgid)) + last++; + + while (first <= last) + { + /* count 0 -> filepages */ + if (!readmem(first->task + + OFFSET(task_struct_rss_stat) + + OFFSET(task_rss_stat_count), KVADDR, + &sync_rss, + sizeof(int), + "task_struct rss_stat MM_FILEPAGES", + RETURN_ON_ERROR)) + continue; - rss += sync_rss; - - /* count 1 -> anonpages */ - if (!readmem(first->task + - OFFSET(task_struct_rss_stat) + - OFFSET(task_rss_stat_count) + - sizeof(int), - KVADDR, &sync_rss, - sizeof(int), - "task_struct rss_stat MM_ANONPAGES", - RETURN_ON_ERROR)) - continue; + rss += sync_rss; + + /* count 1 -> anonpages */ + if (!readmem(first->task + + OFFSET(task_struct_rss_stat) + + OFFSET(task_rss_stat_count) + + sizeof(int), + KVADDR, &sync_rss, + sizeof(int), + "task_struct rss_stat MM_ANONPAGES", + RETURN_ON_ERROR)) + continue; - rss += sync_rss; + rss += sync_rss; - if(first == last) - break; - first++; - } + if(first == last) + break; + first++; + } - tt->last_tgid = last; + tt->last_tgid = last; + } } /* -- 1.7.1
-- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility