Hi Minfei,

Thanks a lot for your information!
According to your description and the strace log, it seems something is wrong in initial_for_parallel().
I reviewed the relevant code, but haven't found the cause yet.

And I have one more question.
Does it happen every time with the same command?

--
Thanks
Zhou

On 03/16/2016 04:32 PM, Minfei Huang wrote:
>
>> On Mar 16, 2016, at 16:26, Zhou, Wenjian <zhouwj-fnst at cn.fujitsu.com> wrote:
>>
>> On 03/16/2016 04:04 PM, Minfei Huang wrote:
>>> On 03/16/16 at 09:55am, "Zhou, Wenjian" wrote:
>>>> Hi Minfei,
>>>>
>>>> I have some questions.
>>>>
>>>> If the value of num-threads is 8,
>>>> 1. How much free memory is there before makedumpfile fails?
>>>
>>> Hmm, this machine is reserved by someone else, so I have no access to
>>> check the reserved memory. All of the configuration is set by default.
>>> Maybe it's about 420M.
>>>
>>
>> I don't mean the reserved memory.
>> I mean the free memory.
>
> Sorry, there is no record of such info.
>
>>
>>>>
>>>> 2. How much free memory is there before makedumpfile succeeds?
>>>
>>> I didn't record this while testing.
>>>
>>>>
>>>>
>>>> And the following result is very strange if all caches have been dropped.
>>>> makedumpfile --num-threads 30 -d 31
>>>> real 0m0.006s
>>>> user 0m0.002s
>>>> sys 0m0.004s
>>>
>>> In this case, makedumpfile fails to dump the vmcore with option
>>> --num-threads 30.
>>>
>>> I find the following strace output suspicious.
>>>
>>>>> 1313 mmap(NULL, 18446744048584388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>> 1314 mmap(NULL, 18446744048584523776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>> 1315 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5927bb2000
>>
>> I see.
>>
>> Are there any error messages?
>> Such as "out of memory"?
>
> The allocated size is too large: 18446744048584388608?
>
>>
>> How about it without the patch?
>
> It works well without this patch in my test.
>
>> Will it occur if the reserved memory is doubled?
>
> No. I just tested all of the test cases.
>
>> BTW, can it be reproduced on other machines?
>
> No, I have only one with such large memory.
>
>> I haven't got such a result on my machine yet.
>>
>> On my machine, the amount of free memory is not always the same
>> after each run of makedumpfile.
>> So if there is not enough memory, makedumpfile will sometimes fail.
>> But I'm not sure whether they are the same issue.
>>
>> --
>> Thanks
>> Zhou
>>
>>>
>>> Thanks
>>> Minfei
>>>
>>>>
>>>> --
>>>> Thanks
>>>> Zhou
>>>>
>>>> On 03/15/2016 05:33 PM, Minfei Huang wrote:
>>>>> On 03/15/16 at 03:12pm, "Zhou, Wenjian" wrote:
>>>>>> Hello Minfei,
>>>>>>
>>>>>> I guess the result is affected by the caches.
>>>>>> How about executing the following command before running makedumpfile each time?
>>>>>> # echo 3 > /proc/sys/vm/drop_caches
>>>>>
>>>>> Hi, Zhou.
>>>>>
>>>>> It seems there is a bug when dumping the vmcore with the num-threads option.
>>>>>
>>>>> 1307 open("/proc/meminfo", O_RDONLY) = 4
>>>>> 1308 fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
>>>>> 1309 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f59322d3000
>>>>> 1310 read(4, "MemTotal: 385452 kB\nMemF"..., 1024) = 1024
>>>>> 1311 close(4) = 0
>>>>> 1312 munmap(0x7f59322d3000, 4096) = 0
>>>>> 1313 mmap(NULL, 18446744048584388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>> 1314 mmap(NULL, 18446744048584523776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>> 1315 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5927bb2000
>>>>> 1316 munmap(0x7f5927bb2000, 4513792) = 0
>>>>> 1317 munmap(0x7f592c000000, 62595072) = 0
>>>>> 1318 mprotect(0x7f5928000000, 135168, PROT_READ|PROT_WRITE) = 0
>>>>> 1319 mmap(NULL, 18446744048584388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>>
>>>>> Thanks
>>>>> Minfei
>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>> Zhou
>>>>>>
>>>>>> On 03/15/2016 02:34 PM, Minfei Huang wrote:
>>>>>>> Hi, Zhou.
>>>>>>>
>>>>>>> I have applied this patch based on 1.5.9. There are several testcases I
>>>>>>> have tested.
>>>>>>>
>>>>>>> - makedumpfile --num-threads 64 -d 31
>>>>>>> real 0m0.010s
>>>>>>> user 0m0.002s
>>>>>>> sys 0m0.009s
>>>>>>>
>>>>>>> - makedumpfile --num-threads 31 -d 31
>>>>>>> real 2m40.915s
>>>>>>> user 10m50.900s
>>>>>>> sys 23m9.664s
>>>>>>>
>>>>>>> - makedumpfile --num-threads 30 -d 31
>>>>>>> real 0m0.006s
>>>>>>> user 0m0.002s
>>>>>>> sys 0m0.004s
>>>>>>>
>>>>>>> - makedumpfile --num-threads 32 -d 31
>>>>>>> real 0m0.007s
>>>>>>> user 0m0.002s
>>>>>>> sys 0m0.005s
>>>>>>>
>>>>>>> - makedumpfile --num-threads 8 -d 31
>>>>>>> real 2m32.692s
>>>>>>> user 7m4.630s
>>>>>>> sys 2m0.369s
>>>>>>>
>>>>>>> - makedumpfile --num-threads 1 -d 31
>>>>>>> real 4m42.423s
>>>>>>> user 7m27.153s
>>>>>>> sys 0m22.490s
>>>>>>>
>>>>>>> - makedumpfile.orig -d 31
>>>>>>> real 4m1.297s
>>>>>>> user 3m39.696s
>>>>>>> sys 0m15.200s
>>>>>>>
>>>>>>> This patch hugely improves the filter performance with -d 31. But
>>>>>>> it is not stable, since makedumpfile fails to dump the vmcore intermittently.
>>>>>>> As the above test results show, makedumpfile fails to dump the vmcore
>>>>>>> with option --num-threads 64, and it may also occur with option
>>>>>>> --num-threads 8.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Minfei
>>>>>>>
>>>>>>> On 03/09/16 at 08:27am, Zhou Wenjian wrote:
>>>>>>>> v4:
>>>>>>>> 1. fix a bug caused by the logic
>>>>>>>> v3:
>>>>>>>> 1. remove some unused variables
>>>>>>>> 2. fix a bug caused by the wrong logic
>>>>>>>> 3. fix a bug caused by optimising
>>>>>>>> 4. improve more performance by using Minoru Usui's code
>>>>>>>>
>>>>>>>> The multi-thread implementation introduces extra cost when handling
>>>>>>>> each page. The original implementation also did the extra work for
>>>>>>>> filtered pages, so there was a big performance degradation with
>>>>>>>> --num-threads -d 31.
>>>>>>>> The new implementation won't do the extra work for filtered pages any
>>>>>>>> more, so the performance of -d 31 is close to that of serial processing.
>>>>>>>> >>>>>>>> The new implementation is just like the following: >>>>>>>> * The basic idea is producer producing page and consumer writing page. >>>>>>>> * Each producer have a page_flag_buf list which is used for storing >>>>>>>> page's description. >>>>>>>> * The size of page_flag_buf is little so it won't take too much memory. >>>>>>>> * And all producers will share a page_data_buf array which is >>>>>>>> used for storing page's compressed data. >>>>>>>> * The main thread is the consumer. It will find the next pfn and write >>>>>>>> it into file. >>>>>>>> * The next pfn is smallest pfn in all page_flag_buf. >>>>>>>> >>>>>>>> Signed-off-by: Minoru Usui <min-usui at ti.jp.nec.com> >>>>>>>> Signed-off-by: Zhou Wenjian <zhouwj-fnst at cn.fujitsu.com> >>>>>>>> --- >>>>>>>> makedumpfile.c | 298 +++++++++++++++++++++++++++++++++++---------------------- >>>>>>>> makedumpfile.h | 35 ++++--- >>>>>>>> 2 files changed, 202 insertions(+), 131 deletions(-) >>>>>>>> >>>>>>>> diff --git a/makedumpfile.c b/makedumpfile.c >>>>>>>> index fa0b779..2b0864a 100644 >>>>>>>> --- a/makedumpfile.c >>>>>>>> +++ b/makedumpfile.c >>>>>>>> @@ -3483,7 +3483,8 @@ initial_for_parallel() >>>>>>>> unsigned long page_data_buf_size; >>>>>>>> unsigned long limit_size; >>>>>>>> int page_data_num; >>>>>>>> - int i; >>>>>>>> + struct page_flag *current; >>>>>>>> + int i, j; >>>>>>>> >>>>>>>> len_buf_out = calculate_len_buf_out(info->page_size); >>>>>>>> >>>>>>>> @@ -3560,10 +3561,16 @@ initial_for_parallel() >>>>>>>> >>>>>>>> limit_size = (get_free_memory_size() >>>>>>>> - MAP_REGION * info->num_threads) * 0.6; >>>>>>>> + if (limit_size < 0) { >>>>>>>> + MSG("Free memory is not enough for multi-threads\n"); >>>>>>>> + return FALSE; >>>>>>>> + } >>>>>>>> >>>>>>>> page_data_num = limit_size / page_data_buf_size; >>>>>>>> + info->num_buffers = 3 * info->num_threads; >>>>>>>> >>>>>>>> - info->num_buffers = MIN(NUM_BUFFERS, page_data_num); >>>>>>>> + info->num_buffers = MAX(info->num_buffers, 
NUM_BUFFERS); >>>>>>>> + info->num_buffers = MIN(info->num_buffers, page_data_num); >>>>>>>> >>>>>>>> DEBUG_MSG("Number of struct page_data for produce/consume: %d\n", >>>>>>>> info->num_buffers); >>>>>>>> @@ -3588,6 +3595,36 @@ initial_for_parallel() >>>>>>>> } >>>>>>>> >>>>>>>> /* >>>>>>>> + * initial page_flag for each thread >>>>>>>> + */ >>>>>>>> + if ((info->page_flag_buf = malloc(sizeof(void *) * info->num_threads)) >>>>>>>> + == NULL) { >>>>>>>> + MSG("Can't allocate memory for page_flag_buf. %s\n", >>>>>>>> + strerror(errno)); >>>>>>>> + return FALSE; >>>>>>>> + } >>>>>>>> + memset(info->page_flag_buf, 0, sizeof(void *) * info->num_threads); >>>>>>>> + >>>>>>>> + for (i = 0; i < info->num_threads; i++) { >>>>>>>> + if ((info->page_flag_buf[i] = calloc(1, sizeof(struct page_flag))) == NULL) { >>>>>>>> + MSG("Can't allocate memory for page_flag. %s\n", >>>>>>>> + strerror(errno)); >>>>>>>> + return FALSE; >>>>>>>> + } >>>>>>>> + current = info->page_flag_buf[i]; >>>>>>>> + >>>>>>>> + for (j = 1; j < NUM_BUFFERS; j++) { >>>>>>>> + if ((current->next = calloc(1, sizeof(struct page_flag))) == NULL) { >>>>>>>> + MSG("Can't allocate memory for page_flag. 
%s\n", >>>>>>>> + strerror(errno)); >>>>>>>> + return FALSE; >>>>>>>> + } >>>>>>>> + current = current->next; >>>>>>>> + } >>>>>>>> + current->next = info->page_flag_buf[i]; >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* >>>>>>>> * initial fd_memory for threads >>>>>>>> */ >>>>>>>> for (i = 0; i < info->num_threads; i++) { >>>>>>>> @@ -3612,7 +3649,8 @@ initial_for_parallel() >>>>>>>> void >>>>>>>> free_for_parallel() >>>>>>>> { >>>>>>>> - int i; >>>>>>>> + int i, j; >>>>>>>> + struct page_flag *current; >>>>>>>> >>>>>>>> if (info->threads != NULL) { >>>>>>>> for (i = 0; i < info->num_threads; i++) { >>>>>>>> @@ -3655,6 +3693,19 @@ free_for_parallel() >>>>>>>> free(info->page_data_buf); >>>>>>>> } >>>>>>>> >>>>>>>> + if (info->page_flag_buf != NULL) { >>>>>>>> + for (i = 0; i < info->num_threads; i++) { >>>>>>>> + for (j = 0; j < NUM_BUFFERS; j++) { >>>>>>>> + if (info->page_flag_buf[i] != NULL) { >>>>>>>> + current = info->page_flag_buf[i]; >>>>>>>> + info->page_flag_buf[i] = current->next; >>>>>>>> + free(current); >>>>>>>> + } >>>>>>>> + } >>>>>>>> + } >>>>>>>> + free(info->page_flag_buf); >>>>>>>> + } >>>>>>>> + >>>>>>>> if (info->parallel_info == NULL) >>>>>>>> return; >>>>>>>> >>>>>>>> @@ -7075,11 +7126,11 @@ void * >>>>>>>> kdump_thread_function_cyclic(void *arg) { >>>>>>>> void *retval = PTHREAD_FAIL; >>>>>>>> struct thread_args *kdump_thread_args = (struct thread_args *)arg; >>>>>>>> - struct page_data *page_data_buf = kdump_thread_args->page_data_buf; >>>>>>>> + volatile struct page_data *page_data_buf = kdump_thread_args->page_data_buf; >>>>>>>> + volatile struct page_flag *page_flag_buf = kdump_thread_args->page_flag_buf; >>>>>>>> struct cycle *cycle = kdump_thread_args->cycle; >>>>>>>> - int page_data_num = kdump_thread_args->page_data_num; >>>>>>>> - mdf_pfn_t pfn; >>>>>>>> - int index; >>>>>>>> + mdf_pfn_t pfn = cycle->start_pfn; >>>>>>>> + int index = kdump_thread_args->thread_num; >>>>>>>> int buf_ready; >>>>>>>> int dumpable; >>>>>>>> int fd_memory = 0; 
>>>>>>>> @@ -7125,47 +7176,48 @@ kdump_thread_function_cyclic(void *arg) { >>>>>>>> kdump_thread_args->thread_num); >>>>>>>> } >>>>>>>> >>>>>>>> - while (1) { >>>>>>>> - /* get next pfn */ >>>>>>>> - pthread_mutex_lock(&info->current_pfn_mutex); >>>>>>>> - pfn = info->current_pfn; >>>>>>>> - info->current_pfn++; >>>>>>>> - pthread_mutex_unlock(&info->current_pfn_mutex); >>>>>>>> - >>>>>>>> - if (pfn >= kdump_thread_args->end_pfn) >>>>>>>> - break; >>>>>>>> - >>>>>>>> - index = -1; >>>>>>>> + /* >>>>>>>> + * filtered page won't take anything >>>>>>>> + * unfiltered zero page will only take a page_flag_buf >>>>>>>> + * unfiltered non-zero page will take a page_flag_buf and a page_data_buf >>>>>>>> + */ >>>>>>>> + while (pfn < cycle->end_pfn) { >>>>>>>> buf_ready = FALSE; >>>>>>>> >>>>>>>> + pthread_mutex_lock(&info->page_data_mutex); >>>>>>>> + while (page_data_buf[index].used != FALSE) { >>>>>>>> + index = (index + 1) % info->num_buffers; >>>>>>>> + } >>>>>>>> + page_data_buf[index].used = TRUE; >>>>>>>> + pthread_mutex_unlock(&info->page_data_mutex); >>>>>>>> + >>>>>>>> while (buf_ready == FALSE) { >>>>>>>> pthread_testcancel(); >>>>>>>> - >>>>>>>> - index = pfn % page_data_num; >>>>>>>> - >>>>>>>> - if (pfn - info->consumed_pfn > info->num_buffers) >>>>>>>> + if (page_flag_buf->ready == FLAG_READY) >>>>>>>> continue; >>>>>>>> >>>>>>>> - if (page_data_buf[index].ready != 0) >>>>>>>> - continue; >>>>>>>> - >>>>>>>> - pthread_mutex_lock(&page_data_buf[index].mutex); >>>>>>>> - >>>>>>>> - if (page_data_buf[index].ready != 0) >>>>>>>> - goto unlock; >>>>>>>> - >>>>>>>> - buf_ready = TRUE; >>>>>>>> + /* get next dumpable pfn */ >>>>>>>> + pthread_mutex_lock(&info->current_pfn_mutex); >>>>>>>> + for (pfn = info->current_pfn; pfn < cycle->end_pfn; pfn++) { >>>>>>>> + dumpable = is_dumpable( >>>>>>>> + info->fd_bitmap ? 
&bitmap_parallel : info->bitmap2, >>>>>>>> + pfn, >>>>>>>> + cycle); >>>>>>>> + if (dumpable) >>>>>>>> + break; >>>>>>>> + } >>>>>>>> + info->current_pfn = pfn + 1; >>>>>>>> >>>>>>>> - page_data_buf[index].pfn = pfn; >>>>>>>> - page_data_buf[index].ready = 1; >>>>>>>> + page_flag_buf->pfn = pfn; >>>>>>>> + page_flag_buf->ready = FLAG_FILLING; >>>>>>>> + pthread_mutex_unlock(&info->current_pfn_mutex); >>>>>>>> + sem_post(&info->page_flag_buf_sem); >>>>>>>> >>>>>>>> - dumpable = is_dumpable( >>>>>>>> - info->fd_bitmap ? &bitmap_parallel : info->bitmap2, >>>>>>>> - pfn, >>>>>>>> - cycle); >>>>>>>> - page_data_buf[index].dumpable = dumpable; >>>>>>>> - if (!dumpable) >>>>>>>> - goto unlock; >>>>>>>> + if (pfn >= cycle->end_pfn) { >>>>>>>> + info->current_pfn = cycle->end_pfn; >>>>>>>> + page_data_buf[index].used = FALSE; >>>>>>>> + break; >>>>>>>> + } >>>>>>>> >>>>>>>> if (!read_pfn_parallel(fd_memory, pfn, buf, >>>>>>>> &bitmap_memory_parallel, >>>>>>>> @@ -7178,11 +7230,11 @@ kdump_thread_function_cyclic(void *arg) { >>>>>>>> >>>>>>>> if ((info->dump_level & DL_EXCLUDE_ZERO) >>>>>>>> && is_zero_page(buf, info->page_size)) { >>>>>>>> - page_data_buf[index].zero = TRUE; >>>>>>>> - goto unlock; >>>>>>>> + page_flag_buf->zero = TRUE; >>>>>>>> + goto next; >>>>>>>> } >>>>>>>> >>>>>>>> - page_data_buf[index].zero = FALSE; >>>>>>>> + page_flag_buf->zero = FALSE; >>>>>>>> >>>>>>>> /* >>>>>>>> * Compress the page data. 
>>>>>>>> @@ -7210,6 +7262,7 @@ kdump_thread_function_cyclic(void *arg) { >>>>>>>> page_data_buf[index].flags = >>>>>>>> DUMP_DH_COMPRESSED_LZO; >>>>>>>> page_data_buf[index].size = size_out; >>>>>>>> + >>>>>>>> memcpy(page_data_buf[index].buf, buf_out, size_out); >>>>>>>> #endif >>>>>>>> #ifdef USESNAPPY >>>>>>>> @@ -7232,12 +7285,14 @@ kdump_thread_function_cyclic(void *arg) { >>>>>>>> page_data_buf[index].size = info->page_size; >>>>>>>> memcpy(page_data_buf[index].buf, buf, info->page_size); >>>>>>>> } >>>>>>>> -unlock: >>>>>>>> - pthread_mutex_unlock(&page_data_buf[index].mutex); >>>>>>>> + page_flag_buf->index = index; >>>>>>>> + buf_ready = TRUE; >>>>>>>> +next: >>>>>>>> + page_flag_buf->ready = FLAG_READY; >>>>>>>> + page_flag_buf = page_flag_buf->next; >>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>> - >>>>>>>> retval = NULL; >>>>>>>> >>>>>>>> fail: >>>>>>>> @@ -7265,14 +7320,15 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header, >>>>>>>> struct page_desc pd; >>>>>>>> struct timeval tv_start; >>>>>>>> struct timeval last, new; >>>>>>>> - unsigned long long consuming_pfn; >>>>>>>> pthread_t **threads = NULL; >>>>>>>> struct thread_args *kdump_thread_args = NULL; >>>>>>>> void *thread_result; >>>>>>>> - int page_data_num; >>>>>>>> + int page_buf_num; >>>>>>>> struct page_data *page_data_buf = NULL; >>>>>>>> int i; >>>>>>>> int index; >>>>>>>> + int end_count, consuming, check_count; >>>>>>>> + mdf_pfn_t current_pfn, temp_pfn; >>>>>>>> >>>>>>>> if (info->flag_elf_dumpfile) >>>>>>>> return FALSE; >>>>>>>> @@ -7284,13 +7340,6 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header, >>>>>>>> goto out; >>>>>>>> } >>>>>>>> >>>>>>>> - res = pthread_mutex_init(&info->consumed_pfn_mutex, NULL); >>>>>>>> - if (res != 0) { >>>>>>>> - ERRMSG("Can't initialize consumed_pfn_mutex. 
%s\n", >>>>>>>> - strerror(res)); >>>>>>>> - goto out; >>>>>>>> - } >>>>>>>> - >>>>>>>> res = pthread_mutex_init(&info->filter_mutex, NULL); >>>>>>>> if (res != 0) { >>>>>>>> ERRMSG("Can't initialize filter_mutex. %s\n", strerror(res)); >>>>>>>> @@ -7314,36 +7363,23 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header, >>>>>>>> end_pfn = cycle->end_pfn; >>>>>>>> >>>>>>>> info->current_pfn = start_pfn; >>>>>>>> - info->consumed_pfn = start_pfn - 1; >>>>>>>> >>>>>>>> threads = info->threads; >>>>>>>> kdump_thread_args = info->kdump_thread_args; >>>>>>>> >>>>>>>> - page_data_num = info->num_buffers; >>>>>>>> + page_buf_num = info->num_buffers; >>>>>>>> page_data_buf = info->page_data_buf; >>>>>>>> + pthread_mutex_init(&info->page_data_mutex, NULL); >>>>>>>> + sem_init(&info->page_flag_buf_sem, 0, 0); >>>>>>>> >>>>>>>> - for (i = 0; i < page_data_num; i++) { >>>>>>>> - /* >>>>>>>> - * producer will use pfn in page_data_buf to decide the >>>>>>>> - * consumed pfn >>>>>>>> - */ >>>>>>>> - page_data_buf[i].pfn = start_pfn - 1; >>>>>>>> - page_data_buf[i].ready = 0; >>>>>>>> - res = pthread_mutex_init(&page_data_buf[i].mutex, NULL); >>>>>>>> - if (res != 0) { >>>>>>>> - ERRMSG("Can't initialize mutex of page_data_buf. 
%s\n", >>>>>>>> - strerror(res)); >>>>>>>> - goto out; >>>>>>>> - } >>>>>>>> - } >>>>>>>> + for (i = 0; i < page_buf_num; i++) >>>>>>>> + page_data_buf[i].used = FALSE; >>>>>>>> >>>>>>>> for (i = 0; i < info->num_threads; i++) { >>>>>>>> kdump_thread_args[i].thread_num = i; >>>>>>>> kdump_thread_args[i].len_buf_out = len_buf_out; >>>>>>>> - kdump_thread_args[i].start_pfn = start_pfn; >>>>>>>> - kdump_thread_args[i].end_pfn = end_pfn; >>>>>>>> - kdump_thread_args[i].page_data_num = page_data_num; >>>>>>>> kdump_thread_args[i].page_data_buf = page_data_buf; >>>>>>>> + kdump_thread_args[i].page_flag_buf = info->page_flag_buf[i]; >>>>>>>> kdump_thread_args[i].cycle = cycle; >>>>>>>> >>>>>>>> res = pthread_create(threads[i], NULL, >>>>>>>> @@ -7356,55 +7392,88 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header, >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> - consuming_pfn = start_pfn; >>>>>>>> - index = -1; >>>>>>>> + end_count = 0; >>>>>>>> + while (1) { >>>>>>>> + consuming = 0; >>>>>>>> + check_count = 0; >>>>>>>> >>>>>>>> - gettimeofday(&last, NULL); >>>>>>>> + /* >>>>>>>> + * The basic idea is producer producing page and consumer writing page. >>>>>>>> + * Each producer have a page_flag_buf list which is used for storing page's description. >>>>>>>> + * The size of page_flag_buf is little so it won't take too much memory. >>>>>>>> + * And all producers will share a page_data_buf array which is used for storing page's compressed data. >>>>>>>> + * The main thread is the consumer. It will find the next pfn and write it into file. >>>>>>>> + * The next pfn is smallest pfn in all page_flag_buf. >>>>>>>> + */ >>>>>>>> + sem_wait(&info->page_flag_buf_sem); >>>>>>>> + gettimeofday(&last, NULL); >>>>>>>> + while (1) { >>>>>>>> + current_pfn = end_pfn; >>>>>>>> >>>>>>>> - while (consuming_pfn < end_pfn) { >>>>>>>> - index = consuming_pfn % page_data_num; >>>>>>>> + /* >>>>>>>> + * page_flag_buf is in circular linked list. 
>>>>>>>> + * The array info->page_flag_buf[] records the current page_flag_buf in each thread's >>>>>>>> + * page_flag_buf list. >>>>>>>> + * consuming is used for recording in which thread the pfn is the smallest. >>>>>>>> + * current_pfn is used for recording the value of pfn when checking the pfn. >>>>>>>> + */ >>>>>>>> + for (i = 0; i < info->num_threads; i++) { >>>>>>>> + if (info->page_flag_buf[i]->ready == FLAG_UNUSED) >>>>>>>> + continue; >>>>>>>> + temp_pfn = info->page_flag_buf[i]->pfn; >>>>>>>> >>>>>>>> - gettimeofday(&new, NULL); >>>>>>>> - if (new.tv_sec - last.tv_sec > WAIT_TIME) { >>>>>>>> - ERRMSG("Can't get data of pfn %llx.\n", consuming_pfn); >>>>>>>> - goto out; >>>>>>>> - } >>>>>>>> + /* >>>>>>>> + * count how many threads have reached the end. >>>>>>>> + */ >>>>>>>> + if (temp_pfn >= end_pfn) { >>>>>>>> + info->page_flag_buf[i]->ready = FLAG_UNUSED; >>>>>>>> + end_count++; >>>>>>>> + continue; >>>>>>>> + } >>>>>>>> >>>>>>>> - /* >>>>>>>> - * check pfn first without mutex locked to reduce the time >>>>>>>> - * trying to lock the mutex >>>>>>>> - */ >>>>>>>> - if (page_data_buf[index].pfn != consuming_pfn) >>>>>>>> - continue; >>>>>>>> + if (current_pfn < temp_pfn) >>>>>>>> + continue; >>>>>>>> >>>>>>>> - if (pthread_mutex_trylock(&page_data_buf[index].mutex) != 0) >>>>>>>> - continue; >>>>>>>> + check_count++; >>>>>>>> + consuming = i; >>>>>>>> + current_pfn = temp_pfn; >>>>>>>> + } >>>>>>>> >>>>>>>> - /* check whether the found one is ready to be consumed */ >>>>>>>> - if (page_data_buf[index].pfn != consuming_pfn || >>>>>>>> - page_data_buf[index].ready != 1) { >>>>>>>> - goto unlock; >>>>>>>> + /* >>>>>>>> + * If all the threads have reached the end, we will finish writing. >>>>>>>> + */ >>>>>>>> + if (end_count >= info->num_threads) >>>>>>>> + goto finish; >>>>>>>> + >>>>>>>> + /* >>>>>>>> + * If the page_flag_buf is not ready, the pfn recorded may be changed. >>>>>>>> + * So we should recheck. 
>>>>>>>> + */ >>>>>>>> + if (info->page_flag_buf[consuming]->ready != FLAG_READY) { >>>>>>>> + gettimeofday(&new, NULL); >>>>>>>> + if (new.tv_sec - last.tv_sec > WAIT_TIME) { >>>>>>>> + ERRMSG("Can't get data of pfn.\n"); >>>>>>>> + goto out; >>>>>>>> + } >>>>>>>> + continue; >>>>>>>> + } >>>>>>>> + >>>>>>>> + if (current_pfn == info->page_flag_buf[consuming]->pfn) >>>>>>>> + break; >>>>>>>> } >>>>>>>> >>>>>>>> if ((num_dumped % per) == 0) >>>>>>>> print_progress(PROGRESS_COPY, num_dumped, info->num_dumpable); >>>>>>>> >>>>>>>> - /* next pfn is found, refresh last here */ >>>>>>>> - last = new; >>>>>>>> - consuming_pfn++; >>>>>>>> - info->consumed_pfn++; >>>>>>>> - page_data_buf[index].ready = 0; >>>>>>>> - >>>>>>>> - if (page_data_buf[index].dumpable == FALSE) >>>>>>>> - goto unlock; >>>>>>>> - >>>>>>>> num_dumped++; >>>>>>>> >>>>>>>> - if (page_data_buf[index].zero == TRUE) { >>>>>>>> + >>>>>>>> + if (info->page_flag_buf[consuming]->zero == TRUE) { >>>>>>>> if (!write_cache(cd_header, pd_zero, sizeof(page_desc_t))) >>>>>>>> goto out; >>>>>>>> pfn_zero++; >>>>>>>> } else { >>>>>>>> + index = info->page_flag_buf[consuming]->index; >>>>>>>> pd.flags = page_data_buf[index].flags; >>>>>>>> pd.size = page_data_buf[index].size; >>>>>>>> pd.page_flags = 0; >>>>>>>> @@ -7420,12 +7489,12 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header, >>>>>>>> */ >>>>>>>> if (!write_cache(cd_page, page_data_buf[index].buf, pd.size)) >>>>>>>> goto out; >>>>>>>> - >>>>>>>> + page_data_buf[index].used = FALSE; >>>>>>>> } >>>>>>>> -unlock: >>>>>>>> - pthread_mutex_unlock(&page_data_buf[index].mutex); >>>>>>>> + info->page_flag_buf[consuming]->ready = FLAG_UNUSED; >>>>>>>> + info->page_flag_buf[consuming] = info->page_flag_buf[consuming]->next; >>>>>>>> } >>>>>>>> - >>>>>>>> +finish: >>>>>>>> ret = TRUE; >>>>>>>> /* >>>>>>>> * print [100 %] >>>>>>>> @@ -7463,15 +7532,9 @@ out: >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> - if (page_data_buf != NULL) { >>>>>>>> - for (i = 0; i < 
page_data_num; i++) { >>>>>>>> - pthread_mutex_destroy(&page_data_buf[i].mutex); >>>>>>>> - } >>>>>>>> - } >>>>>>>> - >>>>>>>> + sem_destroy(&info->page_flag_buf_sem); >>>>>>>> pthread_rwlock_destroy(&info->usemmap_rwlock); >>>>>>>> pthread_mutex_destroy(&info->filter_mutex); >>>>>>>> - pthread_mutex_destroy(&info->consumed_pfn_mutex); >>>>>>>> pthread_mutex_destroy(&info->current_pfn_mutex); >>>>>>>> >>>>>>>> return ret; >>>>>>>> @@ -7564,6 +7627,7 @@ write_kdump_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_pag >>>>>>>> num_dumped++; >>>>>>>> if (!read_pfn(pfn, buf)) >>>>>>>> goto out; >>>>>>>> + >>>>>>>> filter_data_buffer(buf, pfn_to_paddr(pfn), info->page_size); >>>>>>>> >>>>>>>> /* >>>>>>>> diff --git a/makedumpfile.h b/makedumpfile.h >>>>>>>> index e0b5bbf..4b315c0 100644 >>>>>>>> --- a/makedumpfile.h >>>>>>>> +++ b/makedumpfile.h >>>>>>>> @@ -44,6 +44,7 @@ >>>>>>>> #include "print_info.h" >>>>>>>> #include "sadump_mod.h" >>>>>>>> #include <pthread.h> >>>>>>>> +#include <semaphore.h> >>>>>>>> >>>>>>>> /* >>>>>>>> * Result of command >>>>>>>> @@ -977,7 +978,7 @@ typedef unsigned long long int ulonglong; >>>>>>>> #define PAGE_DATA_NUM (50) >>>>>>>> #define WAIT_TIME (60 * 10) >>>>>>>> #define PTHREAD_FAIL ((void *)-2) >>>>>>>> -#define NUM_BUFFERS (50) >>>>>>>> +#define NUM_BUFFERS (20) >>>>>>>> >>>>>>>> struct mmap_cache { >>>>>>>> char *mmap_buf; >>>>>>>> @@ -985,28 +986,33 @@ struct mmap_cache { >>>>>>>> off_t mmap_end_offset; >>>>>>>> }; >>>>>>>> >>>>>>>> +enum { >>>>>>>> + FLAG_UNUSED, >>>>>>>> + FLAG_READY, >>>>>>>> + FLAG_FILLING >>>>>>>> +}; >>>>>>>> +struct page_flag { >>>>>>>> + mdf_pfn_t pfn; >>>>>>>> + char zero; >>>>>>>> + char ready; >>>>>>>> + short index; >>>>>>>> + struct page_flag *next; >>>>>>>> +}; >>>>>>>> + >>>>>>>> struct page_data >>>>>>>> { >>>>>>>> - mdf_pfn_t pfn; >>>>>>>> - int dumpable; >>>>>>>> - int zero; >>>>>>>> - unsigned int flags; >>>>>>>> long size; >>>>>>>> unsigned char *buf; >>>>>>>> - pthread_mutex_t 
mutex; >>>>>>>> - /* >>>>>>>> - * whether the page_data is ready to be consumed >>>>>>>> - */ >>>>>>>> - int ready; >>>>>>>> + int flags; >>>>>>>> + int used; >>>>>>>> }; >>>>>>>> >>>>>>>> struct thread_args { >>>>>>>> int thread_num; >>>>>>>> unsigned long len_buf_out; >>>>>>>> - mdf_pfn_t start_pfn, end_pfn; >>>>>>>> - int page_data_num; >>>>>>>> struct cycle *cycle; >>>>>>>> struct page_data *page_data_buf; >>>>>>>> + struct page_flag *page_flag_buf; >>>>>>>> }; >>>>>>>> >>>>>>>> /* >>>>>>>> @@ -1295,11 +1301,12 @@ struct DumpInfo { >>>>>>>> pthread_t **threads; >>>>>>>> struct thread_args *kdump_thread_args; >>>>>>>> struct page_data *page_data_buf; >>>>>>>> + struct page_flag **page_flag_buf; >>>>>>>> + sem_t page_flag_buf_sem; >>>>>>>> pthread_rwlock_t usemmap_rwlock; >>>>>>>> mdf_pfn_t current_pfn; >>>>>>>> pthread_mutex_t current_pfn_mutex; >>>>>>>> - mdf_pfn_t consumed_pfn; >>>>>>>> - pthread_mutex_t consumed_pfn_mutex; >>>>>>>> + pthread_mutex_t page_data_mutex; >>>>>>>> pthread_mutex_t filter_mutex; >>>>>>>> }; >>>>>>>> extern struct DumpInfo *info; >>>>>>>> -- >>>>>>>> 1.8.3.1 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> kexec mailing list >>>>>>>> kexec at lists.infradead.org >>>>>>>> http://lists.infradead.org/mailman/listinfo/kexec >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> kexec mailing list >>>>>> kexec at lists.infradead.org >>>>>> http://lists.infradead.org/mailman/listinfo/kexec