> On Mar 16, 2016, at 16:59, Zhou, Wenjian <zhouwj-fnst at cn.fujitsu.com> wrote:
>
> Hi Minfei,
>
> Thanks a lot for your information!
>
> According to your description and strace log,
> it seems there is something wrong in initial_for_parallel().
>
> I reviewed the relevant code, but haven't found the cause yet.
> And I have one more question.
> Does it happen every time with the same command?

Yes, it always fails with option --num-threads 64.

Thanks
Minfei

>
> --
> Thanks
> Zhou
>
> On 03/16/2016 04:32 PM, Minfei Huang wrote:
>>
>>> On Mar 16, 2016, at 16:26, Zhou, Wenjian <zhouwj-fnst at cn.fujitsu.com> wrote:
>>>
>>> On 03/16/2016 04:04 PM, Minfei Huang wrote:
>>>> On 03/16/16 at 09:55am, "Zhou, Wenjian" wrote:
>>>>> Hi Minfei,
>>>>>
>>>>> I have some questions.
>>>>>
>>>>> If the value of num-threads is 8,
>>>>> 1. How much free memory is there before makedumpfile fails?
>>>>
>>>> Hmm, this machine is reserved by someone else, so I have no access to
>>>> check the reserved memory. All of the configuration is set to the defaults.
>>>> Maybe it's about 420M.
>>>>
>>>
>>> I don't mean the reserved memory.
>>> I mean the free memory.
>>
>> Sorry, there is no record of that.
>>
>>>
>>>>>
>>>>> 2. How much free memory is there before makedumpfile succeeds?
>>>>
>>>> I didn't check this during testing.
>>>>
>>>>>
>>>>>
>>>>> And the following result is very strange if all caches have been dropped.
>>>>> makedumpfile --num-threads 30 -d 31
>>>>> real 0m0.006s
>>>>> user 0m0.002s
>>>>> sys 0m0.004s
>>>>
>>>> For this case, makedumpfile fails to dump vmcore with option
>>>> --num-threads 30.
>>>>
>>>> I suspect the following output from strace.
>>>>
>>>>>> 1313 mmap(NULL, 18446744048584388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>>> 1314 mmap(NULL, 18446744048584523776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>>> 1315 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5927bb2000
>>>
>>> I see.
>>>
>>> Are there any error messages,
>>> such as "out of memory"?
>>
>> The allocated size is too large: 18446744048584388608?
>>
>>>
>>> What about without the patch?
>>
>> In my tests, it works well without this patch.
>>
>>>
>>> Does it still occur if the reserved memory is doubled?
>>
>> No. I just tested all of the test cases.
>>
>>>
>>> BTW, can it be reproduced on other machines?
>>
>> No, I only have one machine with that much memory.
>>
>>> I haven't gotten such a result on my machine yet.
>>>
>>> On my machine, the amount of free memory is not always the same
>>> after each makedumpfile run.
>>> So if there is not enough memory, makedumpfile will sometimes fail.
>>> But I'm not sure whether they are the same issue.
>>>
>>> --
>>> Thanks
>>> Zhou
>>>
>>>>
>>>> Thanks
>>>> Minfei
>>>>
>>>>>
>>>>> --
>>>>> Thanks
>>>>> Zhou
>>>>>
>>>>> On 03/15/2016 05:33 PM, Minfei Huang wrote:
>>>>>> On 03/15/16 at 03:12pm, "Zhou, Wenjian" wrote:
>>>>>>> Hello Minfei,
>>>>>>>
>>>>>>> I guess the result is affected by the caches.
>>>>>>> How about executing the following command before running makedumpfile each time?
>>>>>>> # echo 3 > /proc/sys/vm/drop_caches
>>>>>>
>>>>>> Hi, Zhou.
>>>>>>
>>>>>> It seems there is a bug when dumping vmcore with option --num-threads.
>>>>>>
>>>>>> 1307 open("/proc/meminfo", O_RDONLY) = 4
>>>>>> 1308 fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
>>>>>> 1309 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f59322d3000
>>>>>> 1310 read(4, "MemTotal: 385452 kB\nMemF"..., 1024) = 1024
>>>>>> 1311 close(4) = 0
>>>>>> 1312 munmap(0x7f59322d3000, 4096) = 0
>>>>>> 1313 mmap(NULL, 18446744048584388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>>> 1314 mmap(NULL, 18446744048584523776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>>> 1315 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f5927bb2000
>>>>>> 1316 munmap(0x7f5927bb2000, 4513792) = 0
>>>>>> 1317 munmap(0x7f592c000000, 62595072) = 0
>>>>>> 1318 mprotect(0x7f5928000000, 135168, PROT_READ|PROT_WRITE) = 0
>>>>>> 1319 mmap(NULL, 18446744048584388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>>>>>>
>>>>>> Thanks
>>>>>> Minfei
>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>> Zhou
>>>>>>>
>>>>>>> On 03/15/2016 02:34 PM, Minfei Huang wrote:
>>>>>>>> Hi, Zhou.
>>>>>>>>
>>>>>>>> I have applied this patch on top of 1.5.9 and tested the following
>>>>>>>> cases.
>>>>>>>>
>>>>>>>> - makedumpfile --num-threads 64 -d 31
>>>>>>>> real 0m0.010s
>>>>>>>> user 0m0.002s
>>>>>>>> sys 0m0.009s
>>>>>>>>
>>>>>>>> - makedumpfile --num-threads 31 -d 31
>>>>>>>> real 2m40.915s
>>>>>>>> user 10m50.900s
>>>>>>>> sys 23m9.664s
>>>>>>>>
>>>>>>>> makedumpfile --num-threads 30 -d 31
>>>>>>>> real 0m0.006s
>>>>>>>> user 0m0.002s
>>>>>>>> sys 0m0.004s
>>>>>>>>
>>>>>>>> makedumpfile --num-threads 32 -d 31
>>>>>>>> real 0m0.007s
>>>>>>>> user 0m0.002s
>>>>>>>> sys 0m0.005s
>>>>>>>>
>>>>>>>> - makedumpfile --num-threads 8 -d 31
>>>>>>>> real 2m32.692s
>>>>>>>> user 7m4.630s
>>>>>>>> sys 2m0.369s
>>>>>>>>
>>>>>>>> - makedumpfile --num-threads 1 -d 31
>>>>>>>> real 4m42.423s
>>>>>>>> user 7m27.153s
>>>>>>>> sys 0m22.490s
>>>>>>>>
>>>>>>>> - makedumpfile.orig -d 31
>>>>>>>> real 4m1.297s
>>>>>>>> user 3m39.696s
>>>>>>>> sys 0m15.200s
>>>>>>>>
>>>>>>>> This patch greatly improves filtering performance with -d 31, but
>>>>>>>> it is not stable: makedumpfile fails to dump vmcore intermittently.
>>>>>>>> As the results above show, makedumpfile fails to dump vmcore with
>>>>>>>> option --num-threads 64, and it may also fail with option
>>>>>>>> --num-threads 8.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Minfei
>>>>>>>>
>>>>>>>> On 03/09/16 at 08:27am, Zhou Wenjian wrote:
>>>>>>>>> v4:
>>>>>>>>> 1. fix a bug caused by the logic
>>>>>>>>> v3:
>>>>>>>>> 1. remove some unused variables
>>>>>>>>> 2. fix a bug caused by the wrong logic
>>>>>>>>> 3. fix a bug caused by optimising
>>>>>>>>> 4. improve performance further by using Minoru Usui's code
>>>>>>>>>
>>>>>>>>> The multi-threaded implementation introduces extra cost when handling
>>>>>>>>> each page. The original implementation also does this extra work for
>>>>>>>>> filtered pages, so there is a big performance degradation with
>>>>>>>>> --num-threads -d 31.
>>>>>>>>> The new implementation no longer does the extra work for filtered pages any
>>>>>>>>> more.
>>>>>>>>> So the performance of -d 31 is close to that of serial processing.
>>>>>>>>>
>>>>>>>>> The new implementation works as follows:
>>>>>>>>> * The basic idea is that producers produce pages and the consumer writes them.
>>>>>>>>> * Each producer has a page_flag_buf list which is used for storing
>>>>>>>>>   pages' descriptions.
>>>>>>>>> * page_flag_buf is small, so it won't take too much memory.
>>>>>>>>> * All producers share a page_data_buf array which is
>>>>>>>>>   used for storing pages' compressed data.
>>>>>>>>> * The main thread is the consumer. It finds the next pfn and writes
>>>>>>>>>   it into the file.
>>>>>>>>> * The next pfn is the smallest pfn among all page_flag_bufs.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Minoru Usui <min-usui at ti.jp.nec.com>
>>>>>>>>> Signed-off-by: Zhou Wenjian <zhouwj-fnst at cn.fujitsu.com>
>>>>>>>>> ---
>>>>>>>>>  makedumpfile.c | 298 +++++++++++++++++++++++++++++++++++----------------------
>>>>>>>>>  makedumpfile.h | 35 ++++---
>>>>>>>>>  2 files changed, 202 insertions(+), 131 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/makedumpfile.c b/makedumpfile.c
>>>>>>>>> index fa0b779..2b0864a 100644
>>>>>>>>> --- a/makedumpfile.c
>>>>>>>>> +++ b/makedumpfile.c
>>>>>>>>> @@ -3483,7 +3483,8 @@ initial_for_parallel()
>>>>>>>>>  	unsigned long page_data_buf_size;
>>>>>>>>>  	unsigned long limit_size;
>>>>>>>>>  	int page_data_num;
>>>>>>>>> -	int i;
>>>>>>>>> +	struct page_flag *current;
>>>>>>>>> +	int i, j;
>>>>>>>>>
>>>>>>>>>  	len_buf_out = calculate_len_buf_out(info->page_size);
>>>>>>>>>
>>>>>>>>> @@ -3560,10 +3561,16 @@ initial_for_parallel()
>>>>>>>>>
>>>>>>>>>  	limit_size = (get_free_memory_size()
>>>>>>>>>  		      - MAP_REGION * info->num_threads) * 0.6;
>>>>>>>>> +	if (limit_size < 0) {
>>>>>>>>> +		MSG("Free memory is not enough for multi-threads\n");
>>>>>>>>> +		return FALSE;
>>>>>>>>> +	}
>>>>>>>>>
>>>>>>>>>  	page_data_num = limit_size / page_data_buf_size;
>>>>>>>>> +	info->num_buffers = 3 * info->num_threads;
>>>>>>>>>
>>>>>>>>> -	info->num_buffers = MIN(NUM_BUFFERS, page_data_num);
>>>>>>>>> +	info->num_buffers = MAX(info->num_buffers, NUM_BUFFERS);
>>>>>>>>> +	info->num_buffers = MIN(info->num_buffers, page_data_num);
>>>>>>>>>
>>>>>>>>>  	DEBUG_MSG("Number of struct page_data for produce/consume: %d\n",
>>>>>>>>>  			info->num_buffers);
>>>>>>>>> @@ -3588,6 +3595,36 @@ initial_for_parallel()
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>>  	/*
>>>>>>>>> +	 * initial page_flag for each thread
>>>>>>>>> +	 */
>>>>>>>>> +	if ((info->page_flag_buf = malloc(sizeof(void *) * info->num_threads))
>>>>>>>>> +	    == NULL) {
>>>>>>>>> +		MSG("Can't allocate memory for page_flag_buf. %s\n",
>>>>>>>>> +				strerror(errno));
>>>>>>>>> +		return FALSE;
>>>>>>>>> +	}
>>>>>>>>> +	memset(info->page_flag_buf, 0, sizeof(void *) * info->num_threads);
>>>>>>>>> +
>>>>>>>>> +	for (i = 0; i < info->num_threads; i++) {
>>>>>>>>> +		if ((info->page_flag_buf[i] = calloc(1, sizeof(struct page_flag))) == NULL) {
>>>>>>>>> +			MSG("Can't allocate memory for page_flag. %s\n",
>>>>>>>>> +				strerror(errno));
>>>>>>>>> +			return FALSE;
>>>>>>>>> +		}
>>>>>>>>> +		current = info->page_flag_buf[i];
>>>>>>>>> +
>>>>>>>>> +		for (j = 1; j < NUM_BUFFERS; j++) {
>>>>>>>>> +			if ((current->next = calloc(1, sizeof(struct page_flag))) == NULL) {
>>>>>>>>> +				MSG("Can't allocate memory for page_flag. %s\n",
>>>>>>>>> +					strerror(errno));
>>>>>>>>> +				return FALSE;
>>>>>>>>> +			}
>>>>>>>>> +			current = current->next;
>>>>>>>>> +		}
>>>>>>>>> +		current->next = info->page_flag_buf[i];
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	/*
>>>>>>>>>  	 * initial fd_memory for threads
>>>>>>>>>  	 */
>>>>>>>>>  	for (i = 0; i < info->num_threads; i++) {
>>>>>>>>> @@ -3612,7 +3649,8 @@ initial_for_parallel()
>>>>>>>>>  void
>>>>>>>>>  free_for_parallel()
>>>>>>>>>  {
>>>>>>>>> -	int i;
>>>>>>>>> +	int i, j;
>>>>>>>>> +	struct page_flag *current;
>>>>>>>>>
>>>>>>>>>  	if (info->threads != NULL) {
>>>>>>>>>  		for (i = 0; i < info->num_threads; i++) {
>>>>>>>>> @@ -3655,6 +3693,19 @@ free_for_parallel()
>>>>>>>>>  		free(info->page_data_buf);
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>> +	if (info->page_flag_buf != NULL) {
>>>>>>>>> +		for (i = 0; i < info->num_threads; i++) {
>>>>>>>>> +			for (j = 0; j < NUM_BUFFERS; j++) {
>>>>>>>>> +				if (info->page_flag_buf[i] != NULL) {
>>>>>>>>> +					current = info->page_flag_buf[i];
>>>>>>>>> +					info->page_flag_buf[i] = current->next;
>>>>>>>>> +					free(current);
>>>>>>>>> +				}
>>>>>>>>> +			}
>>>>>>>>> +		}
>>>>>>>>> +		free(info->page_flag_buf);
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>>  	if (info->parallel_info == NULL)
>>>>>>>>>  		return;
>>>>>>>>>
>>>>>>>>> @@ -7075,11 +7126,11 @@ void *
>>>>>>>>>  kdump_thread_function_cyclic(void *arg) {
>>>>>>>>>  	void *retval = PTHREAD_FAIL;
>>>>>>>>>  	struct thread_args *kdump_thread_args = (struct thread_args *)arg;
>>>>>>>>> -	struct page_data *page_data_buf = kdump_thread_args->page_data_buf;
>>>>>>>>> +	volatile struct page_data *page_data_buf = kdump_thread_args->page_data_buf;
>>>>>>>>> +	volatile struct page_flag *page_flag_buf = kdump_thread_args->page_flag_buf;
>>>>>>>>>  	struct cycle *cycle = kdump_thread_args->cycle;
>>>>>>>>> -	int page_data_num = kdump_thread_args->page_data_num;
>>>>>>>>> -	mdf_pfn_t pfn;
>>>>>>>>> -	int index;
>>>>>>>>> +	mdf_pfn_t pfn = cycle->start_pfn;
>>>>>>>>> +	int index = kdump_thread_args->thread_num;
>>>>>>>>>  	int buf_ready;
>>>>>>>>>  	int dumpable;
>>>>>>>>>  	int fd_memory = 0;
>>>>>>>>> @@ -7125,47 +7176,48 @@ kdump_thread_function_cyclic(void *arg) {
>>>>>>>>>  				kdump_thread_args->thread_num);
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>> -	while (1) {
>>>>>>>>> -		/* get next pfn */
>>>>>>>>> -		pthread_mutex_lock(&info->current_pfn_mutex);
>>>>>>>>> -		pfn = info->current_pfn;
>>>>>>>>> -		info->current_pfn++;
>>>>>>>>> -		pthread_mutex_unlock(&info->current_pfn_mutex);
>>>>>>>>> -
>>>>>>>>> -		if (pfn >= kdump_thread_args->end_pfn)
>>>>>>>>> -			break;
>>>>>>>>> -
>>>>>>>>> -		index = -1;
>>>>>>>>> +	/*
>>>>>>>>> +	 * filtered page won't take anything
>>>>>>>>> +	 * unfiltered zero page will only take a page_flag_buf
>>>>>>>>> +	 * unfiltered non-zero page will take a page_flag_buf and a page_data_buf
>>>>>>>>> +	 */
>>>>>>>>> +	while (pfn < cycle->end_pfn) {
>>>>>>>>>  		buf_ready = FALSE;
>>>>>>>>>
>>>>>>>>> +		pthread_mutex_lock(&info->page_data_mutex);
>>>>>>>>> +		while (page_data_buf[index].used != FALSE) {
>>>>>>>>> +			index = (index + 1) % info->num_buffers;
>>>>>>>>> +		}
>>>>>>>>> +		page_data_buf[index].used = TRUE;
>>>>>>>>> +		pthread_mutex_unlock(&info->page_data_mutex);
>>>>>>>>> +
>>>>>>>>>  		while (buf_ready == FALSE) {
>>>>>>>>>  			pthread_testcancel();
>>>>>>>>> -
>>>>>>>>> -			index = pfn % page_data_num;
>>>>>>>>> -
>>>>>>>>> -			if (pfn - info->consumed_pfn > info->num_buffers)
>>>>>>>>> +			if (page_flag_buf->ready == FLAG_READY)
>>>>>>>>>  				continue;
>>>>>>>>>
>>>>>>>>> -			if (page_data_buf[index].ready != 0)
>>>>>>>>> -				continue;
>>>>>>>>> -
>>>>>>>>> -			pthread_mutex_lock(&page_data_buf[index].mutex);
>>>>>>>>> -
>>>>>>>>> -			if (page_data_buf[index].ready != 0)
>>>>>>>>> -				goto unlock;
>>>>>>>>> -
>>>>>>>>> -			buf_ready = TRUE;
>>>>>>>>> +			/* get next dumpable pfn */
>>>>>>>>> +			pthread_mutex_lock(&info->current_pfn_mutex);
>>>>>>>>> +			for (pfn = info->current_pfn; pfn < cycle->end_pfn; pfn++) {
>>>>>>>>> +				dumpable = is_dumpable(
>>>>>>>>> +					info->fd_bitmap ? &bitmap_parallel : info->bitmap2,
>>>>>>>>> +					pfn,
>>>>>>>>> +					cycle);
>>>>>>>>> +				if (dumpable)
>>>>>>>>> +					break;
>>>>>>>>> +			}
>>>>>>>>> +			info->current_pfn = pfn + 1;
>>>>>>>>>
>>>>>>>>> -			page_data_buf[index].pfn = pfn;
>>>>>>>>> -			page_data_buf[index].ready = 1;
>>>>>>>>> +			page_flag_buf->pfn = pfn;
>>>>>>>>> +			page_flag_buf->ready = FLAG_FILLING;
>>>>>>>>> +			pthread_mutex_unlock(&info->current_pfn_mutex);
>>>>>>>>> +			sem_post(&info->page_flag_buf_sem);
>>>>>>>>>
>>>>>>>>> -			dumpable = is_dumpable(
>>>>>>>>> -				info->fd_bitmap ? &bitmap_parallel : info->bitmap2,
>>>>>>>>> -				pfn,
>>>>>>>>> -				cycle);
>>>>>>>>> -			page_data_buf[index].dumpable = dumpable;
>>>>>>>>> -			if (!dumpable)
>>>>>>>>> -				goto unlock;
>>>>>>>>> +			if (pfn >= cycle->end_pfn) {
>>>>>>>>> +				info->current_pfn = cycle->end_pfn;
>>>>>>>>> +				page_data_buf[index].used = FALSE;
>>>>>>>>> +				break;
>>>>>>>>> +			}
>>>>>>>>>
>>>>>>>>>  			if (!read_pfn_parallel(fd_memory, pfn, buf,
>>>>>>>>>  					       &bitmap_memory_parallel,
>>>>>>>>> @@ -7178,11 +7230,11 @@ kdump_thread_function_cyclic(void *arg) {
>>>>>>>>>
>>>>>>>>>  			if ((info->dump_level & DL_EXCLUDE_ZERO)
>>>>>>>>>  			    && is_zero_page(buf, info->page_size)) {
>>>>>>>>> -				page_data_buf[index].zero = TRUE;
>>>>>>>>> -				goto unlock;
>>>>>>>>> +				page_flag_buf->zero = TRUE;
>>>>>>>>> +				goto next;
>>>>>>>>>  			}
>>>>>>>>>
>>>>>>>>> -			page_data_buf[index].zero = FALSE;
>>>>>>>>> +			page_flag_buf->zero = FALSE;
>>>>>>>>>
>>>>>>>>>  			/*
>>>>>>>>>  			 * Compress the page data.
>>>>>>>>> @@ -7210,6 +7262,7 @@ kdump_thread_function_cyclic(void *arg) {
>>>>>>>>>  				page_data_buf[index].flags =
>>>>>>>>>  					DUMP_DH_COMPRESSED_LZO;
>>>>>>>>>  				page_data_buf[index].size = size_out;
>>>>>>>>> +
>>>>>>>>>  				memcpy(page_data_buf[index].buf, buf_out, size_out);
>>>>>>>>>  #endif
>>>>>>>>>  #ifdef USESNAPPY
>>>>>>>>> @@ -7232,12 +7285,14 @@ kdump_thread_function_cyclic(void *arg) {
>>>>>>>>>  				page_data_buf[index].size = info->page_size;
>>>>>>>>>  				memcpy(page_data_buf[index].buf, buf, info->page_size);
>>>>>>>>>  			}
>>>>>>>>> -unlock:
>>>>>>>>> -			pthread_mutex_unlock(&page_data_buf[index].mutex);
>>>>>>>>> +			page_flag_buf->index = index;
>>>>>>>>> +			buf_ready = TRUE;
>>>>>>>>> +next:
>>>>>>>>> +			page_flag_buf->ready = FLAG_READY;
>>>>>>>>> +			page_flag_buf = page_flag_buf->next;
>>>>>>>>>
>>>>>>>>>  		}
>>>>>>>>>  	}
>>>>>>>>> -
>>>>>>>>>  	retval = NULL;
>>>>>>>>>
>>>>>>>>>  fail:
>>>>>>>>> @@ -7265,14 +7320,15 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>>>>>>>>>  	struct page_desc pd;
>>>>>>>>>  	struct timeval tv_start;
>>>>>>>>>  	struct timeval last, new;
>>>>>>>>> -	unsigned long long consuming_pfn;
>>>>>>>>>  	pthread_t **threads = NULL;
>>>>>>>>>  	struct thread_args *kdump_thread_args = NULL;
>>>>>>>>>  	void *thread_result;
>>>>>>>>> -	int page_data_num;
>>>>>>>>> +	int page_buf_num;
>>>>>>>>>  	struct page_data *page_data_buf = NULL;
>>>>>>>>>  	int i;
>>>>>>>>>  	int index;
>>>>>>>>> +	int end_count, consuming, check_count;
>>>>>>>>> +	mdf_pfn_t current_pfn, temp_pfn;
>>>>>>>>>
>>>>>>>>>  	if (info->flag_elf_dumpfile)
>>>>>>>>>  		return FALSE;
>>>>>>>>> @@ -7284,13 +7340,6 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>>>>>>>>>  		goto out;
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>> -	res = pthread_mutex_init(&info->consumed_pfn_mutex, NULL);
>>>>>>>>> -	if (res != 0) {
>>>>>>>>> -		ERRMSG("Can't initialize consumed_pfn_mutex. %s\n",
>>>>>>>>> -				strerror(res));
>>>>>>>>> -		goto out;
>>>>>>>>> -	}
>>>>>>>>> -
>>>>>>>>>  	res = pthread_mutex_init(&info->filter_mutex, NULL);
>>>>>>>>>  	if (res != 0) {
>>>>>>>>>  		ERRMSG("Can't initialize filter_mutex. %s\n", strerror(res));
>>>>>>>>> @@ -7314,36 +7363,23 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>>>>>>>>>  	end_pfn = cycle->end_pfn;
>>>>>>>>>
>>>>>>>>>  	info->current_pfn = start_pfn;
>>>>>>>>> -	info->consumed_pfn = start_pfn - 1;
>>>>>>>>>
>>>>>>>>>  	threads = info->threads;
>>>>>>>>>  	kdump_thread_args = info->kdump_thread_args;
>>>>>>>>>
>>>>>>>>> -	page_data_num = info->num_buffers;
>>>>>>>>> +	page_buf_num = info->num_buffers;
>>>>>>>>>  	page_data_buf = info->page_data_buf;
>>>>>>>>> +	pthread_mutex_init(&info->page_data_mutex, NULL);
>>>>>>>>> +	sem_init(&info->page_flag_buf_sem, 0, 0);
>>>>>>>>>
>>>>>>>>> -	for (i = 0; i < page_data_num; i++) {
>>>>>>>>> -		/*
>>>>>>>>> -		 * producer will use pfn in page_data_buf to decide the
>>>>>>>>> -		 * consumed pfn
>>>>>>>>> -		 */
>>>>>>>>> -		page_data_buf[i].pfn = start_pfn - 1;
>>>>>>>>> -		page_data_buf[i].ready = 0;
>>>>>>>>> -		res = pthread_mutex_init(&page_data_buf[i].mutex, NULL);
>>>>>>>>> -		if (res != 0) {
>>>>>>>>> -			ERRMSG("Can't initialize mutex of page_data_buf. %s\n",
>>>>>>>>> -					strerror(res));
>>>>>>>>> -			goto out;
>>>>>>>>> -		}
>>>>>>>>> -	}
>>>>>>>>> +	for (i = 0; i < page_buf_num; i++)
>>>>>>>>> +		page_data_buf[i].used = FALSE;
>>>>>>>>>
>>>>>>>>>  	for (i = 0; i < info->num_threads; i++) {
>>>>>>>>>  		kdump_thread_args[i].thread_num = i;
>>>>>>>>>  		kdump_thread_args[i].len_buf_out = len_buf_out;
>>>>>>>>> -		kdump_thread_args[i].start_pfn = start_pfn;
>>>>>>>>> -		kdump_thread_args[i].end_pfn = end_pfn;
>>>>>>>>> -		kdump_thread_args[i].page_data_num = page_data_num;
>>>>>>>>>  		kdump_thread_args[i].page_data_buf = page_data_buf;
>>>>>>>>> +		kdump_thread_args[i].page_flag_buf = info->page_flag_buf[i];
>>>>>>>>>  		kdump_thread_args[i].cycle = cycle;
>>>>>>>>>
>>>>>>>>>  		res = pthread_create(threads[i], NULL,
>>>>>>>>> @@ -7356,55 +7392,88 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>>>>>>>>>  		}
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>> -	consuming_pfn = start_pfn;
>>>>>>>>> -	index = -1;
>>>>>>>>> +	end_count = 0;
>>>>>>>>> +	while (1) {
>>>>>>>>> +		consuming = 0;
>>>>>>>>> +		check_count = 0;
>>>>>>>>>
>>>>>>>>> -	gettimeofday(&last, NULL);
>>>>>>>>> +		/*
>>>>>>>>> +		 * The basic idea is producer producing page and consumer writing page.
>>>>>>>>> +		 * Each producer have a page_flag_buf list which is used for storing page's description.
>>>>>>>>> +		 * The size of page_flag_buf is little so it won't take too much memory.
>>>>>>>>> +		 * And all producers will share a page_data_buf array which is used for storing page's compressed data.
>>>>>>>>> +		 * The main thread is the consumer. It will find the next pfn and write it into file.
>>>>>>>>> +		 * The next pfn is smallest pfn in all page_flag_buf.
>>>>>>>>> +		 */
>>>>>>>>> +		sem_wait(&info->page_flag_buf_sem);
>>>>>>>>> +		gettimeofday(&last, NULL);
>>>>>>>>> +		while (1) {
>>>>>>>>> +			current_pfn = end_pfn;
>>>>>>>>>
>>>>>>>>> -	while (consuming_pfn < end_pfn) {
>>>>>>>>> -		index = consuming_pfn % page_data_num;
>>>>>>>>> +			/*
>>>>>>>>> +			 * page_flag_buf is in circular linked list.
>>>>>>>>> +			 * The array info->page_flag_buf[] records the current page_flag_buf in each thread's
>>>>>>>>> +			 * page_flag_buf list.
>>>>>>>>> +			 * consuming is used for recording in which thread the pfn is the smallest.
>>>>>>>>> +			 * current_pfn is used for recording the value of pfn when checking the pfn.
>>>>>>>>> +			 */
>>>>>>>>> +			for (i = 0; i < info->num_threads; i++) {
>>>>>>>>> +				if (info->page_flag_buf[i]->ready == FLAG_UNUSED)
>>>>>>>>> +					continue;
>>>>>>>>> +				temp_pfn = info->page_flag_buf[i]->pfn;
>>>>>>>>>
>>>>>>>>> -		gettimeofday(&new, NULL);
>>>>>>>>> -		if (new.tv_sec - last.tv_sec > WAIT_TIME) {
>>>>>>>>> -			ERRMSG("Can't get data of pfn %llx.\n", consuming_pfn);
>>>>>>>>> -			goto out;
>>>>>>>>> -		}
>>>>>>>>> +				/*
>>>>>>>>> +				 * count how many threads have reached the end.
>>>>>>>>> +				 */
>>>>>>>>> +				if (temp_pfn >= end_pfn) {
>>>>>>>>> +					info->page_flag_buf[i]->ready = FLAG_UNUSED;
>>>>>>>>> +					end_count++;
>>>>>>>>> +					continue;
>>>>>>>>> +				}
>>>>>>>>>
>>>>>>>>> -		/*
>>>>>>>>> -		 * check pfn first without mutex locked to reduce the time
>>>>>>>>> -		 * trying to lock the mutex
>>>>>>>>> -		 */
>>>>>>>>> -		if (page_data_buf[index].pfn != consuming_pfn)
>>>>>>>>> -			continue;
>>>>>>>>> +				if (current_pfn < temp_pfn)
>>>>>>>>> +					continue;
>>>>>>>>>
>>>>>>>>> -		if (pthread_mutex_trylock(&page_data_buf[index].mutex) != 0)
>>>>>>>>> -			continue;
>>>>>>>>> +				check_count++;
>>>>>>>>> +				consuming = i;
>>>>>>>>> +				current_pfn = temp_pfn;
>>>>>>>>> +			}
>>>>>>>>>
>>>>>>>>> -		/* check whether the found one is ready to be consumed */
>>>>>>>>> -		if (page_data_buf[index].pfn != consuming_pfn ||
>>>>>>>>> -		    page_data_buf[index].ready != 1) {
>>>>>>>>> -			goto unlock;
>>>>>>>>> +			/*
>>>>>>>>> +			 * If all the threads have reached the end, we will finish writing.
>>>>>>>>> +			 */
>>>>>>>>> +			if (end_count >= info->num_threads)
>>>>>>>>> +				goto finish;
>>>>>>>>> +
>>>>>>>>> +			/*
>>>>>>>>> +			 * If the page_flag_buf is not ready, the pfn recorded may be changed.
>>>>>>>>> +			 * So we should recheck.
>>>>>>>>> +			 */
>>>>>>>>> +			if (info->page_flag_buf[consuming]->ready != FLAG_READY) {
>>>>>>>>> +				gettimeofday(&new, NULL);
>>>>>>>>> +				if (new.tv_sec - last.tv_sec > WAIT_TIME) {
>>>>>>>>> +					ERRMSG("Can't get data of pfn.\n");
>>>>>>>>> +					goto out;
>>>>>>>>> +				}
>>>>>>>>> +				continue;
>>>>>>>>> +			}
>>>>>>>>> +
>>>>>>>>> +			if (current_pfn == info->page_flag_buf[consuming]->pfn)
>>>>>>>>> +				break;
>>>>>>>>>  		}
>>>>>>>>>
>>>>>>>>>  		if ((num_dumped % per) == 0)
>>>>>>>>>  			print_progress(PROGRESS_COPY, num_dumped, info->num_dumpable);
>>>>>>>>>
>>>>>>>>> -		/* next pfn is found, refresh last here */
>>>>>>>>> -		last = new;
>>>>>>>>> -		consuming_pfn++;
>>>>>>>>> -		info->consumed_pfn++;
>>>>>>>>> -		page_data_buf[index].ready = 0;
>>>>>>>>> -
>>>>>>>>> -		if (page_data_buf[index].dumpable == FALSE)
>>>>>>>>> -			goto unlock;
>>>>>>>>> -
>>>>>>>>>  		num_dumped++;
>>>>>>>>>
>>>>>>>>> -		if (page_data_buf[index].zero == TRUE) {
>>>>>>>>> +
>>>>>>>>> +		if (info->page_flag_buf[consuming]->zero == TRUE) {
>>>>>>>>>  			if (!write_cache(cd_header, pd_zero, sizeof(page_desc_t)))
>>>>>>>>>  				goto out;
>>>>>>>>>  			pfn_zero++;
>>>>>>>>>  		} else {
>>>>>>>>> +			index = info->page_flag_buf[consuming]->index;
>>>>>>>>>  			pd.flags = page_data_buf[index].flags;
>>>>>>>>>  			pd.size = page_data_buf[index].size;
>>>>>>>>>  			pd.page_flags = 0;
>>>>>>>>> @@ -7420,12 +7489,12 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>>>>>>>>>  			 */
>>>>>>>>>  			if (!write_cache(cd_page, page_data_buf[index].buf, pd.size))
>>>>>>>>>  				goto out;
>>>>>>>>> -
>>>>>>>>> +			page_data_buf[index].used = FALSE;
>>>>>>>>>  		}
>>>>>>>>> -unlock:
>>>>>>>>> -		pthread_mutex_unlock(&page_data_buf[index].mutex);
>>>>>>>>> +		info->page_flag_buf[consuming]->ready = FLAG_UNUSED;
>>>>>>>>> +		info->page_flag_buf[consuming] = info->page_flag_buf[consuming]->next;
>>>>>>>>>  	}
>>>>>>>>> -
>>>>>>>>> +finish:
>>>>>>>>>  	ret = TRUE;
>>>>>>>>>  	/*
>>>>>>>>>  	 * print [100 %]
>>>>>>>>> @@ -7463,15 +7532,9 @@ out:
>>>>>>>>>  		}
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>> -	if (page_data_buf != NULL) {
>>>>>>>>> -		for (i = 0; i < page_data_num; i++) {
>>>>>>>>> -			pthread_mutex_destroy(&page_data_buf[i].mutex);
>>>>>>>>> -		}
>>>>>>>>> -	}
>>>>>>>>> -
>>>>>>>>> +	sem_destroy(&info->page_flag_buf_sem);
>>>>>>>>>  	pthread_rwlock_destroy(&info->usemmap_rwlock);
>>>>>>>>>  	pthread_mutex_destroy(&info->filter_mutex);
>>>>>>>>> -	pthread_mutex_destroy(&info->consumed_pfn_mutex);
>>>>>>>>>  	pthread_mutex_destroy(&info->current_pfn_mutex);
>>>>>>>>>
>>>>>>>>>  	return ret;
>>>>>>>>> @@ -7564,6 +7627,7 @@ write_kdump_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_pag
>>>>>>>>>  		num_dumped++;
>>>>>>>>>  		if (!read_pfn(pfn, buf))
>>>>>>>>>  			goto out;
>>>>>>>>> +
>>>>>>>>>  		filter_data_buffer(buf, pfn_to_paddr(pfn), info->page_size);
>>>>>>>>>
>>>>>>>>>  		/*
>>>>>>>>> diff --git a/makedumpfile.h b/makedumpfile.h
>>>>>>>>> index e0b5bbf..4b315c0 100644
>>>>>>>>> --- a/makedumpfile.h
>>>>>>>>> +++ b/makedumpfile.h
>>>>>>>>> @@ -44,6 +44,7 @@
>>>>>>>>>  #include "print_info.h"
>>>>>>>>>  #include "sadump_mod.h"
>>>>>>>>>  #include <pthread.h>
>>>>>>>>> +#include <semaphore.h>
>>>>>>>>>
>>>>>>>>>  /*
>>>>>>>>>   * Result of command
>>>>>>>>> @@ -977,7 +978,7 @@ typedef unsigned long long int ulonglong;
>>>>>>>>>  #define PAGE_DATA_NUM (50)
>>>>>>>>>  #define WAIT_TIME (60 * 10)
>>>>>>>>>  #define PTHREAD_FAIL ((void *)-2)
>>>>>>>>> -#define NUM_BUFFERS (50)
>>>>>>>>> +#define NUM_BUFFERS (20)
>>>>>>>>>
>>>>>>>>>  struct mmap_cache {
>>>>>>>>>  	char *mmap_buf;
>>>>>>>>> @@ -985,28 +986,33 @@ struct mmap_cache {
>>>>>>>>>  	off_t mmap_end_offset;
>>>>>>>>>  };
>>>>>>>>>
>>>>>>>>> +enum {
>>>>>>>>> +	FLAG_UNUSED,
>>>>>>>>> +	FLAG_READY,
>>>>>>>>> +	FLAG_FILLING
>>>>>>>>> +};
>>>>>>>>> +struct page_flag {
>>>>>>>>> +	mdf_pfn_t pfn;
>>>>>>>>> +	char zero;
>>>>>>>>> +	char ready;
>>>>>>>>> +	short index;
>>>>>>>>> +	struct page_flag *next;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>  struct page_data
>>>>>>>>>  {
>>>>>>>>> -	mdf_pfn_t pfn;
>>>>>>>>> -	int dumpable;
>>>>>>>>> -	int zero;
>>>>>>>>> -	unsigned int flags;
>>>>>>>>>  	long size;
>>>>>>>>>  	unsigned char *buf;
>>>>>>>>> -	pthread_mutex_t mutex;
>>>>>>>>> -	/*
>>>>>>>>> -	 * whether the page_data is ready to be consumed
>>>>>>>>> -	 */
>>>>>>>>> -	int ready;
>>>>>>>>> +	int flags;
>>>>>>>>> +	int used;
>>>>>>>>>  };
>>>>>>>>>
>>>>>>>>>  struct thread_args {
>>>>>>>>>  	int thread_num;
>>>>>>>>>  	unsigned long len_buf_out;
>>>>>>>>> -	mdf_pfn_t start_pfn, end_pfn;
>>>>>>>>> -	int page_data_num;
>>>>>>>>>  	struct cycle *cycle;
>>>>>>>>>  	struct page_data *page_data_buf;
>>>>>>>>> +	struct page_flag *page_flag_buf;
>>>>>>>>>  };
>>>>>>>>>
>>>>>>>>>  /*
>>>>>>>>> @@ -1295,11 +1301,12 @@ struct DumpInfo {
>>>>>>>>>  	pthread_t **threads;
>>>>>>>>>  	struct thread_args *kdump_thread_args;
>>>>>>>>>  	struct page_data *page_data_buf;
>>>>>>>>> +	struct page_flag **page_flag_buf;
>>>>>>>>> +	sem_t page_flag_buf_sem;
>>>>>>>>>  	pthread_rwlock_t usemmap_rwlock;
>>>>>>>>>  	mdf_pfn_t current_pfn;
>>>>>>>>>  	pthread_mutex_t current_pfn_mutex;
>>>>>>>>> -	mdf_pfn_t consumed_pfn;
>>>>>>>>> -	pthread_mutex_t consumed_pfn_mutex;
>>>>>>>>> +	pthread_mutex_t page_data_mutex;
>>>>>>>>>  	pthread_mutex_t filter_mutex;
>>>>>>>>>  };
>>>>>>>>>  extern struct DumpInfo *info;
>>>>>>>>> --
>>>>>>>>> 1.8.3.1
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> kexec mailing list
>>>>>>>>> kexec at lists.infradead.org
>>>>>>>>> http://lists.infradead.org/mailman/listinfo/kexec
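A note on the numbers in this thread, which are not analyzed in the emails themselves. The failing mmap size 18446744048584388608 is 2^64 - 25125163008, so it looks like a "negative" byte count of about 23.4 GiB that has wrapped into an unsigned 64-bit value. The thread does not establish where that value comes from. One related point can be checked directly from the patch: the new test `if (limit_size < 0)` can never be true, because `limit_size` is declared `unsigned long`. When free memory is smaller than `MAP_REGION * info->num_threads`, the expression does not become negative. The C sketch below illustrates both effects. Its helpers (`distance_below_2_64`, `unchecked_limit`, `checked_limit`, `alloc_request`) and `struct page_data_demo` are hypothetical, written only for this note. The sketch also assumes an LP64 platform and an unsigned free-memory value, since the return type of `get_free_memory_size()` is not shown in the quoted hunks.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustration only: these helpers are hypothetical and are not part of
 * makedumpfile. Assumes an LP64 platform (64-bit long and size_t).
 */

/* Same member layout as the patch's new struct page_data (24 bytes on LP64). */
struct page_data_demo {
	long size;
	unsigned char *buf;
	int flags;
	int used;
};

/* How far below 2^64 an unsigned 64-bit value is, i.e. the magnitude it
 * would have if it were really a negative number. */
static unsigned long long distance_below_2_64(unsigned long long v)
{
	return 0ULL - v;	/* unsigned arithmetic wraps modulo 2^64 */
}

/* Same shape as the patch's limit_size computation. If free_mem is smaller
 * than map_region * num_threads, the subtraction wraps instead of going
 * negative, so a later "limit < 0" test on this unsigned result never fires. */
static unsigned long unchecked_limit(unsigned long long free_mem,
				     unsigned long map_region, int num_threads)
{
	return (free_mem - map_region * num_threads) * 0.6;
}

/* Safer variant: compare before subtracting so nothing can wrap.
 * Returns 0 when there is not enough free memory. */
static unsigned long long checked_limit(unsigned long long free_mem,
					unsigned long long map_region,
					unsigned int num_threads)
{
	unsigned long long reserved;

	assert(map_region > 0);
	reserved = map_region * num_threads;
	if (free_mem <= reserved)
		return 0;
	return (free_mem - reserved) / 10 * 6;	/* about 60%, integer math */
}

/* Bytes requested by malloc(sizeof(struct page_data_demo) * n) when n is an
 * int: n is converted to size_t first, so a negative n wraps to a huge size. */
static size_t alloc_request(int n)
{
	return sizeof(struct page_data_demo) * (size_t)n;
}
```

As an example, take 100 MiB of free memory, 4 MiB per thread, and 64 threads. The 4 MiB figure is only an illustrative input, not necessarily makedumpfile's `MAP_REGION`. With these inputs, `unchecked_limit` returns a value above 2^62 rather than anything negative, while `checked_limit` returns 0 and lets the caller refuse to start. Comparing before subtracting is the usual way to avoid this wraparound. GCC's `-Wtype-limits` (enabled by `-Wextra`) flags comparisons such as `limit_size < 0` on an unsigned variable.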