Hello Petr,

On Thu, 6 Sep 2012 17:50:52 +0200
Petr Tesarik <ptesarik at suse.cz> wrote:

> On Monday 03 September 2012 09:04:03 Petr Tesarik wrote:
> > On Monday 03 September 2012 05:42:33 Atsushi Kumagai wrote:
> > > Hello Petr,
> > >
> > > On Tue, 28 Aug 2012 19:49:49 +0200
> > > Petr Tesarik <ptesarik at suse.cz> wrote:
> > > > Add a simple cache for pages read from the dumpfile.
> > > >
> > > > This is a big win if we read consecutive data from one page, e.g.
> > > > page descriptors, or even page table entries.
> > > >
> > > > Note that makedumpfile now always reads a complete page. This was
> > > > already the case with the kdump-compressed and sadump formats, but
> > > > makedumpfile was throwing most of the data away. For the
> > > > kdump-compressed case, we may actually save a lot of decompression,
> > > > too.
> > > >
> > > > I tried to keep the cache small to minimize the memory footprint, but
> > > > it should be big enough to hold all pages needed to do 4-level paging
> > > > plus some data. This is needed e.g. for vmalloc areas or Xen page
> > > > frame table data, which are not contiguous in physical memory.
> > > >
> > > > Signed-off-by: Petr Tesarik <ptesarik at suse.cz>

Sorry for the late reply.

According to your measurements, the performance looks good.
However, I found the issue below in v1.5.1-beta and confirmed with
git bisect that this patch causes it (though I haven't found the
root cause yet).

Result on kernel 3.4:

$ makedumpfile --non-cyclic vmcore dumpfile
Copying data : [ 62 %]
readpage_elf: Can't convert a physical address(a0000) to offset.
readmem: type_addr: 1, addr:1000a0000, size:4096
read_pfn: Can't get the page data.
makedumpfile Failed.
$

This looks like a critical issue for all users, so I will postpone
merging this patch until it is solved.

Thanks,
Atsushi Kumagai

> > > It's interesting to me. I want to know how performance will be improved
> > > with this patch, so do you have speed measurements?
> >
> > Not really. I only measured the hit/miss ratio, and with filtering Xen domU
> > and dump level 0, I got the following on a small system (2G RAM):
> >
> > cache hit: 1818880    cache miss: 1873
> >
> > The improvement isn't much for the non-Xen case, because the hits are
> > mostly due to virtual-to-physical translations, and most Linux data is
> > stored at virtual addresses that can be resolved by adding/subtracting a
> > fixed offset.
> >
> > Of course, you will also win only the syscall overhead, because Linux keeps
> > the data in the kernel page cache anyway. I'll measure the times for you on
> > a reasonably large system (~256G) and send the results here.
>
> I couldn't get a medium-sized system for testing, so I performed some
> measurements on a 64G system. I ran makedumpfile repeatedly from the kdump
> environment. The first run was used to cache the target filesystem metadata,
> and the cache was not dropped between runs, to minimize effects of the target
> filesystem. I ran it against /proc/vmcore, i.e. the input file was always
> resident, so there was nothing to skew the results.
>
> I tried with a kdump file with no compression (to get gzip/LZO out of the
> picture) and an ELF file. For the Xen case I only did the ELF file, because
> kdump is not available.
>
> First I ran it on bare metal.
> There was a slight improvement for -d31:
>
> kdump no cache:
> 6.32user 55.20system 1:15.60elapsed 81%CPU (0avgtext+0avgdata 4800maxresident)k
> 2080inputs+5714296outputs (2major+342minor)pagefaults 0swaps
>
> kdump with cache:
> 6.02user 24.58system 0:46.51elapsed 65%CPU (0avgtext+0avgdata 4912maxresident)k
> 1864inputs+5714288outputs (2major+350minor)pagefaults 0swaps
>
> ELF no cache:
> 7.58user 74.25system 1:59.52elapsed 68%CPU (0avgtext+0avgdata 4800maxresident)k
> 728inputs+9288824outputs (1major+342minor)pagefaults 0swaps
>
> ELF with cache:
> 7.43user 44.21system 1:17.41elapsed 66%CPU (0avgtext+0avgdata 4896maxresident)k
> 728inputs+9288792outputs (1major+349minor)pagefaults 0swaps
>
> To sum it up, I can see an improvement of approx. 50% in system time. The
> increase in memory consumption is a bit more than I would expect (why do I
> see ~100k for a cache of 12k?), but acceptable nevertheless. I can see a
> slight increase in user time (approx. 25%) for the kdump case, which could
> be attributed to the cache overhead. I don't have any explanation for the
> decreased user time in the ELF case, but it's consistent.
>
> I also tried running makedumpfile with -d1. This results in long sequential
> reads, so it's the worst case for a simple LRU-policy cache. The results are
> too unstable to make a reliable measurement, but there seems to be a slight
> performance hit. It is certainly less than 5% of total time.
>
> I think there are two reasons for that:
>
> 1. We're copying file data twice for each page (once from the kernel page
>    cache to the process space, and once from the internal cache to the
>    destination).
> 2. Instead of reusing the same data location, we're rotating 8 different
>    pages (or even up to twice as many if the allocated space is neither
>    contiguous nor page-aligned). This stresses both the CPU's L1 d-cache
>    and the TLB a tiny bit more. Note that in the /proc/vmcore case, the
>    kernel sequentially maps all physical memory of the crashed system, so
>    every cache page may be evicted before we get to using it again. This
>    could explain why I observe an increase in system time despite making
>    fewer system calls.
>
> There are a lot of things I could do to regain the old performance, if
> anybody is concerned about the slight performance regression for this
> worst case. Just let me know.
>
> Second, I ran with the Xen hypervisor. Since dump levels greater than 1
> don't work, I ran with '-E -X -d1'. Even though this includes the
> inefficient page walk described above, the improvement was immense.
>
> no cache:
> 95.33user 657.18system 13:08.40elapsed 95%CPU (0avgtext+0avgdata 5440maxresident)k
> 704inputs+6563856outputs (1major+388minor)pagefaults 0swaps
>
> with cache:
> 61.14user 110.15system 3:24.24elapsed 83%CPU (0avgtext+0avgdata 5584maxresident)k
> 2360inputs+6563872outputs (2major+396minor)pagefaults 0swaps
>
> In short, almost 80% shorter total time.
>
> Petr Tesarik
> SUSE Linux
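
For readers following the thread, here is a minimal sketch of the kind of
small LRU page cache being discussed: a handful of page-sized slots keyed by
page-aligned address, consulted before falling back to a full-page read from
the dump file. The slot count, structure layout and the names cache_read_page()
and read_page_fn are illustrative assumptions, not makedumpfile's actual code.

/*
 * Minimal sketch of a small LRU page cache in the spirit of the patch
 * discussed above.  Names and sizes are illustrative only; they are not
 * makedumpfile's real interfaces.
 */
#include <stdint.h>

#define PAGE_SIZE   4096ULL
#define CACHE_SLOTS 8

/* Provided by the caller: read one whole page from the dump file. */
typedef int (*read_page_fn)(uint64_t paddr, void *buf);

struct cache_slot {
        int valid;               /* slot holds a page                  */
        uint64_t paddr;          /* page-aligned physical address      */
        unsigned long last_used; /* LRU timestamp                      */
        char data[PAGE_SIZE];    /* cached page contents               */
};

static struct cache_slot slots[CACHE_SLOTS];
static unsigned long use_counter;

/*
 * Return a pointer to the cached page containing paddr, reading the
 * whole page from the dump file on a miss.  When all slots are in use,
 * the least recently used one is evicted.
 */
void *cache_read_page(uint64_t paddr, read_page_fn read_page)
{
        uint64_t page = paddr & ~(PAGE_SIZE - 1);
        struct cache_slot *victim = &slots[0];
        int i;

        for (i = 0; i < CACHE_SLOTS; i++) {
                if (slots[i].valid && slots[i].paddr == page) {
                        slots[i].last_used = ++use_counter;   /* hit */
                        return slots[i].data;
                }
                if (slots[i].last_used < victim->last_used)
                        victim = &slots[i];   /* remember LRU/empty slot */
        }

        /* Miss: evict the LRU (or an empty) slot and fill it. */
        if (read_page(page, victim->data) < 0)
                return NULL;
        victim->valid = 1;
        victim->paddr = page;
        victim->last_used = ++use_counter;
        return victim->data;
}

A readmem()-style wrapper would then copy the requested bytes out of the
returned buffer at offset paddr & (PAGE_SIZE - 1), so repeated reads of page
descriptors or page-table entries from the same page turn into plain memory
copies; that is where the hit/miss ratio quoted above comes from.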
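
The remark that the cache should hold "all pages to do 4-level paging plus
some data" can be illustrated with a generic table-walk sketch. Each level of
the walk lives in a different physical page, so translating many nearby
virtual addresses (vmalloc areas, the Xen page frame table) re-reads the same
few table pages over and over, and those reads are exactly what a small cache
absorbs. The x86-64-style constants and the vaddr_to_paddr() shape below are
illustrative assumptions (huge pages are ignored), not makedumpfile's actual
translation code; cache_read_page() refers to the previous sketch.

/*
 * Illustrative 4-level page-table walk using x86-64-style constants.
 * Each level reads one 8-byte entry from a different physical page,
 * so repeated translations keep touching the same few table pages.
 */
#include <stdint.h>

#define PAGE_SHIFT     12
#define PTRS_PER_TABLE 512                     /* 512 x 8-byte entries per level */
#define ADDR_MASK      0x000ffffffffff000ULL   /* bits 12-51: next table address */
#define PRESENT_BIT    0x1ULL

/* Interface assumed from the cache sketch above. */
typedef int (*read_page_fn)(uint64_t paddr, void *buf);
void *cache_read_page(uint64_t paddr, read_page_fn read_page);

/* Fetch one table entry; the whole page it lives in ends up cached. */
static uint64_t table_entry(uint64_t table_paddr, unsigned int index,
                            read_page_fn read_page)
{
        uint64_t *page = cache_read_page(table_paddr, read_page);

        return page ? page[index] : 0;
}

/* Translate a virtual address by walking PGD -> PUD -> PMD -> PTE. */
uint64_t vaddr_to_paddr(uint64_t vaddr, uint64_t pgd_paddr,
                        read_page_fn read_page)
{
        unsigned int shift = PAGE_SHIFT + 3 * 9;   /* PGD index at bit 39 */
        /* Treat the PGD base as a "present entry" so the loop is uniform. */
        uint64_t entry = (pgd_paddr & ADDR_MASK) | PRESENT_BIT;
        int level;

        for (level = 0; level < 4; level++, shift -= 9) {
                unsigned int index = (vaddr >> shift) & (PTRS_PER_TABLE - 1);

                entry = table_entry(entry & ADDR_MASK, index, read_page);
                if (!(entry & PRESENT_BIT))
                        return 0;                  /* not mapped */
        }
        /* Page frame from the PTE plus the offset within the page. */
        return (entry & ADDR_MASK) | (vaddr & ((1ULL << PAGE_SHIFT) - 1));
}

For the -d1 sequential scan described above, by contrast, each data page is
touched exactly once, so every lookup misses and the cache only adds the extra
copy Petr mentions, which is consistent with the slight worst-case regression
he measured.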