On Fri, Jul 09, 2021 at 06:39:44AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -----Original Message-----
> > On Mon, Jun 21, 2021 at 06:02:51AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> > > -----Original Message-----
> > > > +
> > > > +		if (read_pd(dd->dfd, offset, &pd)) {
> > > > +			/*
> > > > +			 * Truncated page descriptor at most
> > > > +			 * references full page.
> > > > +			 */
> > > > +			expected_size += block_size;
> > > > +			goto next;
> > > > +		}
> > > > +
> > > > +		if (pd.offset == 0) {
> > > > +			if (!first_empty_pd)
> > > > +				first_empty_pd = page_idx;
> > > > +			/*
> > > > +			 * Incomplete pages at most use the
> > > > +			 * whole page.
> > > > +			 */
> > > > +			expected_size += block_size;
> > > > +		} else if (!pd.flags) {
> > > > +			/*
> > > > +			 * Zero page has no compression flags.
> > > > +			 */
> > >
> > > Non-compressed page also has no compression flags.
> > >
> > > So something like (pd.offset == dd->data_offset) is needed to determine
> > > whether the page is a zero page? although it's not exact without excluding
> > > zero pages.
> > >
> >
> > Hi Kazu,
> >
> > Yes, you're right. Thanks for spotting it.
> >
> > I've added a code path for the case when zero pages are not excluded.
> > It's a no-brainer. However, I've got some issues figuring out whether a
> > page descriptor references the zero page when we start to differentiate
> > zero/non-zero uncompressed pages and calculate the expected size
> > accordingly.
> >
> > dd->data_offset points to the beginning of the page descriptors and it
> > doesn't help to find the zero page:
> >
> >	data_offset -> pd for pfn 0
> >	               pd for pfn 1
> >	               ...
> >	               pd for pfn N
> >	               ... <- some gap ???
> >	               zero page
> >	               pfn 0
>
> Oh, you're right, I misread something.
>
> There should be no gap between the pd for pfn N and the zero page, but
> anyway we cannot get the number of pds in advance without counting the
> bitmap..
>

I don't know, maybe some parts of the bitmap weren't flushed either.
I'll investigate further why "valid_pages * descriptor size" is not
equal to the offset from data_offset to the zero page on an incomplete
dump.

> >
> > The essence of what I've got so far is the following:
> >
> >	if (page_idx == 0)
> >		first_page_offset = pd.offset;
> >
> >	[...]
> >	} else if (!pd.flags) {
> >		/* The page is not compressed */
> >		if (dd->flags & ZERO_EXCLUDED) {
> >			/*
> >			 * Zero page is located just before the first
> >			 * dumpable pfn. The following pages are non-zero.
> >			 */
> >			if (pd.offset >= first_page_offset)
> >				expected_size += block_size;
> >		} else {
> >			expected_size += block_size;
> >		}
> >	} else if (pd.flags) {
> >	[...]
> >
> > As you can see, it would work, but only if the pd for pfn 0 doesn't
> > reference the zero page. Do you have any idea how we can reliably
> > determine the zero page offset?
>
> It can be that the pd for pfn 0 points to the zero page, especially
> considering the makedumpfile --split option. What I thought of are:
>
> 1. Two-pass check
>    Count the number of dumpable pages first, calculate the offset of the
>    zero page, then check pd.offset.
>

Yeah, I also did think of a two-pass mode, but wasn't sure whether it's
possible to precalculate the zero page offset otherwise :)

> 2. Online adjustment, i.e.
>      if (pd.offset > first_page_offset)  // the first page is always regarded as zero page
>          expected_size += block_size;
>      else if (pd.offset < first_page_offset) {
>          first_page_offset = pd.offset;  // should be zero page
>          expected_size += block_size;    // for the first page missed
>      }
>
> but this is a bit tricky and confusing.
>
> Also, there is a need to add one page size for the zero page if
> ZERO_EXCLUDED either way.
>
> Personally I think that the former is better, but it may depend on how
> much time is consumed with large dumps.
>

Agreed, I'll try the two-pass mode if ZERO_EXCLUDED is enabled for
non-compressed pages. The single pass introduced in the series to read
page descriptors wasn't noticeable on a 10G dump.
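Roughly, pass 1 could look like the untested sketch below. The names
calc_zero_page_offset and zero_page_offset are made up for illustration,
and page_is_dumpable()/dd->max_mapnr/page_desc_t are written from my
reading of the existing diskdump code, so treat them as assumptions:

	/*
	 * Pass 1 (sketch): count the dumpable pages from the bitmap to
	 * learn how many page descriptors the dump contains, then
	 * derive where the zero page should start.
	 */
	static off_t
	calc_zero_page_offset(void)
	{
		ulong nr, num_dumpable = 0;

		for (nr = 0; nr < dd->max_mapnr; nr++)
			if (page_is_dumpable(nr))
				num_dumpable++;

		/*
		 * Page descriptors start at data_offset, one per
		 * dumpable page; with ZERO_EXCLUDED the zero page is
		 * expected right after the descriptor table.
		 */
		return dd->data_offset +
		    (off_t)num_dumpable * sizeof(page_desc_t);
	}

Pass 2 would then be the loop above, with the non-compressed branch
becoming something like

	} else if (!pd.flags) {
		/* Not compressed: count it unless it's the zero page */
		if (!(dd->flags & ZERO_EXCLUDED) ||
		    pd.offset != zero_page_offset)
			expected_size += block_size;
	}

plus the one extra block_size for the zero page itself that you
mentioned.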
Thanks,
Roman

> Thanks,
> Kazu
>

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility