On Wed, Nov 20, 2024 at 09:04:20AM +0100, Thomas Hellström wrote:
> On Tue, 2024-11-19 at 15:08 -0800, Matthew Brost wrote:
> > On Tue, Nov 19, 2024 at 05:45:27PM +0100, Thomas Hellström wrote:
> > > On Tue, 2024-10-15 at 20:25 -0700, Matthew Brost wrote:
> > > > Add functions which migrate to / from VRAM accepting a single DPA
> > > > argument (VRAM) and array of dma addresses (SRAM).
> > > >
> > > > v2:
> > > >  - Don't unlock job_mutex in error path of xe_migrate_vram
> > > >
> > > > Signed-off-by: Oak Zeng <oak.zeng@xxxxxxxxx>
> > > > Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_migrate.c | 149 ++++++++++++++++++++++++++++++++
> > > >  drivers/gpu/drm/xe/xe_migrate.h |  10 +++
> > > >  2 files changed, 159 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > > index cfd31ae49cc1..d7b6636286ae 100644
> > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > @@ -1542,6 +1542,155 @@ void xe_migrate_wait(struct xe_migrate *m)
> > > >  		dma_fence_wait(m->fence, false);
> > > >  }
> > > >
> > > > +static u32 pte_update_cmd_size(u64 size)
> > > > +{
> > > > +	u32 dword;
> > >
> > > dwords or num_dword?
> > >
> >
> > num_dword
> >
> > > > +	u64 entries = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> > > > +
> > > > +	XE_WARN_ON(size > MAX_PREEMPTDISABLE_TRANSFER);
> > > > +	/*
> > > > +	 * MI_STORE_DATA_IMM command is used to update page table. Each
> > > > +	 * instruction can update maximumly 0x1ff pte entries. To update
> > > > +	 * n (n <= 0x1ff) pte entries, we need:
> > > > +	 * 1 dword for the MI_STORE_DATA_IMM command header (opcode etc)
> > > > +	 * 2 dword for the page table's physical location
> > > > +	 * 2*n dword for value of pte to fill (each pte entry is 2 dwords)
> > > > +	 */
> > > > +	dword = (1 + 2) * DIV_ROUND_UP(entries, 0x1ff);
> > > > +	dword += entries * 2;
> > > > +
> > > > +	return dword;
> > > > +}
> > > > +
> > > > +static void build_pt_update_batch_sram(struct xe_migrate *m,
> > > > +				       struct xe_bb *bb, u32 pt_offset,
> > > > +				       dma_addr_t *sram_addr, u32 size)
> > > > +{
> > > > +	u16 pat_index = tile_to_xe(m->tile)->pat.idx[XE_CACHE_WB];
> > > > +	u32 ptes;
> > > > +	int i = 0;
> > > > +
> > > > +	ptes = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> > > > +	while (ptes) {
> > > > +		u32 chunk = min(0x1ffU, ptes);
> > > > +
> > > > +		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
> > > > +		bb->cs[bb->len++] = pt_offset;
> > > > +		bb->cs[bb->len++] = 0;
> > > > +
> > > > +		pt_offset += chunk * 8;
> > > > +		ptes -= chunk;
> > > > +
> > > > +		while (chunk--) {
> > > > +			u64 addr = sram_addr[i++] & PAGE_MASK;
> > > > +
> > > > +			xe_tile_assert(m->tile, addr);
> > > > +			addr = m->q->vm->pt_ops->pte_encode_addr(m->tile->xe,
> > > > +								 addr, pat_index,
> > > > +								 0, false, 0);
> > > > +			bb->cs[bb->len++] = lower_32_bits(addr);
> > > > +			bb->cs[bb->len++] = upper_32_bits(addr);
> > > > +		}
> > > > +	}
> > > > +}
> > > > +
> > > > +enum xe_migrate_copy_dir {
> > > > +	XE_MIGRATE_COPY_TO_VRAM,
> > > > +	XE_MIGRATE_COPY_TO_SRAM,
> > > > +};
> > > > +
> > > > +static struct dma_fence *xe_migrate_vram(struct xe_migrate *m,
> > > > +					 unsigned long npages,
> > > > +					 dma_addr_t *sram_addr, u64 vram_addr,
> > > > +					 const enum xe_migrate_copy_dir dir)
> > > > +{
> > > > +	struct xe_gt *gt = m->tile->primary_gt;
> > > > +	struct xe_device *xe = gt_to_xe(gt);
> > > > +	struct dma_fence *fence = NULL;
> > > > +	u32 batch_size = 2;
> > > > +	u64 src_L0_ofs, dst_L0_ofs;
> > > > +	u64 round_update_size;
> > > > +	struct xe_sched_job *job;
> > > > +	struct xe_bb *bb;
> > > > +	u32 update_idx, pt_slot = 0;
> > > > +	int err;
> > > > +
> > > > +	round_update_size = min_t(u64, npages * PAGE_SIZE,
> > > > +				  MAX_PREEMPTDISABLE_TRANSFER);
> > >
> > > Hm. How does the caller know how many pages were actually migrated?
> > >
> >
> > This is an intermediate step between migrate_vma_setup and
> > migrate_vma_pages/finalize. The number of pages here is based on mpfn
> > returned from migrate_vma_setup. The migration for individual pages may
> > still be aborted in migrate_vma_pages/finalize. In this case both the
> > old and new page have the same data, so migrate_vma_pages/finalize can
> > pick either page.
>
> I might be misunderstanding, but I meant if npages is, for example,
> enough to cover 16MiB of data, but the above min_t reduces that to
> 8MiB of data. How would the caller know?
>

Oh, yea - that is broken - it kinda assumes a chunk is 8M or less. I had
some local patches which fixed this function to do a loop, will pull
those into the next rev to future proof this.

Matt

>
> /Thomas
>
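
For illustration only, here is a minimal userspace sketch of the kind of
loop mentioned above; it is not the follow-up patch. Everything prefixed
with sketch_ is hypothetical, and the 4 KiB page size and 8 MiB
per-submission limit are assumptions standing in for XE_PAGE_SIZE and
MAX_PREEMPTDISABLE_TRANSFER. The first helper repeats the dword
arithmetic from pte_update_cmd_size() in the quoted patch; the second
walks the full page range in bounded chunks and reports how many pages
were actually handled, which is the information the caller was missing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch, not taken from the driver headers. */
#define SKETCH_PAGE_SIZE         4096ull
#define SKETCH_MAX_TRANSFER      (8ull << 20)	/* 8 MiB per submission */
#define SKETCH_MAX_PTES_PER_CMD  0x1ffull	/* PTEs per MI_STORE_DATA_IMM */

/*
 * Same arithmetic as the quoted pte_update_cmd_size(): each command costs
 * 1 header dword + 2 address dwords, and each PTE costs 2 dwords.
 */
static uint32_t sketch_pte_update_cmd_size(uint64_t size)
{
	uint64_t entries = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	uint64_t cmds = (entries + SKETCH_MAX_PTES_PER_CMD - 1) /
			SKETCH_MAX_PTES_PER_CMD;

	return (uint32_t)((1 + 2) * cmds + 2 * entries);
}

/* Hypothetical stand-in for submitting one bounded copy. */
typedef int (*sketch_copy_fn)(size_t first_page, size_t npages, void *ctx);

/*
 * Split npages into chunks no larger than SKETCH_MAX_TRANSFER and hand
 * each chunk to copy(). *done is updated after every successful chunk,
 * so the caller learns how far the migration got even on error.
 */
static int sketch_migrate_all(size_t npages, sketch_copy_fn copy, void *ctx,
			      size_t *done)
{
	const size_t max_pages = SKETCH_MAX_TRANSFER / SKETCH_PAGE_SIZE;
	size_t first = 0;

	assert(copy && done);
	*done = 0;

	while (first < npages) {
		size_t n = npages - first;
		int err;

		if (n > max_pages)
			n = max_pages;

		err = copy(first, n, ctx);
		if (err)
			return err;

		first += n;
		*done = first;
	}

	return 0;
}

/* Example callback: only counts chunks and pages, copies nothing. */
struct sketch_stats {
	size_t chunks;
	size_t pages;
};

static int sketch_count_chunk(size_t first_page, size_t npages, void *ctx)
{
	struct sketch_stats *st = ctx;

	(void)first_page;
	st->chunks++;
	st->pages += npages;
	return 0;
}
```

Under these assumptions, a 16MiB request (4096 pages) is split into two
8MiB chunks by sketch_migrate_all() rather than being silently truncated
to the first 8MiB.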