>> On February 15, 2012 at 11:53 PM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
>>> To me it looks like the CP is trying to fetch memory but the
>>> GPU memory controller fails to fulfill the CP request. Did you
>>> check the PCI configuration before & after (when things don't
>>> work)? My best guess is PCI bus mastering is not properly working
>>> or the PCIE GPU GART table has wrong data.
>>>
>>> Maybe one needs to drop bus master and reenable bus master to
>>> work around some bug...
>> Thanks for your suggestion. We've tried the 'drop and reenable master'
>> trick; unfortunately it doesn't work.
>> The PCI configuration comparison will be done later.
> Update: We've checked the first 64 bytes of PCI configuration space
> before & after, and didn't find any difference.

Hi,

Status update:
Today we tried to analyze the GPU instruction stream at the time of the lockup.

The lockup always occurs after tasks restart, so the related
instructions should reside in the ib indicated by dmesg:
[ 2456.585937] GPU lockup (waiting for 0x0002F98B last fence id 0x0002F98A)

Printing the instructions in the related ib:
[ 2462.492187] PM4 block 10 has 115 instructions, with fence seq 2f98b
....
[ 2462.976562] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2462.984375] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2462.988281] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2462.992187] Type3:PACKET3_SET_ALU_CONST ref_addr <not interpreted>
[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
[ 2463.003906] Type3:PACKET3_SET_RESOURCE ref_addr <not interpreted>
[ 2463.007812] Type3:PACKET3_SET_CONFIG_REG ref_addr <not interpreted>
[ 2463.011718] Type3:PACKET3_INDEX_TYPE ref_addr <not interpreted>
[ 2463.015625] Type3:PACKET3_NUM_INSTANCES ref_addr <not interpreted>
[ 2463.019531] Type3:PACKET3_DRAW_INDEX_AUTO ref_addr <not interpreted>
[ 2463.027343] Type3:PACKET3_EVENT_WRITE ref_addr <not interpreted>
[ 2463.031250] Type3:PACKET3_SET_CONFIG_REG ref_addr <not interpreted>
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680
[ 2463.039062] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2463.046875] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2463.050781] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2463.054687] Type3:PACKET3_SET_BOOL_CONST ref_addr <not interpreted>
[ 2463.062500] Type3:PACKET3_SURFACE_SYNC ref_addr 10668e

CP_COHER_BASE was 0x0018C880, so the instruction that caused the lockup
should be in this range:
[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
...
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680

Here, only SURFACE_SYNC, SET_RESOURCE and EVENT_WRITE access GPU
memory. Our guess is that it may be SURFACE_SYNC?

BTW, when the lockup happens, if we place the CP ring in VRAM, ring_test
passes but ib_test fails, which suggests the ME fails to feed the CP
during the lockup? Could an earlier SURFACE_SYNC be blocking the MC?

P.S. For today's debugging we hacked the driver to place the CP ring, ib
and ih in VRAM and disabled wb (radeon_no_wb=1).

Any ideas?

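For anyone who wants to repeat the narrowing step on a longer dump, here is a rough sketch of it as a script. It assumes the dmesg lines have exactly the "Type3:<name> ref_addr <value>" shape printed by the debug code quoted above, and that ref_addr can be compared directly against CP_COHER_BASE, as was done by hand here. The names parse_packets and suspect_window are made up for this illustration and are not part of the radeon driver.

```python
import re

# Matches lines such as:
#   [ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
# The format is assumed from the debug output quoted in this mail.
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Type3:(PACKET3_\w+)\s+ref_addr\s+(.+?)\s*$"
)


def parse_packets(log_text):
    """Return a list of (timestamp, packet_name, ref_addr or None)."""
    packets = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skip anything that is not a Type3 packet line
        ts, name, ref = m.groups()
        try:
            addr = int(ref, 16)
        except ValueError:
            addr = None  # e.g. "<not interpreted>"
        packets.append((float(ts), name, addr))
    return packets


def suspect_window(packets, coher_base):
    """Packets from the SURFACE_SYNC whose ref_addr equals coher_base
    up to and including the next SURFACE_SYNC (or the end of the list).
    Returns an empty list if no SURFACE_SYNC matches."""
    start = None
    for i, (_, name, addr) in enumerate(packets):
        if name == "PACKET3_SURFACE_SYNC" and addr == coher_base:
            start = i
            break
    if start is None:
        return []
    for j in range(start + 1, len(packets)):
        if packets[j][1] == "PACKET3_SURFACE_SYNC":
            return packets[start:j + 1]
    return packets[start:]


SAMPLE = """\
[ 2462.992187] Type3:PACKET3_SET_ALU_CONST ref_addr <not interpreted>
[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
[ 2463.003906] Type3:PACKET3_SET_RESOURCE ref_addr <not interpreted>
[ 2463.007812] Type3:PACKET3_SET_CONFIG_REG ref_addr <not interpreted>
[ 2463.011718] Type3:PACKET3_INDEX_TYPE ref_addr <not interpreted>
[ 2463.015625] Type3:PACKET3_NUM_INSTANCES ref_addr <not interpreted>
[ 2463.019531] Type3:PACKET3_DRAW_INDEX_AUTO ref_addr <not interpreted>
[ 2463.027343] Type3:PACKET3_EVENT_WRITE ref_addr <not interpreted>
[ 2463.031250] Type3:PACKET3_SET_CONFIG_REG ref_addr <not interpreted>
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680
[ 2463.039062] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
"""

if __name__ == "__main__":
    # CP_COHER_BASE value read at lockup time, as reported above.
    for ts, name, addr in suspect_window(parse_packets(SAMPLE), 0x0018C880):
        shown = hex(addr) if addr is not None else "-"
        print(f"{ts:.6f} {name} {shown}")
```

Feeding it the full ib dump and the CP_COHER_BASE value read at lockup time should give the same window as the manual reading, provided the print format does not change.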
Regards,
-- Chen Jie