On Wed, Feb 01, 2012 at 09:30:57PM -0800, Ben Widawsky wrote: > Hmm, for my curiosity, why doesn't swizzling make sense with 3 channels? > I did some searching and it appears that you're right about no product > shipping with the configuration, but I suspect the comment will help in > case any product ever does. The thing is that gpus are really fond of power-of-two sized dimension. Actually just a even multiple of 64 stride is good enough: This way if you walk a texture in y direction you'll always hit the same memory channel (we switch channels every 64 bytes). So that's why we need swizzling on dual channel (if you draw the pictures for all the swizzle patterns you'll notice that odd-even line pair is on two different channels). Iirc the sampler is also optimize to fetch aligned 2x2 blocks and hence maximizes bandwidth by using both ddr channels. Now 3 never divides a power-of-two, so having 3 channels will naturally result in a channel interleave pattern when walking in y direction. > I'd also say it's not a bad idea to elaborate the assumption that we > never have less than 256MB of memory WARN_ON(dimm_c0 + dimm_c1 == 0). I'm pretty sure that we can't boot with 0mb of ram ;-) The other thing is that swizzling is a pure preformance optimization - if we enable it on setup where it's not required we'll only annoy the memory controller with less efficient access patterns. > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h > > index f960738..539ef90 100644 > > --- a/drivers/gpu/drm/i915/i915_reg.h > > +++ b/drivers/gpu/drm/i915/i915_reg.h > > @@ -295,6 +295,12 @@ > > #define FENCE_REG_SANDYBRIDGE_0 0x100000 > > #define SANDYBRIDGE_FENCE_PITCH_SHIFT 32 > > > > +/* control register for cpu gtt access */ > > +#define TILECTL 0x101000 > > +#define TILECTL_SWZCTL (1 << 0) > > +#define TILECTL_TLB_PREFETCH_DIS (1 << 2) > > +#define TILECTL_BACKSNOOP_DIS (1 << 3) > > + > > /* > > * Instruction and interrupt control regs > > */ > > @@ -318,6 +324,11 @@ > > #define RING_MAX_IDLE(base) ((base)+0x54) > > #define RING_HWS_PGA(base) ((base)+0x80) > > #define RING_HWS_PGA_GEN6(base) ((base)+0x2080) > > +#define ARB_MODE 0x04030 > > +#define ARB_MODE_SWIZZLE_SNB (1<<4) > > +#define ARB_MODE_SWIZZLE_IVB (1<<5) > > +#define ARB_MODE_ENABLE(x) GFX_MODE_ENABLE(x) > > +#define ARB_MODE_DISABLE(x) GFX_MODE_DISABLE(x) > > #define RENDER_HWS_PGA_GEN7 (0x04080) > > #define RING_FAULT_REG(ring) (0x4094 + 0x100*(ring)->id) > > #define DONE_REG 0x40b0 > > @@ -1037,6 +1048,29 @@ > > #define C0DRB3 0x10206 > > #define C1DRB3 0x10606 > > > > +/** snb MCH registers for reading the DRAM channel configuration */ > > +#define MAD_DIMM_C0 (MCHBAR_MIRROR_BASE_SNB + 0x5004) > > +#define MAD_DIMM_C1 (MCHBAR_MIRROR_BASE_SNB + 0x5008) > > +#define MAD_DIMM_C2 (MCHBAR_MIRROR_BASE_SNB + 0x500C) > > +#define MAD_DIMM_ECC_MASK (0x3 << 24) > > +#define MAD_DIMM_ECC_OFF (0x0 << 24) > > +#define MAD_DIMM_ECC_IO_ON_LOGIC_OFF (0x1 << 24) > > +#define MAD_DIMM_ECC_IO_OFF_LOGIC_ON (0x2 << 24) > > +#define MAD_DIMM_ECC_ON (0x3 << 24) > > +#define MAD_DIMM_ENH_INTERLEAVE (0x1 << 22) > > +#define MAD_DIMM_RANK_INTERLEAVE (0x1 << 21) > > +#define MAD_DIMM_B_WIDTH_X16 (0x1 << 20) /* X8 chips if unset */ > > +#define MAD_DIMM_A_WIDTH_X16 (0x1 << 19) /* X8 chips if unset */ > > +#define MAD_DIMM_B_DUAL_RANK (0x1 << 18) > > +#define MAD_DIMM_A_DUAL_RANK (0x1 << 17) > > +#define MAD_DIMM_A_SELECT (0x1 << 16) > > +/* DIMM sizes are in multiples of 256mb. */ > > +#define MAD_DIMM_B_SIZE_SHIFT 8 > > +#define MAD_DIMM_B_SIZE_MASK (0xff << MAD_DIMM_B_SIZE_SHIFT) > > +#define MAD_DIMM_A_SIZE_SHIFT 0 > > +#define MAD_DIMM_A_SIZE_MASK (0xff << MAD_DIMM_A_SIZE_SHIFT) > > + > > + > > White space still seems wrong to me, but I don't need to see another > version to verify it's correct. I would have expected: > > ** snb MCH registers for reading the DRAM channel configuration */ > define MAD_DIMM_C0 (MCHBAR_MIRROR_BASE_SNB + 0x5004) > define MAD_DIMM_C1 (MCHBAR_MIRROR_BASE_SNB + 0x5008) > define MAD_DIMM_C2 (MCHBAR_MIRROR_BASE_SNB + 0x500C) > #define MAD_DIMM_ECC_MASK (0x3 << 24) Fixed as you've suggested. > > > /* Clocking configuration register */ > > #define CLKCFG 0x10c00 > > #define CLKCFG_FSB_400 (5 << 0) /* hrawclk 100 */ > > I don't really need to see an updated version of the patch for either > suggestion I submitted. > > Reviewed-by: Ben Widawsky <ben at bwidawsk.net> Thanks for your review. Cheers, Daniel -- Daniel Vetter Mail: daniel at ffwll.ch Mobile: +41 (0)79 365 57 48