Re: [PATCH] MIPS: MSA: misaligned support

Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx> · Wed, 18 Mar 2015 12:46:51 -0700




On 03/18/2015 04:27 AM, James Hogan wrote:

+	.align  4
doesn't this mean the first one & label might not be suitably aligned.
Would it be better to put this before the ld_d (no need for it after
$w31 case) and putting another .align 4 before the Lmsa_to and Lmsa_from
labels (so the label itself is aligned)?

Good point! I will issue V2.


+
+	case 2: /* word */
+		to->val32[0] = from->val32[1];
+		to->val32[1] = from->val32[0];
+		to->val32[2] = from->val32[3];
+		to->val32[3] = from->val32[2];
FWIW since the FP/MSA patches that Paul submitted, there are also
working endian agnostic accessors created with BUILD_FPR_ACCESS, which
use the FPR_IDX macro (see http://patchwork.linux-mips.org/patch/9169/),
which should work for 8bit and 16bit sizes too.

I wonder if the compiler would unroll/optimise this sort of thing:
	for (i = 0; i < (FPU_REG_WIDTH / 8); ++i)
		to_val8[i] = from->val[FPR_IDX(8, i)];

No worries if not.

There is a simple logic behind it - any code rolling in macro/subroutine 
definitely
has sense if any of that is correct:

- code can't be observed by single look (more than half page or window)
- there is a chance to reuse some part of it
- some code vars/limits/bit/positions/etc can be changed in future
- some logical connection to macro/subroutine is desirable to take attn 
during code maintenance.

None of it besides last one is true here. Logical connection to union 
fpureg is done. Changing of MSA register size definitely causes a lot of 
other changes in logic and many assumptions may be broken in multiple 
places, including process signal conventions.

Rolling code in macro (FPR_IDX) and turning it into loop from pretty 
short assignment actually INCREASES a code complexity for future 
maintenance.



+		break;
+
+	case 3: /* doubleword, no conversion */
+		break;
don't you still need to copy the value though?

Good point! Never test this subroutine yet, HW team still should produce 
BIG ENDIAN CPU+MSA variant.
I will issue V2.


+	}
+}
+#endif
+#endif
+
  static void emulate_load_store_insn(struct pt_regs *regs,
  	void __user *addr, unsigned int __user *pc)
  {
@@ -434,6 +497,10 @@ static void emulate_load_store_insn(struct pt_regs *regs,
  #ifdef	CONFIG_EVA
  	mm_segment_t seg;
  #endif
+#ifdef CONFIG_CPU_HAS_MSA
+	union fpureg msadatabase[2], *msadata;
+	unsigned int func, df, rs, wd;
+#endif
  	origpc = (unsigned long)pc;
  	orig31 = regs->regs[31];
  
@@ -703,6 +770,82 @@ static void emulate_load_store_insn(struct pt_regs *regs,
  			break;
  		return;
  
+#ifdef CONFIG_CPU_HAS_MSA
+	case msa_op:
+		if (cpu_has_mdmx)
+			goto sigill;
+
+		func = insn.msa_mi10_format.func;
+		switch (func) {
+		default:
+			goto sigbus;
+
+		case msa_ld_op:
+		case msa_st_op:
+			;
+		}
+
+		if (!thread_msa_context_live())
+			goto sigbus;
Will this ever happen? (I can't see AdE handler enabling interrupts).

If the MSA context genuinely isn't live (i.e. it can be considered
UNPREDICTABLE), then surely a load operation should still succeed?

thread_msa_context_live() == check of TIF_MSA_CTX_LIVE == existence of 
MSA context for thread.
It differs from MSA is owned by thread, it just says that thread has 
already initialized MSA.

Unfortunate choice of function name, I believe.

This is a guard against bad selection of exception priorities in HW... 
sometime it happens.


+
+		df = insn.msa_mi10_format.df;
+		rs = insn.msa_mi10_format.rs;
+		wd = insn.msa_mi10_format.wd;
+		addr = (unsigned long *)(regs->regs[rs] + (insn.msa_mi10_format.s10 * (1 << df)));
"* (1 << df)"?
why not just "<< df"?

+		/* align a working space in stack... */
+		msadata = (union fpureg *)(((unsigned long)msadatabase + 15) & ~(unsigned long)0xf);
Maybe you could just use __aligned(16) on a single local union fpureg.

I am not sure that it works in stack. I don't trust toolchain here - 
they even are not able to align a frame in stack to 16bytes.


+			}
+		} else {
+			if (!access_ok(VERIFY_WRITE, addr, 16))
+				goto sigbus;
+			compute_return_epc(regs);
forgot to preempt_disable()?

Yes.


+			if (test_thread_flag(TIF_USEDMSA)) {
+#ifdef __BIG_ENDIAN
+				msa_from_wd(wd, &current->thread.fpu.fpr[wd]);
+				msa_convert(msadata, &current->thread.fpu.fpr[wd], df);
+#else
+				msa_from_wd(wd, msadata);
+#endif
+				preempt_enable();
+			} else {
+				preempt_enable();
+#ifdef __BIG_ENDIAN
+				msa_convert(msadata, &current->thread.fpu.fpr[wd], df);
+#else
+				*msadata = current->thread.fpu.fpr[wd];
hmm, you could cheat and change this to the following?:
				msadata = &current->thread.fpu.fpr[wd];
Yes. But has it sense? It is just 2 doublewords is replaced by single 
PTR assignment. However, if msadata is not changed it gives a compiler 
some room for optimization.

- Leonid.