On 30 December 2013 15:03, Richard Henderson <rth@xxxxxxxxxxx> wrote:
> On 12/28/2013 01:49 PM, Peter Maydell wrote:
>>      if (size < 4) {
>>          switch (size) {
>>          case 0:
>> -            tcg_gen_ld8u_i64(tmp, cpu_env, freg_offs);
>> +            tcg_gen_ld8u_i64(tmp, cpu_env, fp_reg_offset(srcidx, MO_8));
>>              break;
>>          case 1:
>> -            tcg_gen_ld16u_i64(tmp, cpu_env, freg_offs);
>> +            tcg_gen_ld16u_i64(tmp, cpu_env, fp_reg_offset(srcidx, MO_16));
>>              break;
>>          case 2:
>> -            tcg_gen_ld32u_i64(tmp, cpu_env, freg_offs);
>> +            tcg_gen_ld32u_i64(tmp, cpu_env, fp_reg_offset(srcidx, MO_32));
>>              break;
>>          case 3:
>> -            tcg_gen_ld_i64(tmp, cpu_env, freg_offs);
>> +            tcg_gen_ld_i64(tmp, cpu_env, fp_reg_offset(srcidx, MO_64));
>>              break;
>>          }
>>          tcg_gen_qemu_st_i64(tmp, tcg_addr, get_mem_index(s), MO_TE + size);
>
> It occurs to me to wonder whether it wouldn't just be better to load the whole
> 64-bit quantity and store the piece we need, ignoring the entire host-endian issue.

Yeah, we could do that. Will the optimiser optimise away
the unnecessary extra load of the unused high 32 bits for
the "32 bit or smaller" case on a 32 bit host CPU?

thanks
-- PMM
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
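
For readers following the host-endian point, here is a minimal standalone C sketch of the two approaches discussed above. It is not QEMU code and does not use the TCG API: `host_is_big_endian`, `low_part_offset`, `load_low_narrow` and `load_low_truncate` are hypothetical names invented for illustration, and the register is modelled as a plain `uint64_t` held in host memory. The first approach does a narrow load and therefore has to pick a byte offset that depends on host byte order; the second loads the whole 64-bit value and keeps only the low bits, so no offset adjustment is needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: detect the host byte order at run time. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/*
 * Byte offset of the low (1 << size_log2) bytes inside a 64-bit slot
 * stored in host byte order. On a little-endian host the low bytes come
 * first; on a big-endian host they come last.
 */
static size_t low_part_offset(unsigned size_log2)
{
    size_t nbytes = (size_t)1 << size_log2;
    return host_is_big_endian() ? 8 - nbytes : 0;
}

/* Approach 1: narrow load at a host-endian-adjusted offset, zero-extended. */
static uint64_t load_low_narrow(const uint64_t *slot, unsigned size_log2)
{
    uint64_t out = 0;
    size_t nbytes = (size_t)1 << size_log2;
    size_t off = low_part_offset(size_log2);

    /* Copy only the low-order bytes; the rest of 'out' stays zero. */
    memcpy((unsigned char *)&out + off,
           (const unsigned char *)slot + off, nbytes);
    return out;
}

/* Approach 2: load the whole 64-bit value and keep the low bits. */
static uint64_t load_low_truncate(const uint64_t *slot, unsigned size_log2)
{
    uint64_t whole = *slot;
    unsigned bits = 8u << size_log2;   /* 8, 16, 32 or 64 */

    return bits == 64 ? whole : (whole & ((UINT64_C(1) << bits) - 1));
}
```

Both functions return the same value for `size_log2` in 0..3 on either byte order. The sketch only shows why the second form needs no endian handling; it says nothing about whether a given code generator removes the unused high-half load on a 32-bit host, which is the open question in the reply.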