RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: David Howells
> Sent: Friday, August 18, 2023 5:49 PM
> 
> David Laight <David.Laight@xxxxxxxxxx> wrote:
> 
> > > iov_iter_init                            inc 0x27 -> 0x31 +0xa
> >
> > Are you hitting the gcc bug that loads the constant from memory?
> 
> I'm not sure what that looks like.  For your perusal, here's a disassembly of
> the use-switch-on-enum variant:
> 
>    0xffffffff8177726c <+0>:     cmp    $0x1,%esi
>    0xffffffff8177726f <+3>:     jbe    0xffffffff81777273 <iov_iter_init+7>
>    0xffffffff81777271 <+5>:     ud2
>    0xffffffff81777273 <+7>:     test   %esi,%esi
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff8177727e <+18>:    xor    %eax,%eax
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
>    0xffffffff81777288 <+28>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177728c <+32>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777290 <+36>:    mov    %r8,0x18(%rdi)
>    0xffffffff81777294 <+40>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777298 <+44>:    jmp    0xffffffff81d728a0 <__x86_return_thunk>
> 
> versus the use-bitmap variant:
> 
>    0xffffffff81777311 <+0>:     cmp    $0x1,%esi
>    0xffffffff81777314 <+3>:     jbe    0xffffffff81777318 <iov_iter_init+7>
>    0xffffffff81777316 <+5>:     ud2
>    0xffffffff81777318 <+7>:     test   %esi,%esi
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)
>    0xffffffff81777321 <+16>:    xor    %eax,%eax
>    0xffffffff81777323 <+18>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777327 <+22>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177732b <+26>:    mov    %r8,0x18(%rdi)
>    0xffffffff8177732f <+30>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777333 <+34>:    jmp    0xffffffff81d72960 <__x86_return_thunk>
> 
> It seems to be that the former is loading byte constants individually, whereas
> Linus combined all those fields into a single byte and eliminated one of them.

I think you need to re-order the structure.
The top set writes to bytes 0..4 with:
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
Note that the 'setne' writes into the middle of the constants.

The lower writes bytes 0..1 with:
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)

I think that if you move the 'conditional' value to offset 4
you'll get fewer writes.
Probably a 32bit load into %eax and then a write.

I don't think gcc likes generating 16bit immediates.
In some tests I did it loaded a 32bit value into %eax
and then wrote the low bits.
So the code is much the same (on x86) for 2 or 4 bytes
of constants.
I'm sure you can use the 'data-16' prefix with an immediate.

I'm not sure why you have two non-zero values when Linus
only had one though.

OTOH you don't want to be writing 3 bytes of constants.
Also gcc won't generate:
	movl $0xaabbccdd,%eax
	setne %al   // overwriting the dd
	movl %eax,(%rdi)
and I suspect the partial write (to %al) will be a stall.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux