Hi Alex,
Thank you for the link. It's helpful. I have a question regarding the versioning. One topic in the article talks about how userspace can figure out whether a new ioctl is supported in a given kernel. Is it correct that with the dkms driver we use the driver version coming from AMDGPU_VERSION in amdgpu_drv.c, and with the upstream kernel we use the kernel version?
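For illustration, here is the rough detection I have in mind (just a sketch; it assumes the dkms amdgpu build exports AMDGPU_VERSION via MODULE_VERSION, which would then show up as /sys/module/amdgpu/version, and it falls back to the running kernel release for the upstream driver):

/* Sketch only: pick the version source depending on the driver.
 * Assumption: the dkms amdgpu build sets MODULE_VERSION(AMDGPU_VERSION),
 * so /sys/module/amdgpu/version exists; the upstream driver does not,
 * so we fall back to the kernel release from uname().
 */
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

static int get_amdgpu_version(char *buf, size_t size)
{
        FILE *f = fopen("/sys/module/amdgpu/version", "r");

        if (f) {                        /* dkms driver: module version */
                int ok = fgets(buf, size, f) != NULL;

                fclose(f);
                if (ok)
                        return 0;
        }

        /* upstream driver: use the kernel version instead */
        struct utsname uts;

        if (uname(&uts))
                return -1;
        strncpy(buf, uts.release, size - 1);
        buf[size - 1] = '\0';
        return 0;
}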
Thanks.
Amber
[AMD Public Use]
Some good advice on getting ioctls right:
Alex
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of Felix Kuehling <felix.kuehling@xxxxxxx>
Sent: Tuesday, April 14, 2020 10:40 PM
To: Lin, Amber <Amber.Lin@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

Hi Amber,
I understand that different processes can get the same FD. My statement about FD being unique is relative to one process.
The main problem with the global client ID is that it allows process A to change the event mask of process B just by specifying process B's client ID. That can lead to denial-of-service attacks where process A causes events not to be delivered to B, or floods process B with frequent events that it is not prepared to handle.
Therefore the client must be looked up not in a global list but in a per-process list. That way process A can only change the event masks of its own clients, and never those of any other process.
But if the client list is process-specific, you can use the FD as a unique identifier of the client within the process, so you don't need a separate client ID.
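To illustrate what I mean (a sketch only; kfd_smi_ev_fops and the client layout are assumed names, not the patch code):

/* Sketch: resolve an SMI client through the caller's own fd table,
 * so process A can never reach process B's clients.
 */
static int kfd_smi_set_event_mask(int fd, __u64 events)
{
        struct fd f = fdget(fd);
        struct kfd_smi_client *client;
        int ret = 0;

        if (!f.file)
                return -EBADF;

        if (f.file->f_op != &kfd_smi_ev_fops) {
                /* not one of our anonymous SMI event files */
                ret = -EINVAL;
                goto out;
        }

        client = f.file->private_data;
        WRITE_ONCE(client->events, events);
out:
        fdput(f);
        return ret;
}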
Regards,
Felix
Am 2020-04-14 um 8:09 p.m. schrieb Lin, Amber:
[AMD Official Use Only - Internal Distribution Only]
Hi Felix,
That was my assumption too, that each registration would get a different file descriptor, but it turns out not to be the case. When I started two processes and both registered gpu0 and gpu1, they both got fd=15. If process A registers gpu0+gpu1 and process B only registers gpu0, process A gets fd=15 and process B gets fd=9. That's why I added the client ID.
By multiple clients I mean multiple processes. The ask is that users want to run multiple tools, and those different tools should be able to use the rsmi lib to watch events at the same time. Because two processes can get the same fd, as described above, I needed to add a client ID to distinguish registrations, so I don't see the point of limiting it to one registration per process, unless I use the pid to distinguish the client instead, which I also considered while writing the code. But on second thought, why add this restriction when a client ID lets a tool watch different events on different devices if it wants to? Maybe client ID is a bad term and it misleads you; I should call it a register ID.
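Just to illustrate the fd collision (a throwaway userspace example, nothing to do with the patch): both processes typically print the same number, because fd values index each process's own file table.

/* Parent and child print the same fd number for different open files,
 * because fd values are per-process table indices.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        if (fork() == 0) {
                printf("child:  fd=%d\n", open("/dev/null", O_RDONLY));
                return 0;
        }
        printf("parent: fd=%d\n", open("/dev/null", O_RDONLY));
        wait(NULL);
        return 0;
}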
Regards,
Amber
From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Sent: Tuesday, April 14, 2020 7:04 PM
To: Lin, Amber <Amber.Lin@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH v4] drm/amdkfd: Provide SMI events watch
Hi Amber,
Some general remarks about the multi-client support. You added a global client id that's separate from the file descriptor. That's problematic for two reasons:
- A process could change a different process' event mask
- The FD should already be unique per process, no need to invent another ID
If we want to allow one process to register for events multiple times (multiple FDs per process), then the list of clients should be per process. Each process should only be allowed to change the event masks of its own clients. The client could be identified by its FD. No need for another client ID.
But you could also simplify it further by allowing only one event client per process. Then you don't need the client ID lookup at all. Just have a single event client in the kfd_process.
Another approach would be to make enable/disable operations of the event FD rather than ioctls on the KFD FD. It could be an ioctl of the event FD or, even simpler, you could use the write file operation to write an event mask (of arbitrary length, if you want to allow growth in the future). That way everything would be neatly encapsulated in the event FD's private data.
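Roughly like this (again just a sketch with assumed names, not the patch):

/* Sketch of the write() idea: the mask is written to the anonymous
 * event fd itself, so no KFD-fd ioctl and no client ID are needed.
 */
static ssize_t kfd_smi_ev_write(struct file *filep, const char __user *user,
                                size_t size, loff_t *offset)
{
        struct kfd_smi_client *client = filep->private_data;
        __u64 events;

        /* accept at least a u64; a longer write could carry a bigger
         * mask in the future without breaking old kernels
         */
        if (size < sizeof(events))
                return -EINVAL;
        if (copy_from_user(&events, user, sizeof(events)))
                return -EFAULT;

        WRITE_ONCE(client->events, events);
        return sizeof(events);
}

static const struct file_operations kfd_smi_ev_fops = {
        .owner = THIS_MODULE,
        .write = kfd_smi_ev_write,
        /* .poll, .read, .release as in the patch */
};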
Two more comments inline ...
Am 2020-04-14 um 5:30 p.m. schrieb Amber Lin:
When the compute is malfunctioning or performance drops, the system admin
will use the SMI (System Management Interface) tool to monitor/diagnose
what went wrong. This patch provides an event watch interface for the
userspace to register devices and subscribe to events they are interested
in. After registering, the user can use the anonymous file descriptor's
poll function with a wait time specified and wait for events to happen.
Once an event happens, the user can use read() to retrieve information
related to the event. The VM fault event is done in this patch.

v2: - remove UNREGISTER and add event ENABLE/DISABLE
    - correct kfifo usage
    - move event message API to kfd_ioctl.h
v3: send the event msg in text rather than in binary
v4: support multiple clients

Signed-off-by: Amber Lin <Amber.Lin@xxxxxxx>
[snip]
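A hedged userspace sketch of that flow, using the uapi quoted below (the exact op sequence and the client_id handling are assumptions about this v4 patch; error handling is trimmed):

/* Sketch: register a GPU, enable VM fault events, then poll and read
 * one text event message from the anonymous fd returned by KFD.
 */
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include "kfd_ioctl.h"

static int watch_vm_faults(int kfd_fd, uint32_t gpuid)
{
        struct kfd_ioctl_smi_events_args args = {0};
        uint32_t gpus[1] = { gpuid };
        char msg[512];  /* deliberately not tied to any uapi maximum */
        ssize_t len;

        /* register the device; KFD fills in anon_fd and client_id */
        args.op = KFD_SMI_EVENTS_REGISTER;
        args.gpuids_array_ptr = (uintptr_t)gpus;
        args.num_gpuids = 1;
        if (ioctl(kfd_fd, AMDKFD_IOC_SMI_EVENTS, &args))
                return -1;

        /* enable VM fault events (client_id kept from the call above) */
        args.op = KFD_SMI_EVENTS_ENABLE;
        args.events = KFD_SMI_EVENT_VMFAULT;
        if (ioctl(kfd_fd, AMDKFD_IOC_SMI_EVENTS, &args))
                return -1;

        /* wait on the anonymous fd, then read one event message */
        struct pollfd pfd = { .fd = (int)args.anon_fd, .events = POLLIN };

        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
                len = read(pfd.fd, msg, sizeof(msg) - 1);
                if (len > 0) {
                        msg[len] = '\0';
                        printf("SMI event: %s", msg);
                }
        }
        close(pfd.fd);
        return 0;
}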
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 4f66764..8146437 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -442,6 +442,36 @@ struct kfd_ioctl_import_dmabuf_args {
 	__u32 dmabuf_fd;	/* to KFD */
 };
 
+/*
+ * KFD SMI(System Management Interface) events
+ */
+enum kfd_smi_events_op {
+	KFD_SMI_EVENTS_REGISTER = 1,
+	KFD_SMI_EVENTS_ENABLE,
+	KFD_SMI_EVENTS_DISABLE
+};
+
+/* Event type (defined by bitmask) */
+#define KFD_SMI_EVENT_VMFAULT	0x0000000000000001
+
+struct kfd_ioctl_smi_events_args {
+	__u32 op;	/* to KFD */
+	__u64 events;	/* to KFD */

The binary layout of the ioctl args structure should be the same on 32/64-bit. That means the 64-bit members should be 64-bit aligned. The best way to ensure this is to put all the 64-bit members first.
+	__u64 gpuids_array_ptr;	/* to KFD */
+	__u32 num_gpuids;	/* to KFD */
+	__u32 anon_fd;		/* from KFD */
+	__u32 client_id;	/* to/from KFD */
+};
+
+/* 1. All messages must start with (hex)uint64_event(16) + space(1) +
+ *    (hex)gpuid(8) + space(1) = 26 bytes
+ * 2. VmFault msg = (hex)uint32_pid(8) + space(1) + task name(16) = 25
+ *    When a new event msg uses more memory, change the calculation here.
+ * 3. End with \n(1)
+ * 26 + 25 + 1 = 52
+ */
+#define KFD_SMI_MAX_EVENT_MSG 52

If you define the maximum message length here, clients may start depending on it, and it gets harder to change it later. I'd not define this in the API header. It's not necessary to write correct clients. And if used badly, it may encourage writing incorrect clients that break with longer messages in the future.
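For the 32/64-bit comment above, one possible reordering (a sketch, not a final ABI) that keeps the 64-bit members 64-bit aligned:

/* 64-bit members first, then the 32-bit ones, so the struct packs
 * identically for 32-bit and 64-bit userspace without hidden padding.
 */
struct kfd_ioctl_smi_events_args {
	__u64 events;		/* to KFD */
	__u64 gpuids_array_ptr;	/* to KFD */
	__u32 op;		/* to KFD */
	__u32 num_gpuids;	/* to KFD */
	__u32 anon_fd;		/* from KFD */
	__u32 client_id;	/* to/from KFD */
};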
Regards,
Felix
+
 /* Register offset inside the remapped mmio page
  */
 enum kfd_mmio_remap {
@@ -546,7 +576,10 @@ enum kfd_mmio_remap {
 #define AMDKFD_IOC_ALLOC_QUEUE_GWS		\
 		AMDKFD_IOWR(0x1E, struct kfd_ioctl_alloc_queue_gws_args)
 
+#define AMDKFD_IOC_SMI_EVENTS			\
+		AMDKFD_IOWR(0x1F, struct kfd_ioctl_smi_events_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x1F
+#define AMDKFD_COMMAND_END		0x20
 
 #endif