Re: [EXTERNAL] Re: Code Review Request for AMDGPU Hotplug Support

Can you attach dmesg for the failure on amd-staging-drm-next without your patch applied?

Also, in general, patches for amdgpu upstream branches should be submitted inline to the amd-gfx mailing list using git send-email, which makes it easy to comment on and review them inline.

Andrey

On 2022-04-06 10:25, Shuotao Xu wrote:
Hi Andrey,

We just tried kernel 5.16 based on the amd-staging-drm-next branch of https://gitlab.freedesktop.org/agd5f/linux.git, and found that hotplug did not work out of the box for the ROCm compute stack.

We did not try the rendering stack since we currently are more focused on AI workloads.

We have also created a patch against the amd-staging-drm-next branch to enable hotplug for the ROCm stack, which was sent in a later email with the same subject. I am attaching the patch to this email, in case you would like to delete that later email.

Best regards,

Shuotao

*From: *Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
*Date: *Wednesday, April 6, 2022 at 10:13 PM
*To: *Shuotao Xu <shuotaoxu@xxxxxxxxxxxxx>, amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
*Cc: *Ziyue Yang <Ziyue.Yang@xxxxxxxxxxxxx>, Lei Qu <Lei.Qu@xxxxxxxxxxxxx>, Peng Cheng <pengc@xxxxxxxxxxxxx>, Ran Shu <Ran.Shu@xxxxxxxxxxxxx>
*Subject: *[EXTERNAL] Re: Code Review Request for AMDGPU Hotplug Support


It looks like you are using a 5.13 kernel for this work. FYI, we added
hot-plug support for the graphics stack in the 5.14 kernel (see
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.14-AMDGPU-Hot-Unplug).


I am not sure about the code itself, since it all touches the KFD driver
(the KFD team can comment on that), but I was wondering: if you try a
5.14 kernel, would things just work for you out of the box?

Andrey

On 2022-04-05 22:45, Shuotao Xu wrote:
Dear AMD Colleagues,

We are from Microsoft Research, and are working on GPU disaggregation
technology.

We have created a new pull request, "Add PCIe hotplug support for amdgpu"
(https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/pull/131), in
ROCK-Kernel-Driver, which will enable PCIe hot-plug support for amdgpu.

We believe that hot-plug support for GPU devices can open doors for
many advanced applications in data centers in the next few years, and we
would like to have some reviewers on this PR so that we can continue
technical discussions around this feature.

Would you please help review this PR?

Thank you very much!

Best regards,

Shuotao Xu




