Re: NVidia, again

Alexandru Chiscan <lec@xxxxxxxxxxxxxxxxxxxx> · Wed, 26 Mar 2014 15:45:33 +0200

On 03/26/2014 03:40 PM, m.roth@xxxxxxxxx wrote:
> Johnny Hughes wrote:
>> On 03/26/2014 08:14 AM, m.roth@xxxxxxxxx wrote:
>>> Johnny Hughes wrote:
>>>> On 03/26/2014 07:01 AM, mark wrote:
>>>>> On 03/26/14 03:01, Johnny Hughes wrote:
>>>>>> On 03/25/2014 04:36 PM, m.roth@xxxxxxxxx wrote:
>>>>>>> Got a HBS (y'know, Honkin' Big Server, one o' them technical terms),
>>>>>>> a Dell 720 with two Tesla GPUs. I updated the o/s, 6.5, and I cannot
>>>>>>> get the GPUs recognized. As a last resort, I d/l NVidia's proprietary
>>>>>>> driver/installer, 325, and it builds fine... I've yum removed the
>>>>>>> kmod-nvidia I had on the system, nouveau is blacklisted, and when I
>>>>>>> reboot, lsmod shows me nvidia loaded, which modinfo tells me looks
>>>>>>> like the one I built.... but enum_gpu, which is from a CUDA group,
>>>>>>> builds... but can't enumerate the GPUs (how we wake them up for the
>>>>>>> users). I see the /dev/nvidia*, and they're a+r, a+w.... Oh, and selinux is
>>>>>>> permissive.
>>>>>>>
>>>>>>> Anyone got a clue? If I can't get this working, I'm going to have to
>>>>>>> downgrade the system several kernels.
>>>>>> Do you have an /etc/X11/xorg.conf file or something in
>>>>>> /etc/X11/xorg.conf.d/ that actually names nvidia and not nv as the
>>>>>> driver?
>>>>> Nope - nothing there.
>>>> When you run the ./NVIDIA<version> command to build the driver, one of
>>>> the last steps is to have it "automatically update your configuration
>>>> file" .. select yes for that and it should create an xorg.conf file
>>>> that will use the nvidia driver.
>>> a) I didn't have that before - did kmod-nvidia handle loading the
>>>    correct one *without* an xorg.conf?
>>> b) Do you think it'll do the right thing - this *is* a headless server.
>>>
>>> And a general question: what *does* kmod-nvidia do - is it different
>>> than, say, setting up a flag, or a script to notice that you're booting
>>> a new kernel, and run the proprietary installer -a -s?
>> Are you connecting to the server to do X related things remotely ... and
>> therefore need NVIDIA drivers for that?
>>
> I think you missed that part of my original post: no X. This box has two
> Tesla GPUs, and my users are using them for heavy duty scientific
> computing.... And my problem is that neither their programs, nor the
> utility I use (I *think* it's part of the CUDA toolkit -
> I didn't set that part up) can enumerate them... meaning that they can't
> see or use the GPUs.

Try installing the CUDA Toolkit (https://developer.nvidia.com/cuda-downloads). From their FAQ:
Q: Will the installer replace the driver currently installed on my system?
A: The installer will provide an option to install the included driver, and if selected,
it will replace the driver currently on your system.
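
Before swapping drivers, it may help to record what the running system reports, since one possible cause of failed enumeration is a kernel module that does not match the user-space CUDA libraries. Below is a minimal Python sketch, not a definitive diagnostic. It assumes the proprietary driver exposes /proc/driver/nvidia/version with an "NVRM version:" line and that the device nodes live under /dev/nvidia*; the function names are hypothetical and only for illustration.

```python
import glob
import re

# Assumption: the proprietary NVIDIA kernel module publishes its version here.
VERSION_FILE = "/proc/driver/nvidia/version"


def parse_driver_version(text):
    """Return the dotted version on the 'NVRM version:' line, or None."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            # Take the first dotted number on that line.
            match = re.search(r"\b\d+(?:\.\d+)+\b", line)
            return match.group(0) if match else None
    return None


def device_nodes(pattern="/dev/nvidia*"):
    """Return the sorted list of device nodes matching the pattern."""
    return sorted(glob.glob(pattern))


def report():
    """Build a short text summary of the loaded driver and device nodes."""
    try:
        with open(VERSION_FILE) as handle:
            version = parse_driver_version(handle.read())
    except OSError:
        version = None  # module not loaded, or file not present
    nodes = device_nodes()
    return "driver: {}\nnodes: {}".format(
        version or "not found", ", ".join(nodes) or "none"
    )


if __name__ == "__main__":
    print(report())
```

Comparing the reported version with the driver version your CUDA toolkit expects should show whether a mismatch is worth chasing.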

Lec

>
>> I'll let one of the elrepo guys explain their RPM.
> Fair 'nough. I just threw that out as a general question, not expecting
> that was yours to answer.
>
>         mark
>
> _______________________________________________
> CentOS mailing list
> CentOS@xxxxxxxxxx
> http://lists.centos.org/mailman/listinfo/centos

-- 
Lec
