Re: ESX FC host connectivity issues

Dan Lane <dracodan@xxxxxxxxx> · Sun, 28 Feb 2016 14:13:14 -0500

>> Runs fine for about a day, then simply stops responding.  There is
>> absolutely nothing in the /var/log/messages when this happens, and the
>> service seems to still be running, but no servers can see the storage.
>> Is there anywhere else I can look at logs?  Is there a way to enable
>> more verbose logging?  Additionally, once this happens it is
>> impossible to stop the service; even running kill -9 on the process
>> never succeeds.  The only thing that can be done at this point is to
>> reboot the target server.
>> Oddly, my FC switch still sees the target server.
>> Here is what "ps aux | grep target" shows for the process after it
>> crashes and I try to stop it; note the "D", i.e. "uninterruptible sleep":
>> root     17055  0.0  0.0 214848 15444 ?        Ds   19:35   0:00 /usr/bin/python3 /usr/bin/targetctl clear
>>
>
> Not sure what you mean by 'crashing' here, but if you're not using
> v4.5-rc4+ or the target-pending/4.4-stable I prepared for you a month ago,
> then you're going to keep hitting the same original bug.
>

Unfortunately I'm about to leave town for a few weeks, so I have very
little time to look at this.  That said, let's talk about this... I
built the latest kernel last night from the torvalds git tree, with
linux-next added as a remote.  Here are the commands I used (in case
you see any problems).

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git remote add linux-next git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git fetch linux-next
git fetch --tags linux-next
cp /boot/config-4.3.4-300.fc23.x86_64 .config
make oldconfig
make -j8 bzImage; make -j8 modules; make -j8 modules_install; make -j8 install

This resulted in a functioning 4.5-rc5+ kernel.  A matter of hours
later the storage once again disappeared from my ESXi hosts.  I
understand there may be things I need to tweak on my hosts, but should
those things cause LIO to stop responding on the target server?
It's back to acting exactly the same way as before (with the
target-pending/4.4-stable from a month ago): I can't stop the service
or kill the process.

# uname -a
Linux dracofiler.home.lan 4.5.0-rc5+ #2 SMP Sat Feb 27 15:22:25 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

# ps aux | grep target
root      2531  0.2  0.0 214848 15500 ?        Ds   11:48   0:00 /usr/bin/python3 /usr/bin/targetctl clear

This is what I mean by "crashes": targetctl hangs here, and nothing
I've tried gets it responding again short of rebooting.
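
A minimal sketch for collecting more detail the next time a process
gets stuck in "D" state.  The function names d_state_pids and
dump_blocked_stacks are made up for illustration, and whether
/proc/<pid>/stack is readable depends on the kernel config and on
running as root, so treat this as a starting point rather than a
known-good procedure.

```shell
#!/bin/sh
# Print the PIDs whose STAT column (8th field of `ps aux` output)
# starts with "D", i.e. uninterruptible sleep.
# Reads `ps aux`-style lines on stdin.
d_state_pids() {
    awk '$8 ~ /^D/ { print $2 }'
}

# For every blocked PID, print the kernel stack it is sleeping in.
# Assumes /proc/<pid>/stack is available (usually root only).
dump_blocked_stacks() {
    ps aux | d_state_pids | while read -r pid; do
        echo "== PID $pid =="
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

Running dump_blocked_stacks as root while targetctl is hung should show
where in the kernel it is waiting.  If sysrq is enabled,
"echo w > /proc/sysrq-trigger" followed by "dmesg" is another way to
get the blocked tasks dumped to the kernel log.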

Thanks
Dan