On Wed, Feb 08, 2017 at 10:57:38AM PST, Shinobu Kinjo spake thusly:
> If you would be able to reproduce the issue intentionally under
> particular condition which I have no idea about at the moment, it
> would be helpful.

The issue is very reproducible. It hangs every time. Any install I do
with virt-install causes a hang at some point during the install. I
have reproduced it 3 times this morning already.

> There were some MLs previously regarding to *similar* issue.
>
> # google "libvirt rbd issue"

I found:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004179.html

which suggested file descriptors as the problem. That's good to know
for when my cluster gets bigger, but I have only 70 OSDs and the number
of fds in use did not exceed 90 against a soft limit of 1024. My
problem also manifests itself a little differently than described in
that post.

I can dd large machine images into rbd all day long with no problems.
In fact I am considering bypassing anaconda kickstart installs for the
moment and just copying the machine image, which gets successfully
installed occasionally, but this is not our normal deployment workflow
so it is not ideal. Plus I'm still concerned that there is an actual
underlying problem, or something I am not understanding, which may bite
us later.

That post also mentions jumbo frames. We have jumbo frames enabled
everywhere. We did have a problem months ago getting ceph up and
running initially because we forgot to tell the switch to use jumbo
frames, and we learned our lesson on that.

Not sure what else I can look at. I'm not seeing any clues.

--
Tracy Reed
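
As an aside on the file descriptor check mentioned above: a minimal sketch, in Python, of one way to compare a process's open fd count against its soft limit on Linux. It assumes /proc is mounted and that the target process's /proc entries are readable (same user or root). The function names parse_soft_nofile and fd_usage are made up for illustration, and this is not necessarily how the numbers quoted above were obtained.

```python
import os


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is not a number
    (for example 'unlimited').
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            fields = line.split()
            if len(fields) < 4:
                return None
            soft = fields[3]
            return int(soft) if soft.isdigit() else None
    return None


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a Linux process."""
    # Every entry under /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft = parse_soft_nofile(f.read())
    return open_fds, soft
```

Calling fd_usage() with the PID of the qemu process backing the guest (fd_usage(12345), say, with 12345 a placeholder) would return the pair to compare; sampling it repeatedly during an install would show whether the count climbs toward the limit before the hang.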
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com