Hi Jake,

I can't remember the exact details, but it was something to do with a potential problem when using the pacemaker resource agents. I think it was a potential hanging issue: when one LUN on a shared target failed, the agent then tried to kill all the other LUNs in order to fail the target over to another host. This leaves the TCM part of LIO locking the RBD, which then also can't fail over. That said, I did try multiple LUNs on one target as a test and didn't experience any problems.

I'm interested in the way you have your setup configured, though. Are you saying you effectively have an active/active configuration with a path going to either host, or are you failing the iSCSI IP between hosts? If it's the former, have you had any problems with SCSI locking/reservations, etc. between the two targets? I can see the advantage of that configuration, as it reduces or eliminates a lot of the trouble I have had with resources failing over.

Nick

From: Jake Young [mailto:jak3kaj@xxxxxxxxx]

Nick,

Where did you read that having more than 1 LUN per target causes stability problems? I am running 4 LUNs per target.

For HA I'm running two Linux iSCSI target servers that map the same 4 rbd images. The two targets have the same serial numbers, T10 address, etc. I copy the primary's config to the backup and change the IPs. This way VMware thinks they are different target IPs on the same host. This has worked very well for me.

One suggestion I have is to try using rbd-enabled tgt. The performance is equivalent to LIO, but I found it is much better at recovering from a cluster outage. I've had LIO lock up the kernel or simply not recognize that the rbd images are available, whereas tgt will eventually present the rbd images again.

I have been slowly adding servers and am expanding my test setup into a production setup (a nice thing about Ceph). I now have 6 OSD hosts with 7 disks each. I'm using the LSI Nytro cache RAID controller, so I don't have a separate journal, and I have 40Gb networking. I plan to add another 6 OSD hosts in another rack in the next 6 months (and then another 6 next year). I'm doing 3x replication, so I want to end up with 3 racks.

Jake
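For anyone curious what the rbd-enabled tgt approach described above might look like, here is a rough, untested sketch of an /etc/tgt/targets.conf entry. This is not Jake's actual config: the IQN, pool/image name, and the scsi_id/scsi_sn values are invented for illustration, it assumes a tgt build compiled with Ceph/RBD support (bs-type rbd), and only one of the LUNs is shown. The idea, as described, is to keep the serial numbers and IDs identical on both target hosts so the initiator treats the two paths as the same device.

    # /etc/tgt/targets.conf - sketch only, names and IDs are made up
    <target iqn.2014-01.com.example:rbd-target>
        driver iscsi
        # requires a tgt build with RBD backing-store support
        bs-type rbd
        # backing store is <pool>/<image>
        <backing-store rbd/vm-lun0>
            lun 1
            # keep scsi_id and scsi_sn identical on both target hosts
            scsi_id RBD-LUN0
            scsi_sn 0000000001
        </backing-store>
        initiator-address ALL
    </target>

Repeat the backing-store block for each additional rbd image, copy the file to the second target host, and change only the host's IP addressing.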