On Wed, Oct 11, 2017 at 1:10 PM, Samuel Soulard <samuel.soulard@xxxxxxxxx> wrote:
> Hmmm, if you fail over the identity of the LIO configuration, including
> PGRs (I believe they are files on disk), this would work, no? Using 2
> iSCSI gateways which have shared storage to store the LIO configuration
> and PGR data.

Are you referring to the Active Persist Through Power Loss (APTPL)
support in LIO where it writes the PR metadata to
"/var/target/pr/aptpl_<wwn>"? I suppose that would work for a Pacemaker
failover if you had a shared file system mounted between all your
gateways *and* the initiator requests APTPL mode(?).

> Also, you said it "fails over to another port". Do you mean a port on
> another iSCSI gateway? I believe LIO with multiple target portal IPs on
> the same node for path redundancy works with PGRs.

Yes, I was referring to the case with multiple active iSCSI gateways,
which doesn't currently distribute PGRs to all gateways in the group.

> In my scenario, if my assumptions are correct, you would only have 1
> iSCSI gateway available through 2 target portal IPs (for data path
> redundancy). If this first iSCSI gateway fails, both target portal IPs
> fail over to the standby node with the PGR data that is available on
> shared storage.
>
>
> Sam
>
> On Wed, Oct 11, 2017 at 12:52 PM, Jason Dillaman <jdillama@xxxxxxxxxx>
> wrote:
>>
>> On Wed, Oct 11, 2017 at 12:31 PM, Samuel Soulard
>> <samuel.soulard@xxxxxxxxx> wrote:
>> > Hi all,
>> >
>> > What if you're using an iSCSI gateway based on LIO and krbd (that
>> > is, an RBD block device mapped on the iSCSI gateway and published
>> > through LIO)? The LIO target portal (virtual IP) would fail over to
>> > another node. This would theoretically provide support for PGRs,
>> > since LIO does support SPC-3. Granted, it is not distributed and is
>> > limited to the throughput of a single node, but this would achieve
>> > the high availability required by some environments.
>>
>> Yes, LIO technically supports PGR, but it's not distributed to other
>> nodes. If you have a Pacemaker-initiated target failover to another
>> node, the PGR state would be lost / missing after migration (unless I
>> am missing something like a resource agent that attempts to preserve
>> the PGRs). For initiator-initiated failover (e.g. a target is alive
>> but the initiator cannot reach it), after it fails over to another
>> port the PGR data won't be available.
>>
>> > Of course, multiple target portals would be awesome since the
>> > available throughput would be able to scale linearly, but since
>> > this isn't here right now, this would provide at least an
>> > alternative.
>>
>> It would definitely be great to go active/active, but there are
>> concerns about data-corrupting edge conditions when using MPIO, since
>> it relies on client-side failure timers that are not coordinated with
>> the target.
>>
>> For example, if an initiator writes to sector X down path A and there
>> is a delay to the path A target (i.e. the target and initiator timeout
>> timers are not in sync), and MPIO fails over to path B, quickly
>> performs the retried write to sector X, and then performs a second
>> write to sector X, there is a possibility that path A will eventually
>> unblock and overwrite the new value in sector X with the old value.
>> The safe way to handle that would require setting the initiator-side
>> IO timeouts to values high enough that higher-level subsystems would
>> mark the MPIO path as failed should a failure actually occur.
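To make that failure sequence concrete, below is a minimal Python
sketch of the race. The timings, sector values, and helper names are
all made up for illustration; it only demonstrates the ordering
problem, not any real initiator or target code:

    import threading
    import time

    disk = {"X": "v0"}            # one shared "sector" on the target
    lock = threading.Lock()

    def target_write(path, sector, value, delay=0.0):
        # Simulate an IO that reaches the target after `delay` seconds.
        time.sleep(delay)
        with lock:
            disk[sector] = value
            print("path %s wrote %s to sector %s" % (path, value, sector))

    # The initiator issues write(v1) down path A, but the IO stalls in
    # flight (the target and initiator timers are not in sync).
    stalled = threading.Thread(target=target_write,
                               args=("A", "X", "v1"),
                               kwargs={"delay": 0.5})
    stalled.start()

    # MPIO's client-side timer expires; it fails over to path B,
    # retries write(v1), and then the application writes a newer v2.
    target_write("B", "X", "v1")  # retried IO
    target_write("B", "X", "v2")  # subsequent write

    # The stalled path A IO finally unblocks and lands last.
    stalled.join()
    print("final value in sector X:", disk["X"])  # "v1" (stale), not "v2"

With uncoordinated timers the target has no way to tell that path A's
late arrival is stale, which is exactly the gap MCS-style retry
detection would close: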
>> The iSCSI MCS protocol would address these concerns, since in theory
>> path B could discover that the retried IO was actually a retry, but
>> alas it's not available in either the Linux Open-iSCSI or the ESX
>> iSCSI initiators.
>>
>> > On Wed, Oct 11, 2017 at 12:26 PM, David Disseldorp <ddiss@xxxxxxx>
>> > wrote:
>> >>
>> >> Hi Jason,
>> >>
>> >> Thanks for the detailed write-up...
>> >>
>> >> On Wed, 11 Oct 2017 08:57:46 -0400, Jason Dillaman wrote:
>> >>
>> >> > On Wed, Oct 11, 2017 at 6:38 AM, Jorge Pinilla López
>> >> > <jorpilo@xxxxxxxxx> wrote:
>> >> >
>> >> > > As far as I am able to understand, there are 2 ways of setting
>> >> > > up iSCSI for Ceph:
>> >> > >
>> >> > > 1- using the kernel (lrbd), only available on SUSE, CentOS,
>> >> > > Fedora...
>> >> >
>> >> > The target_core_rbd approach is only utilized by SUSE (and its
>> >> > derivatives like PetaSAN) as far as I know. This was the initial
>> >> > approach for Red Hat-derived kernels as well, until the upstream
>> >> > kernel maintainers indicated that they really do not want a
>> >> > specialized target backend for just krbd. The next attempt was
>> >> > to re-use the existing target_core_iblock to interface with krbd
>> >> > via the kernel's block layer, but that hit similar upstream
>> >> > walls trying to get support for SCSI command passthrough to the
>> >> > block layer.
>> >> >
>> >> > > 2- using userspace (tcmu, ceph-iscsi-config, ceph-iscsi-cli)
>> >> >
>> >> > The TCMU approach is what upstream and Red Hat-derived kernels
>> >> > will support going forward.
>> >>
>> >> SUSE is also in the process of migrating to the upstream tcmu
>> >> approach, for the reasons that you gave in (1).
>> >>
>> >> ...
>> >>
>> >> > The TCMU approach also does not currently support SCSI
>> >> > persistent reservation groups (needed for Windows clustering)
>> >> > because that support isn't available in the upstream kernel. The
>> >> > SUSE kernel has an approach that utilizes two round-trips to the
>> >> > OSDs for each IO to simulate PGR support. Earlier this summer, I
>> >> > believe SUSE started to look into how to get generic PGR support
>> >> > merged into the upstream kernel, using corosync/dlm to
>> >> > synchronize the states between multiple nodes in the target. I
>> >> > am not sure of the current state of that work, but it would
>> >> > benefit all LIO targets when complete.
>> >>
>> >> Zhu Lingshan (cc'ed) worked on a prototype for tcmu PR support.
>> >> IIUC, whether DLM or the underlying Ceph cluster gets used for PR
>> >> state storage is still under consideration.
>> >>
>> >> Cheers, David
>> >
>>
>> --
>> Jason
>
>

--
Jason
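For what it's worth, here is a rough Python sketch of the
shared-storage PR persistence idea discussed above. The JSON layout,
file names, and WWN/IQN values are hypothetical (LIO's actual APTPL
metadata format under /var/target/pr is different); the point is only
that reservation state written by the active gateway can be reloaded
by whichever gateway takes over the target:

    import json
    import os

    # Stand-in for a directory shared between all gateways (e.g. a
    # clustered file system mounted at /var/target/pr). A local path
    # is used here so the sketch is runnable as-is.
    PR_DIR = "./pr-demo"

    def save_registrations(wwn, registrations):
        # Persist PR state for a LUN, keyed by its WWN.
        os.makedirs(PR_DIR, exist_ok=True)
        path = os.path.join(PR_DIR, "aptpl_%s.json" % wwn)
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(registrations, f)
            f.flush()
            os.fsync(f.fileno())  # survive power loss (the "PL" in APTPL)
        os.rename(tmp, path)      # atomic replace, never a torn file

    def load_registrations(wwn):
        # Reload PR state on the gateway that takes over the target.
        path = os.path.join(PR_DIR, "aptpl_%s.json" % wwn)
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}             # nothing registered yet

    # A registration made while gateway 1 owns the target portal ...
    save_registrations("naa.60014055c1a2b3c4",
                       {"iqn.1998-01.com.vmware:esx1": "0x123abc"})
    # ... is visible to gateway 2 after Pacemaker moves the portal IP:
    print(load_registrations("naa.60014055c1a2b3c4"))

Note that this only covers the Pacemaker active/standby case; for
multiple active gateways the registrations (and the reservation
holder) would still need to be pushed to, and enforced by, every node,
which is what the corosync/dlm work mentioned above is about.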