Re: Ceph-ISCSI


 



Ahh, so in this case only SUSE Enterprise Storage is able to provide iSCSI connections for MS clusters if HA is required, be it Active/Standby, Active/Active, or Active/Failover.

On Wed, Oct 11, 2017 at 2:03 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
On Wed, Oct 11, 2017 at 1:10 PM, Samuel Soulard
<samuel.soulard@xxxxxxxxx> wrote:
> Hmmm, if you fail over the identity of the LIO configuration including PGRs
> (I believe they are files on disk), this would work, no?  Using 2 iSCSI
> gateways which have shared storage to store the LIO configuration and PGR
> data.

Are you referring to the Activate Persist Through Power Loss (APTPL)
support in LIO where it writes the PR metadata to
"/var/target/pr/aptpl_<wwn>"? I suppose that would work for a
Pacemaker failover if you had a shared file system mounted between all
your gateways *and* the initiator requested APTPL mode(?).
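
Something like the following (an untested sketch; the directory, WWN, and
the whole check are hypothetical) is roughly what a Pacemaker hook could
verify before starting the target on the standby node, assuming
/var/target/pr lives on that shared filesystem:

    import os
    import sys

    # Hypothetical pre-start check for a Pacemaker-managed LIO target:
    # refuse to bring the target up unless the APTPL PR metadata written
    # by the previous node is visible on the shared mount.
    SHARED_PR_DIR = "/var/target/pr"              # assumed to be on the shared filesystem
    TARGET_WWN = "naa.60014051234567890abcdef0"   # example WWN only

    def pr_metadata_present(wwn):
        path = os.path.join(SHARED_PR_DIR, "aptpl_" + wwn)
        return os.path.isfile(path) and os.path.getsize(path) > 0

    if __name__ == "__main__":
        if pr_metadata_present(TARGET_WWN):
            sys.exit(0)   # PR state is there; safe to start LIO here
        print("no APTPL metadata for %s; reservations would be lost" % TARGET_WWN)
        sys.exit(1)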

> Also, when you said "fails over to another port", did you mean a port on
> another iSCSI gateway?  I believe LIO with multiple target portal IPs on the
> same node for path redundancy works with PGRs.

Yes, I was referring to the case with multiple active iSCSI gateways
which doesn't currently distribute PGRs to all gateways in the group.
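
As an aside, keeping the PR state in the cluster itself (the idea that
comes up further down in the thread) could conceptually look something
like the following Python sketch -- the pool, object name, and record
layout are invented, and a real implementation would need atomic
compare-and-swap and fencing rather than the blind rewrite below:

    import json
    import rados

    POOL = "rbd"                                          # assumed pool
    PR_OBJECT = "pr_state.iqn.2017-10.com.example:disk1"  # hypothetical object name

    def load_pr_state(ioctx):
        # PR registrations/reservation kept as one small JSON blob
        try:
            return json.loads(ioctx.read(PR_OBJECT, 65536).decode())
        except rados.ObjectNotFound:
            return {"registrations": {}, "reservation": None}

    def save_pr_state(ioctx, state):
        ioctx.write_full(PR_OBJECT, json.dumps(state).encode())

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOL)
        state = load_pr_state(ioctx)
        # e.g. record a registration seen by this gateway so others can see it
        state["registrations"]["iqn.1991-05.com.microsoft:node1"] = "0x1234"
        save_pr_state(ioctx, state)
        ioctx.close()
    finally:
        cluster.shutdown()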

> In my scenario, if my assumptions are correct, you would only have 1 iSCSI
> gateway available through 2 target portal IPs (for data path redundancy).  If
> this first iSCSI gateway fails, both target portal IPs fail over to the
> standby node with the PGR data that is available on shared storage.
>
>
> Sam
>
> On Wed, Oct 11, 2017 at 12:52 PM, Jason Dillaman <jdillama@xxxxxxxxxx>
> wrote:
>>
>> On Wed, Oct 11, 2017 at 12:31 PM, Samuel Soulard
>> <samuel.soulard@xxxxxxxxx> wrote:
>> > Hi to all,
>> >
>> > What if you're using an iSCSI gateway based on LIO and krbd (that is, an
>> > RBD block device mapped on the iSCSI gateway and published through LIO)?
>> > The LIO target portal (virtual IP) would fail over to another node.  This
>> > would theoretically provide support for PGRs since LIO does support SPC-3.
>> > Granted, it is not distributed and is limited to a single node's
>> > throughput, but this would achieve the high availability required by some
>> > environments.
>>
>> Yes, LIO technically supports PGR but it's not distributed to other
>> nodes. If you have a pacemaker-initiated target failover to another
>> node, the PGR state would be lost / missing after migration (unless I
>> am missing something like a resource agent that attempts to preserve
>> the PGRs). For initiator-initiated failover (e.g. a target is alive
>> but the initiator cannot reach it), after it fails over to another
>> port the PGR data won't be available.
>>
>> > Of course, multiple target portals would be awesome since the available
>> > throughput would be able to scale linearly, but since this isn't here
>> > right now, this would at least provide an alternative.
>>
>> It would definitely be great to go active/active, but there are
>> concerns about data-corrupting edge conditions when using MPIO since it
>> relies on client-side failure timers that are not coordinated with the
>> target.
>>
>> For example, if an initiator writes to sector X down path A and there
>> is a delay to the path A target (i.e. the target and initiator timeout
>> timers are not in sync), and MPIO fails over to path B, quickly
>> performs the write to sector X and then performs a second write to
>> sector X, there is a possibility that path A will eventually unblock
>> and overwrite the new value in sector X with the old value. The safe
>> way to handle that would require setting the initiator-side IO
>> timeouts to such high values as to cause higher-level subsystems to
>> mark the MPIO path as failed should a failure actually occur.
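>>
>> A toy timeline of that hazard (nothing iSCSI-specific, just the
>> ordering problem):
>>
>>     disk = {}
>>
>>     # t0: write of "old" to sector X is issued down path A, but the
>>     #     path A target stalls, so the request is left in flight
>>     delayed_path_a_write = ("X", "old")
>>
>>     # t1: the initiator's MPIO timer fires first; it fails over to
>>     #     path B, retries the write, then issues a newer write
>>     disk["X"] = "old"   # retry of the original write via path B
>>     disk["X"] = "new"   # second, newer write via path B
>>
>>     # t2: path A finally unblocks and its stale write completes
>>     sector, value = delayed_path_a_write
>>     disk[sector] = value
>>
>>     print(disk["X"])    # "old" -- the newer data was silently lost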
>>
>> The iSCSI MCS protocol would address these concerns since in theory
>> path B could discover that the retried IO was actually a retry, but
>> alas it's not available in the Linux Open-iSCSI nor ESX iSCSI
>> initiators.
>>
>> > On Wed, Oct 11, 2017 at 12:26 PM, David Disseldorp <ddiss@xxxxxxx>
>> > wrote:
>> >>
>> >> Hi Jason,
>> >>
>> >> Thanks for the detailed write-up...
>> >>
>> >> On Wed, 11 Oct 2017 08:57:46 -0400, Jason Dillaman wrote:
>> >>
>> >> > On Wed, Oct 11, 2017 at 6:38 AM, Jorge Pinilla López
>> >> > <jorpilo@xxxxxxxxx>
>> >> > wrote:
>> >> >
>> >> > > As far as I am able to understand, there are 2 ways of setting up
>> >> > > iSCSI for Ceph:
>> >> > >
>> >> > > 1- using the kernel (lrbd), only available on SUSE, CentOS, Fedora...
>> >> > >
>> >> >
>> >> > The target_core_rbd approach is only utilized by SUSE (and its
>> >> > derivatives like PetaSAN) as far as I know. This was the initial
>> >> > approach for Red Hat-derived kernels as well until the upstream kernel
>> >> > maintainers indicated that they really do not want a specialized
>> >> > target backend for just krbd. The next attempt was to re-use the
>> >> > existing target_core_iblock to interface with krbd via the kernel's
>> >> > block layer, but that hit similar upstream walls trying to get support
>> >> > for SCSI command passthrough to the block layer.
>> >> >
>> >> >
>> >> > > 2- using userspace (tcmu, ceph-iscsi-conf, ceph-iscsi-cli)
>> >> > >
>> >> >
>> >> > The TCMU approach is what upstream and Red Hat-derived kernels will
>> >> > support going forward.
>> >>
>> >> SUSE is also in the process of migrating to the upstream tcmu approach,
>> >> for the reasons that you gave in (1).
>> >>
>> >> ...
>> >>
>> >> > The TCMU approach also does not currently support SCSI persistent
>> >> > reservation groups (needed for Windows clustering) because that
>> >> > support isn't available in the upstream kernel. The SUSE kernel has an
>> >> > approach that utilizes two round-trips to the OSDs for each IO to
>> >> > simulate PGR support. Earlier this summer I believe SUSE started to
>> >> > look into how to get generic PGR support merged into the upstream
>> >> > kernel using corosync/dlm to synchronize the state between multiple
>> >> > nodes in the target. I am not sure of the current state of that work,
>> >> > but it would benefit all LIO targets when complete.
>> >>
>> >> Zhu Lingshan (cc'ed) worked on a prototype for tcmu PR support. IIUC,
>> >> whether DLM or the underlying Ceph cluster gets used for PR state
>> >> storage is still under consideration.
>> >>
>> >> Cheers, David
>> >
>> >
>>
>>
>>
>> --
>> Jason
>
>



--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
