Re: Ceph-ISCSI

Hmmm, if you fail over the identity of the LIO configuration, including the PGRs (I believe they are files on disk), this would work, no?  That is, two iSCSI gateways sharing storage that holds the LIO configuration and the PGR data.

Also, when you said it "fails over to another port", did you mean a port on another iSCSI gateway?  I believe LIO with multiple target portal IPs on the same node (for path redundancy) works with PGRs.

In my scenario, if my assumptions are correct, you would only have one iSCSI gateway available, reachable through two target portal IPs (for data path redundancy).  If this first iSCSI gateway fails, both target portal IPs fail over to the standby node, which has access to the PGR data on shared storage.
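Roughly what I have in mind, as a simplified sketch (the file name, gateway
names and take_over() helper below are just made up to illustrate the idea,
not real LIO or pacemaker tooling):

    import json
    from pathlib import Path

    # Stand-in for a file on storage both gateways can reach; in the real
    # setup this would live on the shared storage, not the local directory.
    SHARED_PGR_STATE = Path("lio_pgr_state.json")

    class Gateway:
        def __init__(self, name):
            self.name = name
            self.portal_ips = []   # target portal IPs currently served
            self.pgr = {}          # in-memory PGR registrations

        def register_pr(self, initiator, key):
            # Record the registration and persist it to shared storage.
            self.pgr[initiator] = key
            SHARED_PGR_STATE.write_text(json.dumps(self.pgr))

        def take_over(self, portal_ips):
            # Standby assumes the portal IPs and reloads the PGR state.
            self.portal_ips = portal_ips
            if SHARED_PGR_STATE.exists():
                self.pgr = json.loads(SHARED_PGR_STATE.read_text())

    # Active gateway serves both portal IPs; an initiator registers a key.
    active = Gateway("gw1")
    active.take_over(["192.168.1.10", "192.168.1.11"])
    active.register_pr("iqn.1991-05.com.microsoft:node1", 0x1234)

    # gw1 dies; the standby inherits both portal IPs plus the persisted PGRs.
    standby = Gateway("gw2")
    standby.take_over(active.portal_ips)
    print(standby.pgr)   # reservations survive because they were on shared storage

In other words, as long as the reservation data lives somewhere both nodes
can read, the standby should be able to present the same PGR state after
taking over the portal IPs (assuming the failover tooling actually reloads
it, which is the part I'm unsure about).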


Sam

On Wed, Oct 11, 2017 at 12:52 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
On Wed, Oct 11, 2017 at 12:31 PM, Samuel Soulard
<samuel.soulard@xxxxxxxxx> wrote:
> Hi to all,
>
> What if you're using an ISCSI gateway based on LIO and KRBD (that is, RBD
> block device mounted on the ISCSI gateway and published through LIO).  The
> LIO target portal (virtual IP) would fail over to another node.  This would
> theoretically provide support for PGRs since LIO does support SPC-3.
> Granted, it is not distributed and is limited to a single node's throughput,
> but this would achieve the high availability required by some environments.

Yes, LIO technically supports PGR but it's not distributed to other
nodes. If you have a pacemaker-initiated target failover to another
node, the PGR state would be lost / missing after migration (unless I
am missing something like a resource agent that attempts to preserve
the PGRs). For initiator-initiated failover (e.g. a target is alive
but the initiator cannot reach it), after it fails over to another
port the PGR data won't be available.

> Of course, multiple active target portals would be awesome since the
> available throughput would scale linearly, but since that isn't here right
> now, this would at least provide an alternative.

It would definitely be great to go active/active, but there are
concerns about data-corrupting edge conditions when using MPIO, since
it relies on client-side failure timers that are not coordinated with
the target.

For example, suppose an initiator writes to sector X down path A and
that write is delayed at the path A target (i.e. the target and
initiator timeout timers are not in sync). If MPIO fails over to path
B, quickly replays the write to sector X, and then performs a second
write to sector X, there is a possibility that path A will eventually
unblock and overwrite the new value in sector X with the old value.
The safe way to handle that would require setting the initiator-side
IO timeouts to such high values as to cause higher-level subsystems to
mark the MPIO path as failed should a failure actually occur.
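To make that timeline concrete, here is a toy sketch (plain Python, nothing
iSCSI-specific, and the timestamps are invented): applying the writes in the
order they actually reach the disk leaves the stale value in place.

    # Each event: (time it hits the disk, path, value written to sector X)
    # The original write of "v1" down path A is delayed well past the
    # MPIO failover timeout.
    events = [
        (35.0, "A", "v1"),   # original write, stuck behind a path A delay
        (31.0, "B", "v1"),   # MPIO failed over at t=30 and replayed the write
        (32.0, "B", "v2"),   # second write to the same sector via path B
    ]

    sector_x = None
    for t, path, value in sorted(events):   # order in which they reach the disk
        sector_x = value
        print(f"t={t:5.1f}  path {path} writes {value!r}")

    # Final state is "v1": the delayed path A write clobbered the newer "v2".
    print("final value in sector X:", sector_x)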

The iSCSI MC/S (multiple connections per session) feature would address
these concerns, since in theory path B could recognize that the replayed
IO was actually a retry, but alas it's not available in either the Linux
Open-iSCSI or the ESX iSCSI initiators.
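Very roughly, the reason a shared command sequence space helps looks like
this (an illustrative sketch only; the real CmdSN and connection-allegiance
rules in the iSCSI spec are considerably more involved):

    # Hypothetical target-side view of one session with two connections
    # (paths). Because the command sequence numbers are shared across the
    # connections, the target can tell a replayed command from a new one.
    completed_cmdsn = set()
    sector_x = None

    def handle_write(cmdsn, value):
        global sector_x
        if cmdsn in completed_cmdsn:
            # Same CmdSN seen before: this is a duplicate of an
            # already-applied write, so it must not clobber newer data.
            return "duplicate ignored"
        completed_cmdsn.add(cmdsn)
        sector_x = value
        return "applied"

    print(handle_write(100, "v1"))   # replayed write arrives first, via connection B
    print(handle_write(101, "v2"))   # second write via connection B
    print(handle_write(100, "v1"))   # delayed original finally arrives via connection A
    print("sector X:", sector_x)     # still "v2"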

> On Wed, Oct 11, 2017 at 12:26 PM, David Disseldorp <ddiss@xxxxxxx> wrote:
>>
>> Hi Jason,
>>
>> Thanks for the detailed write-up...
>>
>> On Wed, 11 Oct 2017 08:57:46 -0400, Jason Dillaman wrote:
>>
>> > On Wed, Oct 11, 2017 at 6:38 AM, Jorge Pinilla López <jorpilo@xxxxxxxxx>
>> > wrote:
>> >
>> > > As far as I am able to understand, there are 2 ways of setting up
>> > > iSCSI for Ceph:
>> > >
>> > > 1- using the kernel (lrbd), only available on SUSE, CentOS, Fedora...
>> > >
>> >
>> > The target_core_rbd approach is only utilized by SUSE (and its
>> > derivatives
>> > like PetaSAN) as far as I know. This was the initial approach for Red
>> > Hat-derived kernels as well until the upstream kernel maintainers
>> > indicated
>> > that they really do not want a specialized target backend for just krbd.
>> > The next attempt was to re-use the existing target_core_iblock to
>> > interface
>> > with krbd via the kernel's block layer, but that hit similar upstream
>> > walls
>> > trying to get support for SCSI command passthrough to the block layer.
>> >
>> >
>> > > 2- using userspace (tcmu, ceph-iscsi-config, ceph-iscsi-cli)
>> > >
>> >
>> > The TCMU approach is what upstream and Red Hat-derived kernels will
>> > support
>> > going forward.
>>
>> SUSE is also in the process of migrating to the upstream tcmu approach,
>> for the reasons that you gave in (1).
>>
>> ...
>>
>> > The TCMU approach also does not currently support SCSI persistent
>> > reservation groups (needed for Windows clustering) because that support
>> > isn't available in the upstream kernel. The SUSE kernel has an approach
>> > that utilizes two round-trips to the OSDs for each IO to simulate PGR
>> > support. Earlier this summer I believe SUSE started to look into how to
>> > get
>> > generic PGR support merged into the upstream kernel using corosync/dlm
>> > to
>> > synchronize the states between multiple nodes in the target. I am not
>> > sure
>> > of the current state of that work, but it would benefit all LIO targets
>> > when complete.
>>
>> Zhu Lingshan (cc'ed) worked on a prototype for tcmu PR support. IIUC,
>> whether DLM or the underlying Ceph cluster gets used for PR state
>> storage is still under consideration.
>>
>> Cheers, David
>
>
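For what it's worth, regarding the PR-state storage question in the quoted
thread above: a very rough sketch of the "keep the reservation state in the
Ceph cluster" idea might look like the following. The object name, the JSON
record layout and the chosen pool are all hypothetical, it assumes the
python3-rados bindings and a reachable cluster, and the actual tcmu/DLM
prototype work David mentions is far more involved than this.

    import json
    import rados   # python3-rados bindings; assumes a reachable Ceph cluster

    # Hypothetical: keep the per-LUN persistent reservation state in one
    # small RADOS object so every gateway node can read and update it.
    POOL = "rbd"
    PR_OBJECT = "pr_state.lun0"

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOL)
        try:
            # Register a key for one initiator and record the reservation holder.
            state = {
                "registrations": {"iqn.1991-05.com.microsoft:node1": "0x1234"},
                "reservation": {"holder": "iqn.1991-05.com.microsoft:node1",
                                "type": "write-exclusive"},
            }
            ioctx.write_full(PR_OBJECT, json.dumps(state).encode())

            # Any other gateway node can now reconstruct the PR view of this LUN.
            print(json.loads(ioctx.read(PR_OBJECT)))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

A real implementation would of course need atomic updates of that state
(rather than a blind overwrite) and fencing of stale nodes, which is
presumably part of what the prototype work is sorting out.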



--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
