Re: RBD journaling benchmarks

--------------------------------------------------
From: "Jason Dillaman" <jdillama@xxxxxxxxxx>
Sent: Thursday, July 13, 2017 4:45 AM
To: "Maged Mokhtar" <mmokhtar@xxxxxxxxxxx>
Cc: "Mohamad Gebai" <mgebai@xxxxxxxx>; "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: RBD journaling benchmarks

> On Mon, Jul 10, 2017 at 3:41 PM, Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:
>> On 2017-07-10 20:06, Mohamad Gebai wrote:
>>
>>
>> On 07/10/2017 01:51 PM, Jason Dillaman wrote:
>>
>> On Mon, Jul 10, 2017 at 1:39 PM, Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:
>>
>> These are significant differences, to the point where it may not make sense
>> to use rbd journaling / mirroring unless there is only 1 active client.
>>
>> I interpreted the results as meaning that the same RBD image was being concurrently
>> used by two fio jobs -- which we strongly recommend against since it
>> will result in the exclusive-lock ping-ponging back and forth between
>> the two clients / jobs. Each fio RBD job should utilize its own
>> backing image to avoid such a scenario.
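
As a concrete sketch of that advice (the pool, client, and image names here
are illustrative assumptions, not from the benchmark setup), each fio job
can be pointed at its own image via the rbd ioengine:

cat > two-images.fio <<'EOF'
[global]
ioengine=rbd
clientname=admin
pool=rbd
rw=randwrite
bs=4k
iodepth=32

# each job gets a dedicated backing image, so the exclusive
# lock never has to migrate between the two clients
[writer1]
rbdname=image1

[writer2]
rbdname=image2
EOF
fio two-images.fio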
>>
>>
>> That is correct. The single-job runs are more representative of the
>> overhead of journaling alone, and it is worth noting the (expected)
>> inefficiency of multiple clients using the same RBD image, as explained
>> by Jason.
>>
>> Mohamad
>>
>> Yes, I expected a penalty, but not one this large. There are use cases
>> that would benefit from concurrent access to the same block device: in
>> VMware and Hyper-V, several hypervisors can share the same device,
>> formatted with a clustered file system such as Microsoft CSV (Cluster
>> Shared Volumes) or VMFS, which creates a volume/datastore that houses
>> many VMs.
>
> Both of these use-cases would first need support for active/active
> iSCSI. While A/A iSCSI via MPIO is trivial to enable, getting it to
> properly handle failure conditions without the possibility of data
> corruption is not, since it relies heavily on arbitrary initiator- and
> target-based timers. The only realistic and safe solution is to rely
> on an MCS-based active/active implementation.
The case also applies to active/passive iSCSI: you still have many
initiators/hypervisors writing concurrently to the same RBD image through a
clustered file system (CSV/VMFS).

>> I was wondering if such a setup could be supported in the future, and
>> whether there could be a way to minimize the overhead of the exclusive
>> lock: for example, by distributing sequence numbers to the different
>> active client writers and having each writer maintain its own journal. I
>> doubt the overhead would then reach the values you showed.
>
> The journal used by the librbd mirroring feature was designed to
> support multiple concurrent writers. Of course, that original design
> was more in line with the goal of supporting multiple images within a
> consistency group.
Yes, but they would still suffer a performance penalty. My understanding is
that they would need the lock while writing data to the journal entries, and
would thus be waiting turns. Or do they need the lock only for journal
metadata, such as generating a sequence number?
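
For reference, a rough sketch of how the per-image journal in question is
enabled and inspected from the rbd CLI (pool and image names are illustrative):

# journaling requires the exclusive-lock feature on the image
rbd feature enable rbd/image1 exclusive-lock
rbd feature enable rbd/image1 journaling

# show the journal's metadata and its registered clients
rbd journal info --pool rbd --image image1
rbd journal status --pool rbd --image image1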

>> Maged
>>
>>
>
> --
> Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
