Re: Comparison of 3 replication models on Pech OSD cluster

On 2020-07-30 01:02, vitalif@xxxxxxxxxx wrote:
>> Why can't object data mutations (write IOs to an object) be tracked
>> using an old-school bitmap?

> That's what I call "write intent journaling" %) it's still a sort of
> "journaling" because you first modify the bitmap, fsync it, then
> proceed with the write itself.

Not quite. A journal has to be updated on *each* mutation (data or
metadata). With a bitmap you can mark a block as dirty once per, say,
N seconds, so if there are a lot of writes to that block you save a lot
of fsyncs and just write to the block directly. That increases the amount
of data to be resynced after a crash (every block still marked dirty),
but that is a minor cost.
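
To make the fsync argument concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the class names, the `window` parameter (standing in for "N seconds") and the rule that a dirty bit is treated as cleared once the window expires are all hypothetical and are not taken from Ceph or Pech code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Toy journal: every mutation costs one journal record and one fsync."""
    fsyncs: int = 0

    def write(self, block: int, now: float) -> None:
        self.fsyncs += 1


@dataclass
class ToyWriteIntentBitmap:
    """Toy bitmap: a block is marked dirty (one fsync) at most once per window."""
    window: float                                    # hypothetical "N seconds"
    dirty_since: dict = field(default_factory=dict)  # block -> time it was marked
    fsyncs: int = 0

    def write(self, block: int, now: float) -> None:
        marked = self.dirty_since.get(block)
        if marked is None or now - marked >= self.window:
            # First write in this window: persist the dirty bit, then write data.
            # (Assumption: the bit is considered cleared once the window expires.)
            self.dirty_since[block] = now
            self.fsyncs += 1
        # Otherwise the dirty bit is already on disk: write to the block directly.

    def blocks_to_resync(self) -> set:
        """After a crash, every block still marked dirty has to be resynced."""
        return set(self.dirty_since)


def compare(n_writes: int, interval: float, window: float):
    """Issue n_writes to one block and return (journal fsyncs, bitmap fsyncs)."""
    journal, bitmap = ToyJournal(), ToyWriteIntentBitmap(window=window)
    for i in range(n_writes):
        now = i * interval
        journal.write(7, now)
        bitmap.write(7, now)
    return journal.fsyncs, bitmap.fsyncs
```

With 1000 writes to the same block, 1 ms apart, and a 5 second window, `compare(1000, 0.001, 5.0)` counts 1000 fsyncs for the journal and 1 for the bitmap. The price is that the whole block has to be resynced after a crash.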

Also, journal and actual data updates go through transactions to guarantee
atomicity (well, everything goes through transactions). You can't have
a record in the journal without the data being updated, and the opposite
is also true: you can't have data updated without a record in the journal.
A bitmap relaxes this restriction. And when transactions are sequential
for the whole pg (pg lock and friends), performance degrades.

> Ceph/Pech architecture, however,
> dictates that an RBD spans multiple OSDs. Because if it doesn't it
> stops being Ceph and becomes Linstor. :) and there is a Linstor
> already. :)

Linux does not become Windows, even though both draw windows on the screen :)

> in this case it seems that a journal is more convenient
> than a plain bitmap... aaand this is precisely what PGlog is (pglog
> isn't a journal with data, it only contains a list of updated
> objects).

True, data updates are not covered by any journal, but on each write IO
you have to add a record to the pg journal, replicate the journal update
together with the data update, and atomically (through a transaction)
apply both locally. And these updates within the same pg (even to
different objects) are fully serialized. Very convenient indeed :)
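
A rough Python sketch of that constraint, for illustration only. `ToyPG`, its fields and the `fail` flag are hypothetical names invented here; this is not how the OSD code is structured, it only models "one lock per pg, log record and data update committed together".

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ToyPG:
    """Hypothetical placement group: one lock, one log, many objects."""
    objects: dict = field(default_factory=dict)  # object name -> data
    log: list = field(default_factory=list)      # (version, object name), no data
    lock: threading.Lock = field(default_factory=threading.Lock)

    def write(self, name: str, data: bytes, fail: bool = False) -> int:
        # The pg-wide lock serializes writes, even those to different objects.
        with self.lock:
            version = len(self.log) + 1
            # Stage both updates, then commit them together, so a log record
            # never exists without its data update and vice versa.
            new_log = self.log + [(version, name)]
            new_objects = {**self.objects, name: data}
            if fail:
                raise IOError("simulated failure before commit")
            self.log, self.objects = new_log, new_objects
            return version
```

Writes to objects "a" and "b" in the same `ToyPG` get consecutive versions from a single sequence, and a write that fails before commit leaves both the log and the object map untouched.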

--
Roman