Re: dm-cache issue

On 16.11.2016 at 14:45, Teodor Milkov wrote:
On 16.11.2016 11:24, Zdenek Kabelac wrote:
On 15.11.2016 at 13:38, Teodor Milkov wrote:
On 14.11.2016 17:34, Zdenek Kabelac wrote:
On 14.11.2016 at 16:02, Alexander Pashaliyski wrote:
The server takes hours to boot because of I/O load. It seems a flush is
triggered from the SSD disk (used as the cache device) to the RAID
controllers (which have slow SATA disks).
I have 10 cached logical volumes in *writethrough mode*, each with 2T of
data, spread over 2 RAID controllers. I use a single SSD disk for the cache.
The backup system runs lvm2-2.02.164-1 & kernel 4.4.30.

Do you have any idea why such a flush is triggered? In writethrough cache
mode we shouldn't have dirty blocks in the cache.


Have you ensured there was a proper shutdown?
The cache needs to be properly deactivated - if it's just turned off,
all metadata are marked dirty.

Zdenek

Hi,

I'm seeing the same behavior described by Alexander. Even if we assume
something is wrong with my shutdown scripts, how could dm-cache ever be
dirty in writethrough mode? And what about the case where the server crashes
for whatever reason (kernel bug, power outage, operator error etc.)? Waiting
several hours, or for a sufficiently large cache even days, for the system to
come back up is not practical.

I found this 2013 conversation, where Heinz Mauelshagen <heinzm redhat com>
states that "in writethrough mode the cache will always be coherent after a
crash": https://www.redhat.com/archives/dm-devel/2013-July/msg00117.html

I'm thinking of a way to --uncache and recreate the cache devices on every
boot, which should be safe in writethrough mode and takes a reasonable and,
more importantly, constant amount of time.
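
Something along these lines is what I have in mind (the VG/LV/device names
and sizes below are placeholders, just to illustrate):

    # Rough sketch only -- vg, data0, /dev/sdX1 and the sizes are made up.
    # Drop the (writethrough, hence coherent) cache completely...
    lvconvert --uncache vg/data0

    # ...then recreate a fresh cache pool on the SSD and attach it
    # in writethrough mode again.
    lvcreate --type cache-pool -L 100G -n data0_cachepool vg /dev/sdX1
    lvconvert --type cache --cachepool vg/data0_cachepool \
              --cachemode writethrough vg/data0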

My first 'guess' in this reported case is that the disk I/O traffic seen is
related to the 'reload' of cached chunks from disk back into the cache.

This will happen if there has been an unclean cache shutdown.

However, what is unclear is why it slows down the boot by hours.
Is the cache too big?

Indeed, the cache is quite big (an 800GB SSD), but I found experimentally that
this is the size where I get good cache hit ratios with my >10TB data volume.

Yep - that's the current trouble with the existing dm-cache target.
It gets inefficient when maintaining more than 1 million
cache block entries - recent versions of lvm2 do not even allow
creating such a cache without forcing it.
(So with 32k blocks that's ~30G of cache data.)
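
Just to spell out the arithmetic behind that number (using the sizes
mentioned above):

    # chunk count = cache data size / chunk size
    # 1,000,000 chunks * 32 KiB  ~= 30 GiB of cache data  (the 'reasonable' limit)
    # 800 GiB / 32 KiB chunks    ->  26,214,400 chunks    (far above the limit)
    echo $(( 800 * 1024 * 1024 / 32 ))    # prints 26214400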


As to the 'reload' vs 'flush' – I think it is flushing, because iirc iostat
showed lots of SSD reading and HDD writing, but I'm not really sure and need
to confirm that.

So, are you saying that in the case of an unclean shutdown this 'reload' is inevitable?

Yes - a clean shutdown is mandatory - otherwise the cache can't know its
consistency and has to refresh itself. The other option would probably be to
drop the cache and let it rebuild - but you lose the already gained
'knowledge' that way.

Anyway, AFAIK there is ongoing development and an upstreaming process for a
new cache target, which will address a couple of other shortcomings and should
perform much better.   lvm2 will supposedly handle the transition to the new
format in some way later.


How much time it takes obviously depends on the SSD size/speed and HDD speed,
but with an 800GB SSD it is reasonable to expect very long boot times.

Can you provide full logs from 'deactivation' and following activation?

Any hints as to how to collect "full logs from 'deactivation' and following
activation"? It happens early in the Debian boot process (I think udev does
the activation) and I'm not sure how to enable logging... should I tweak
/etc/lvm/lvm.conf?
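For example, would something like this in the log section of lvm.conf do it
(I'm only guessing at the relevant options and the log path here)?

    # Guess at /etc/lvm/lvm.conf settings -- the log file path is arbitrary.
    log {
        verbose = 1
        level = 7
        file = "/var/log/lvm2-debug.log"
    }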


All you need to collect is basically the 'serial' console log from your
machine - so if you have another box to capture the serial console log,
that's the easiest option.

But since you already said you use a cache ~30 times bigger than the size with
'reasonable' performance - I think it's already clear where your problem is hidden.

Until the new target is deployed, please consider using a significantly smaller cache size, so the number of cache chunks stays below 1 000 000.
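
To see how far off you currently are, something like this should show the
cache-pool size and chunk size (the VG name is hypothetical); the chunk count
is simply their ratio:

    # Show cache-pool size and chunk size; chunk count = size / chunk size.
    lvs -a -o lv_name,lv_size,chunksize vg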

Regards

Zdenek




--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel



