On Thu, Mar 2, 2023 at 2:34 AM Roy Sigurd Karlsbakk <roy@xxxxxxxxxxxxx> wrote:
>
> ----- Original Message -----
> > From: "Roger Heflin" <rogerheflin@xxxxxxxxx>
> > To: "linux-lvm" <linux-lvm@xxxxxxxxxx>
> > Cc: "Malin Bruland" <malin.bruland@xxxxx>
> > Sent: Thursday, 2 March, 2023 01:51:08
> > Subject: Re: lvconvert --uncache takes hours
>
> > On Wed, Mar 1, 2023 at 4:50 PM Roy Sigurd Karlsbakk <roy@xxxxxxxxxxxxx> wrote:
> >>
> >> Hi all
> >>
> >> Working with a friend's machine: it has lvmcache turned on in
> >> writeback mode. This has worked well, but now it's uncaching and it
> >> takes *hours*. The cache size was set to 100GB on an SSD not used
> >> for much else, and the dataset being cached is a RAID-6 set of
> >> 10x2TB drives with XFS on top. The system mainly does file serving,
> >> but also hosts some VMs that benefit from the caching quite a bit.
> >> But then I wonder: how can it spend hours emptying the cache like
> >> this? Most write caching I know of lasts only seconds, or perhaps
> >> minutes in a really bad case. Since this is taking hours, it looks
> >> to me like something should have been flushed ages ago.
> >>
> >> Have I (or we) done something very stupid here, or is this really
> >> how it's supposed to work?
> >>
> >> Kind regards
> >>
> >> roy
> >
> > A spinning raid6 array is slow on writes (see: raid6 write penalty).
> > Because of that, the array can only do about 100 write operations/sec.
>
> About 100 writes/second per data drive, that is. md parallelises I/O
> well.

No. On writes you get about 100 writes/sec to the raid6 in total; only
on reads do you get ~100 iops per disk. Writes, by their very raid6
nature, cannot be parallelized.

Each write to md requires a lot of work. At minimum you have to re-read
the sector you are writing, read the parity you need to update,
calculate the parity changes, and re-write each parity block that
changed. The other option is a full-stripe write, but that requires
writes to every data disk plus the parity calculation plus the parity
writes. Every way of writing to raid5/6 boils down to the same thing:
the array's total write iops equals the iops of a single disk. Those
extra reads and writes are baked into the raid5/6 format, and they are
what makes it slow on writes. (Worked numbers at the end of this mail.)

> > If the disks are doing other work, then only the spare capacity is
> > left for destaging, so it will destage even slower.
>
> The system was mostly idle.

> > A lot depends on how big each chunk is. The lvmcache documentation
> > indicates the smallest chunk size is 32k.
> >
> > 100G / 32k = 3 million chunks, and at 100 seeks/sec that comes to at
> > least an hour.
>
> Those 100GB were on SSD, not spinning rust. Last I checked, that was
> the whole point of caching.

You are de-staging the SSD cache to the spinning disks, correct? The
writes to the spinning disks are the slow part.

> > The LVM bookkeeping also has to be written to the spinning disks, I
> > would think, so 2 hours even if the array were idle.
>
> erm - why on earth would you do writes to hdd if you're caching it?

Once the cache is gone, all of the LVM bookkeeping has to live on the
spinning disks.

> > Throw in a 50% base load on the disks and you get 4 hours.
> >
> > Hours is reasonable.
>
> As I said, the system was idle.
>
> Kind regards
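Idle or not, the bottleneck is the raid6 write path itself. As a sketch
of what a single small random write costs (illustrative accounting, not
a measured md profile):

    # Read-modify-write for one small random write on raid6:
    #   read old data block, read P, read Q    -> 3 reads
    #   write new data block, write P, write Q -> 3 writes
    $ echo $(( 3 + 3 ))   # disk I/Os consumed per logical write
    6

Six dependent I/Os per logical write is why the whole array delivers
roughly one disk's worth of write iops.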
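And the back-of-the-envelope arithmetic behind "at least an hour",
taken to its worst case. The figures are illustrative: assume the
entire 100GiB cache is dirty, every 32KiB chunk needs its own random
write, and the array sustains ~100 such writes/sec:

    $ echo $(( 100 * 1024 * 1024 / 32 ))        # dirty 32KiB chunks in 100GiB
    3276800
    $ echo $(( 100 * 1024 * 1024 / 32 / 100 ))  # seconds at ~100 writes/sec
    32768
    $ echo $(( 32768 / 3600 ))                  # hours, worst case
    9

In practice only part of the cache is dirty and adjacent chunks
coalesce into sequential writes, so the real figure lands somewhere
between one and nine hours. Either way: hours.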
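If you'd rather watch it drain than guess, lvs can report the cache
counters. A possible invocation (assuming the cached LV is vg/lv;
substitute your own VG and LV names):

    # cache_dirty_blocks should fall toward zero while
    # lvconvert --uncache destages to the raid6:
    $ lvs -o name,cache_total_blocks,cache_dirty_blocks vg/lv

The rate at which cache_dirty_blocks drops is the actual destage rate,
which you can compare against the ~100 writes/sec estimate above.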