Re: iostat show constants write to osd disk with writeahead journal, normal behaviour ?

Alexandre DERUMIER <aderumier@xxxxxxxxx> · Tue, 03 Jul 2012 06:30:42 +0200 (CEST)

Thanks,
I'll try that.

Note: with btrfs, I can use filestore flusher = true with the wip_flush_min git branch,
      and I see writes to disk every X seconds
      (2 sec of sequential writes vs. 30 sec).

With xfs, that doesn't work: with filestore flusher = false and wip_flush, I see constant writes, with no flusher entries in the logs.

----- Original Message ----- 

From: "Sage Weil" <sage@xxxxxxxxxxx> 
To: "Gregory Farnum" <greg@xxxxxxxxxxx> 
Cc: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>, "Mark Nelson" <mark.nelson@xxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx 
Sent: Monday, 2 July 2012 23:02:13 
Subject: Re: iostat show constants write to osd disk with writeahead journal, normal behaviour ? 

On Mon, 2 Jul 2012, Gregory Farnum wrote: 
> On Tue, Jun 19, 2012 at 12:09 AM, Alexandre DERUMIER 
> <aderumier@xxxxxxxxx> wrote: 
> > Hi, more info: I have enabled filestore debug = 20, with min interval 29 and max interval 30. 
> > 
> > I see sync_entry every 30s, so it seems to work as expected. 
> > 
> > cat ceph-osd.0.log |grep sync_entry 
> > 2012-06-19 07:56:00.084622 7fd09233b700 20 filestore(/srv/osd.0) sync_entry woke after 26.550294 
> > 2012-06-19 07:56:00.084641 7fd09233b700 20 filestore(/srv/osd.0) sync_entry waiting for another 2.449706 to reach min interval 29.000000 
> > 2012-06-19 07:56:02.534432 7fd09233b700 15 filestore(/srv/osd.0) sync_entry committing 18717 sync_epoch 5 
> > 2012-06-19 07:56:02.534481 7fd09233b700 15 filestore(/srv/osd.0) sync_entry doing a full sync (syncfs(2) if possible) 
> > 2012-06-19 07:56:02.963302 7fd09233b700 10 filestore(/srv/osd.0) sync_entry commit took 0.428878, interval was 29.428974 
> > 2012-06-19 07:56:02.963332 7fd09233b700 15 filestore(/srv/osd.0) sync_entry committed to op_seq 18717 
> > 2012-06-19 07:56:02.963341 7fd09233b700 20 filestore(/srv/osd.0) sync_entry waiting for max_interval 30.000000 
> > 2012-06-19 07:56:12.066002 7fd09233b700 20 filestore(/srv/osd.0) sync_entry woke after 9.102662 
> > 2012-06-19 07:56:12.066024 7fd09233b700 20 filestore(/srv/osd.0) sync_entry waiting for another 19.897338 to reach min interval 29.000000 
> > 2012-06-19 07:56:31.963460 7fd09233b700 15 filestore(/srv/osd.0) sync_entry committing 18935 sync_epoch 6 
> > 2012-06-19 07:56:31.963510 7fd09233b700 15 filestore(/srv/osd.0) sync_entry doing a full sync (syncfs(2) if possible) 
> > 2012-06-19 07:56:32.279737 7fd09233b700 10 filestore(/srv/osd.0) sync_entry commit took 0.316285, interval was 29.316396 
> > 2012-06-19 07:56:32.279778 7fd09233b700 15 filestore(/srv/osd.0) sync_entry committed to op_seq 18935 
> > 2012-06-19 07:56:32.279786 7fd09233b700 20 filestore(/srv/osd.0) sync_entry waiting for max_interval 30.000000 
> > 2012-06-19 07:56:44.837731 7fd09233b700 20 filestore(/srv/osd.0) sync_entry woke after 12.557945 
> > 2012-06-19 07:56:44.837757 7fd09233b700 20 filestore(/srv/osd.0) sync_entry waiting for another 16.442055 to reach min interval 29.000000 
> > 2012-06-19 07:57:01.279894 7fd09233b700 15 filestore(/srv/osd.0) sync_entry committing 19125 sync_epoch 7 
> > 2012-06-19 07:57:01.279939 7fd09233b700 15 filestore(/srv/osd.0) sync_entry doing a full sync (syncfs(2) if possible) 
> > 2012-06-19 07:57:01.558240 7fd09233b700 10 filestore(/srv/osd.0) sync_entry commit took 0.278354, interval was 29.278455 
> > 2012-06-19 07:57:01.558282 7fd09233b700 15 filestore(/srv/osd.0) sync_entry committed to op_seq 19125 
> > 2012-06-19 07:57:01.558291 7fd09233b700 20 filestore(/srv/osd.0) sync_entry waiting for max_interval 30.000000 
> > 2012-06-19 07:57:31.558394 7fd09233b700 20 filestore(/srv/osd.0) sync_entry woke after 30.000104 
> > 2012-06-19 07:57:31.558414 7fd09233b700 20 filestore(/srv/osd.0) sync_entry waiting for max_interval 30.000000 
> > 
> > 
> > But for the whole duration of the benchmark, I also see flusher_entry logs. 
> > What exactly is flush_entry vs. sync_entry? 
> 
> flush_entry is doing sync_file_range, which sends the data to the disk 
> but doesn't flush disk caches, or do a whole host of other things that 
> are needed to maintain integrity. The idea is that we don't want the 
> filesystem to store up thirty seconds worth of writes and then sync 
> them out on command, but rather to continuously do writes to disk. 
> sync_file_range is the best tool we have for accomplishing that. 
> 
> However, you could try turning it off and seeing if your performance 
> improves. :) Set filestore_sync_flush and filestore_flusher to false 
> in your config file. 
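
To make the distinction above concrete, here is a minimal Python sketch. It is not the FileStore code, only an illustration of the pattern: push each write toward the disk with sync_file_range as it happens, then do one real sync as the commit point. It assumes Linux and a libc that exports sync_file_range(2), called through ctypes because Python has no built-in wrapper. The helper names start_writeback and write_with_incremental_flush are made up for this example.

```python
import ctypes
import os

# Flag value from <fcntl.h> on Linux.
SYNC_FILE_RANGE_WRITE = 2

# Python has no os.sync_file_range, so call libc directly (Linux only).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.sync_file_range.argtypes = [
    ctypes.c_int,    # fd
    ctypes.c_int64,  # offset (off64_t)
    ctypes.c_int64,  # nbytes (off64_t)
    ctypes.c_uint,   # flags
]
_libc.sync_file_range.restype = ctypes.c_int


def start_writeback(fd, offset, nbytes):
    """Ask the kernel to start writing dirty pages in the range to disk.

    This does not wait for completion, flush the disk cache, or write
    file metadata, so it gives no durability guarantee by itself.
    """
    if _libc.sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def write_with_incremental_flush(path, chunks):
    """Write chunks, nudging each one toward disk, then commit once."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        for chunk in chunks:
            view = memoryview(chunk)
            while len(view) > 0:          # handle short writes
                written = os.write(fd, view)
                view = view[written:]
            # Plays the role of the flusher: start writeback early.
            start_writeback(fd, offset, len(chunk))
            offset += len(chunk)
        # Plays the role of the periodic sync: the real commit point.
        os.fsync(fd)
        return offset
    finally:
        os.close(fd)
```

With this pattern the final fsync has little left to write, which is the intent Greg describes: steady writes instead of a 30-second burst.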

One alternative is to let the kernel do this. If you adjust the VM 
tunables with something like 

echo 10000000 > /proc/sys/vm/dirty_background_bytes 

and set 

filestore flusher = false 

that will rely on the kernel to keep the amount of dirty data small. This 
affects the entire host, though, so keep in mind that it will affect other 
processes and all mounted file systems. Conceptually, this is what we are 
trying to accomplish with the flushing stuff, but it's hard to do from a 
user process. Maybe the new cgroups stuff will make some of this 
easier... I haven't been following it very closely. 
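
For checking what the host is currently set to before changing it, a small Python sketch along these lines may help. It only reads and writes the two background-writeback files under /proc/sys/vm; the function names and the vm_dir parameter are hypothetical (vm_dir exists so the code can be tried against a scratch directory), and writing the real file needs root.

```python
from pathlib import Path

VM_DIR = Path("/proc/sys/vm")


def read_tunable(name, vm_dir=VM_DIR):
    """Return the integer value of a VM tunable such as dirty_background_bytes."""
    return int((Path(vm_dir) / name).read_text().strip())


def set_dirty_background_bytes(nbytes, vm_dir=VM_DIR):
    """Same effect as: echo <nbytes> > /proc/sys/vm/dirty_background_bytes"""
    (Path(vm_dir) / "dirty_background_bytes").write_text(f"{nbytes}\n")


def background_threshold(vm_dir=VM_DIR):
    """Describe when the kernel starts background writeback.

    A nonzero dirty_background_bytes takes effect as an absolute limit;
    otherwise the kernel uses dirty_background_ratio (a percentage).
    """
    nbytes = read_tunable("dirty_background_bytes", vm_dir)
    if nbytes > 0:
        return f"{nbytes} bytes"
    ratio = read_tunable("dirty_background_ratio", vm_dir)
    return f"{ratio}% of available memory"
```

Calling background_threshold() before and after set_dirty_background_bytes(10000000) shows the switch from the ratio to the byte limit. On a real kernel, dirty_background_ratio reads back as 0 once the byte value is set, since only one of the two is active at a time.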

sage 

-- 

Alexandre Derumier 

Systems and Network Engineer 

Phone: 03 20 68 88 85 

Fax: 03 20 68 90 88 

45 Bvd du Général Leclerc 59100 Roubaix 
12 rue Marivaux 75002 Paris 