Fwd:] RRP works "in totally different way then most of people expects"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi!

Probably I sent my patch proposals to the wrong list, so I'll resent them (see attachments). I also added corresponding new patches (0002-0003) for resetting the ringid file. I think the program should not mess with umask, specifically not inside an "if" in a subroutine. I failed to get the sense of it, so I added a comment saying so.

Regards,
Ulrich

--- Begin Message ---
Answering my own questions:

Those ringid files are created by corosync_ring_id_store() from main.c, and the contents is the ring sequence number (binary). The routine is called (indrirectly) from memb_state_commit_enter() (totemsrp.c). That routine is called by memb_join_process() and message_handler_memb_commit_token().

An obvious question arises:

As the sequence number is only stored for the active ring, what happens if rings are switched (rrp_mode == passive)? Likewise it seems corosync is using the IP of the first available ring as representative on startup, it seems. Shouldn't the representative (and identifier) be stable (as per slide 4 of [AMMA00] Amir, Y.; Moser, L. E.; Melliar-Smith, P. M.; Agarwal, D. A.; Ciarfella, P.: The Totem Single-Ring Ordering and Membership Protocol)? I doubt that representative and ringid are persistent just for each ring individually...

The "rxw"-Permissions are the natural consequence of this code (corosync_ring_id_store()):
        fd = open (filename, O_WRONLY, 0777);
        if (fd == -1) {
                fd = open (filename, O_CREAT|O_RDWR, 0777);
        }

So here's my first patch proposal also.

Regards,
Ulrich

>>> "Ulrich Windl" <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx> schrieb am 02.06.2015 um
16:07 in Nachricht <556DD4B1020000A10001AA01@xxxxxxxxxxxxxxxxxxxxxxxxx>:
>>>> Jan Friesse <jfriesse@xxxxxxxxxx> schrieb am 02.06.2015 um 12:57 in Nachricht
> <556D8C08.5060001@xxxxxxxxxx>:
> [...]
>> Last but not least is RRP. RRP itself works very well sadly it works in
>> totally different way then most of people expects.
> [...]
> 
> Probably because the documentation leaves to many questions unanswered...
> You can't tell what's wrong as long as you don't know how it should work.
> 
> Example: What's the purpose of /var/lib/corosync/ringid_* files, and why do 
> some hosts have /var/lib/corosync/ringid_127.0.0.1? The file content seems to 
> be some binary number, but the file is created with rwx permissions...
> 
> On systems with two rings (not on 127.*) I see all combinations from one to 
> three ringid files.
> 
> Regards,
> Ulrich
> 
> 
> 
> _______________________________________________
> Users mailing list: Users@xxxxxxxxxxxxxxx 
> http://clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org 




>From 70fba1e1a547d94e0480aa24b92810a410e36a08 Mon Sep 17 00:00:00 2001
From: Ulrich Windl <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Jun 2015 12:52:04 +0200
Subject: [PATCH] exec/main.c: Improve corosync_ring_id_store()

exec/main.c: Change file creation mode from 0777 to S_IRUSR|S_IWUSR
improving error messages in corosync_ring_id_store().
---
 exec/main.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/exec/main.c b/exec/main.c
index 2e4fd02..6c9e1b6 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -1043,15 +1043,14 @@ static void corosync_ring_id_store (
 	snprintf (filename, sizeof(filename), "%s/ringid_%s",
 		get_run_dir(), totemip_print (addr));
 
-	fd = open (filename, O_WRONLY, 0777);
+	fd = open (filename, O_WRONLY, 0);
 	if (fd == -1) {
-		fd = open (filename, O_CREAT|O_RDWR, 0777);
+		fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
 	}
 	if (fd == -1) {
 		LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR,
-			"Couldn't store new ring id %llx to stable storage",
-			memb_ring_id->seq);
-
+			      "Couldn't open %s: %s",
+			      filename, strerror(errno));
 		corosync_exit_error (AIS_DONE_STORE_RINGID);
 	}
 	log_printf (LOGSYS_LEVEL_DEBUG,
@@ -1060,8 +1059,8 @@ static void corosync_ring_id_store (
 	close (fd);
 	if (res != sizeof(memb_ring_id->seq)) {
 		LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR,
-			"Couldn't store new ring id %llx to stable storage",
-			memb_ring_id->seq);
+			      "Couldn't store new ring id %llx in %s: %s",
+			      memb_ring_id->seq, filename, strerror(errno));
 
 		corosync_exit_error (AIS_DONE_STORE_RINGID);
 	}
-- 
1.7.12.4


--- End Message ---
--- Begin Message ---
Hi!

Maybe we want "O_DSYNC" also:
--
diff --git a/exec/main.c b/exec/main.c
index 6c9e1b6..9be33e4 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -1043,9 +1043,9 @@ static void corosync_ring_id_store (
        snprintf (filename, sizeof(filename), "%s/ringid_%s",
                get_run_dir(), totemip_print (addr));

-       fd = open (filename, O_WRONLY, 0);
+       fd = open (filename, O_WRONLY|O_DSYNC, 0);
        if (fd == -1) {
-               fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
+               fd = open (filename, O_CREAT|O_RDWR|O_DSYNC, S_IRUSR|S_IWUSR);
        }
        if (fd == -1) {
                LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR,
--

Ulrich



--- End Message ---
>From 0fcf0cfacf5a60f3d84d2407fa917bd72b3b24ca Mon Sep 17 00:00:00 2001
From: Ulrich Windl <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 9 Jun 2015 08:39:02 +0200
Subject: [PATCH 2/3] Fix open flags when creating ringid file

exec/main.c (corosync_ring_id_create_or_load): Use zero flags when
opening file O_RDONLY.  Use symbolic permissions when creating ring id
file, dropping useless execute bit.  Questin change of umask when creating
file.
---
 exec/main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/exec/main.c b/exec/main.c
index 6c9e1b6..9cc0351 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -996,7 +996,7 @@ static void corosync_ring_id_create_or_load (
 
 	snprintf (filename, sizeof(filename), "%s/ringid_%s",
 		get_run_dir(), totemip_print (addr));
-	fd = open (filename, O_RDONLY, 0700);
+	fd = open (filename, O_RDONLY, 0);
 	/*
 	 * If file can be opened and read, read the ring id
 	 */
@@ -1009,8 +1009,8 @@ static void corosync_ring_id_create_or_load (
 	 */
 	if ((fd == -1) || (res != sizeof (uint64_t))) {
 		memb_ring_id->seq = 0;
-		umask(0);
-		fd = open (filename, O_CREAT|O_RDWR, 0700);
+		umask(0);		/* why that? */
+		fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
 		if (fd != -1) {
 			res = write (fd, &memb_ring_id->seq, sizeof (uint64_t));
 			close (fd);
-- 
1.7.12.4

>From 5a29ecee27d7fa71d7871c3c8d907dad61572fca Mon Sep 17 00:00:00 2001
From: Ulrich Windl <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 9 Jun 2015 08:44:57 +0200
Subject: [PATCH 3/3] exec/main.c: Improve comments and log messages

exec/main.c (corosync_ring_id_create_or_load): Improve comments.  Add log
message when ring id is reset to zero.  Change wording of error messages
when new ringid file could not be created.
---
 exec/main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/exec/main.c b/exec/main.c
index 9cc0351..8734507 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -1005,9 +1005,11 @@ static void corosync_ring_id_create_or_load (
 		close (fd);
 	}
 	/*
-	 * If file could not be opened or read, create a new ring id
+	 * If file could not be opened or read, reset ring id
 	 */
 	if ((fd == -1) || (res != sizeof (uint64_t))) {
+		log_printf (LOGSYS_LEVEL_NOTICE, "Resetting ring id in %s\n",
+			    filename);
 		memb_ring_id->seq = 0;
 		umask(0);		/* why that? */
 		fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
@@ -1016,13 +1018,15 @@ static void corosync_ring_id_create_or_load (
 			close (fd);
 			if (res == -1) {
 				LOGSYS_PERROR (errno, LOGSYS_LEVEL_ERROR,
-					"Couldn't write ringid file '%s'", filename);
+					"Couldn't write initial ringid file '%s'",
+					       filename);
 
 				corosync_exit_error (AIS_DONE_STORE_RINGID);
 			}
 		} else {
 			LOGSYS_PERROR (errno, LOGSYS_LEVEL_ERROR,
-				"Couldn't create ringid file '%s'", filename);
+				"Couldn't create ringid file '%s': %s",
+				       filename, strerror(errno));
 
 			corosync_exit_error (AIS_DONE_STORE_RINGID);
 		}
-- 
1.7.12.4

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

[Index of Archives]     [Linux Clusters]     [Corosync Project]     [Linux USB Devel]     [Linux Audio Users]     [Photo]     [Yosemite News]    [Yosemite Photos]    [Linux Kernel]     [Linux SCSI]     [X.Org]

  Powered by Linux