Search then Delete Performance

Hi Everyone,
I am fairly new to practical databases, but I am trying out the C
interface to PostgreSQL (libpq) and am wondering how to improve
performance. I am a researcher, and I am trying to perform a large
parameter sweep. Since this will involve a couple of thousand
simulations, I have a process that tracks which simulations have been
done and which still need to be done, so I can run the sweep easily on
a cluster.

So, I have a fairly simple schema of 4 tables.

-- ca, na, lk, ks, kf, iinj are the parameters for my simulation.
CREATE TABLE alljobs (
	id SERIAL,
	ca int,
	na int,
	lk int,
	ks int,
	kf int,
	iinj int,
	PRIMARY KEY(id)
);


CREATE TABLE assignedjobs (
	jobid int,
	nodeid varchar(100),
	assignedtime timestamp,
	PRIMARY KEY(jobid)
);


CREATE TABLE completedjobs (
	jobid int,
	PRIMARY KEY(jobid)
);

CREATE TABLE unassignedjobs(
	jobid int,
	PRIMARY KEY(jobid)
);



alljobs is initially populated, and contains every simulation that
will ever be run.
unassignedjobs contains the ids in alljobs that haven't been run yet.
assignedjobs contains the ids in alljobs that have been dispatched to
some CPU on the cluster and are currently simulating.
completedjobs contains all the completed jobs.
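
For concreteness, the population step looks roughly like this (the
parameter ranges below are placeholders for this sketch, not my real
values):

-- Sketch of populating the sweep as a cross product of parameter
-- ranges; generate_series(0, 2) stands in for each real range.
INSERT INTO alljobs (ca, na, lk, ks, kf, iinj)
SELECT ca, na, lk, ks, kf, iinj
FROM generate_series(0, 2) AS ca,
     generate_series(0, 2) AS na,
     generate_series(0, 2) AS lk,
     generate_series(0, 2) AS ks,
     generate_series(0, 2) AS kf,
     generate_series(0, 2) AS iinj;

-- Every job starts out unassigned.
INSERT INTO unassignedjobs (jobid)
SELECT id FROM alljobs;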

So fairly simply, I have a daemon running on a machine, which accesses
this DB. Clients connect and request the details for, say, 1000
simulations, at which point the daemon takes 1000 entries from the
unassigned table and moves them to the assigned table. Once the
client is finished with those jobs, it signals this to the daemon,
which then moves those jobs from 'assigned' to 'completed'.
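
In case it helps, the batch-claim step the daemon performs is
logically equivalent to a single transaction like the following
('node01' stands in for the real node name):

-- Claim a batch of 1000 jobs in one transaction.
BEGIN;
CREATE TEMP TABLE batch ON COMMIT DROP AS
    SELECT jobid FROM unassignedjobs LIMIT 1000;
DELETE FROM unassignedjobs
    WHERE jobid IN (SELECT jobid FROM batch);
INSERT INTO assignedjobs (jobid, nodeid, assignedtime)
    SELECT jobid, 'node01', now() FROM batch;
SELECT jobid FROM batch;  -- the ids handed back to the client
COMMIT;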

So this is fairly simple to implement, but my problem is that it is very slow.


In particular, I have a 'select' waiting for network connections,
which then calls this function:

#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>
#include <iostream>
#include <libpq-fe.h>

using namespace std;

typedef vector<long> VectorLong;

VectorLong assignJobs(PGconn* pDB, int number, string nodename)
{
	char buffer[1000];
	sprintf(buffer, "SELECT jobid FROM unassignedjobs LIMIT %d", number);
	PGresult* pRes = PQexec(pDB, buffer);

	printf("assigning jobs\n");

	int nRes = PQntuples(pRes);
	printf("Results found: %d\n", nRes);

	VectorLong jobs;
	for(int i = 0; i < nRes; i++)
	{
		long id = atol( PQgetvalue(pRes, i, 0) );
		cout << id << " ";
		jobs.push_back(id);

		// One DELETE and one INSERT per job, each in its own
		// implicit transaction.
		sprintf(buffer, "DELETE FROM unassignedjobs WHERE jobid = %ld", id);
		PQclear(PQexec(pDB, buffer));

		sprintf(buffer, "INSERT INTO assignedjobs (jobid, nodeid) "
		                "VALUES (%ld, '%s')", id, nodename.c_str());
		PQclear(PQexec(pDB, buffer));
	}

	PQclear(pRes);
	return jobs;
}

But it is painfully slow. Is there a way to improve it? I feel there
should be, since I already have a 'pointer' to the rows I want to
delete.


Any help would be greatly appreciated.

Many thanks


Mike Hull

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

