[Reconnoiter-users] Limit postgresql memory usage during rollup job?

Ask Bjørn Hansen ask at develooper.com
Mon Jul 26 05:02:00 EDT 2010


On Jul 7, 2010, at 5:04, Theo Schlossnagle wrote:

Thanks -- that helped a little; I added the patch below to see what's going on and it looks like the loop never makes it down to where v_i gets incremented (where by "never" I mean "not yet").  Output from running the rollup is below, too.


  - ask

--- sql/sprocs/stratcon.rollup_metric_numeric.sql	(revision 1365)
+++ sql/sprocs/stratcon.rollup_metric_numeric.sql	(working copy)
@@ -19,6 +19,8 @@
     v_count         INTEGER;
 BEGIN
 
+    raise notice 'rollup starting %', in_roll;
+
     -- Get rollup config based on given name, and fail if its wrong name.
     SELECT * FROM metric_numeric_rollup_config WHERE rollup = in_roll INTO v_conf;
     IF NOT FOUND THEN
@@ -42,7 +44,8 @@
    v_current_whence := 'epoch'::timestamptz + '1 second'::INTERVAL * v_conf.seconds * floor(extract( epoch FROM now() ) / v_conf.seconds);
 
     LOOP
-        IF v_i > 12 THEN
+        IF v_i > 4 THEN
+            RAISE NOTICE 'unlocking %, %', v_i, v_taskid;
             perform pg_advisory_unlock(43191, v_taskid);
             RETURN 1;
         END IF;
@@ -59,7 +62,7 @@
             -- The unit has to be given in seconds, AND provided as v_temprec.seconds
             v_temprec.use_whence := 'epoch'::timestamptz + '1 second'::INTERVAL * v_temprec.seconds * floor(extract( epoch FROM v_temprec.use_whence ) / v_temprec.seconds);
 
-RAISE NOTICE '(%,%)',v_temprec.rollup, v_temprec.use_whence; 
+RAISE NOTICE '(%,%,%)',v_temprec.rollup, v_temprec.use_whence, v_i; 
             -- Poor mans UPSERT :)
             INSERT INTO metric_numeric_rollup_queue ("interval", whence)
                 SELECT v_temprec.rollup, v_temprec.use_whence



reconnoiter=> select stratcon.rollup_metric_numeric(rollup) from metric_numeric_rollup_config order by seconds asc;
NOTICE:  rollup starting 5m
NOTICE:  (20m,2010-03-23 11:00:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:15:00-07'))
NOTICE:  (20m,2010-03-23 11:20:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:20:00-07'))
NOTICE:  (20m,2010-03-23 11:20:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:25:00-07'))
NOTICE:  (20m,2010-03-23 11:20:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:30:00-07'))
NOTICE:  (20m,2010-03-23 11:20:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:35:00-07'))
NOTICE:  (20m,2010-03-23 11:40:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:40:00-07'))
NOTICE:  (20m,2010-03-23 11:40:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:45:00-07'))
NOTICE:  (20m,2010-03-23 11:40:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:50:00-07'))
NOTICE:  (20m,2010-03-23 11:40:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 11:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 11:55:00-07'))
NOTICE:  (20m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:00:00-07'))
NOTICE:  (20m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:05:00-07'))
NOTICE:  (20m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:10:00-07'))
NOTICE:  (20m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:15:00-07'))
NOTICE:  (20m,2010-03-23 12:20:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:20:00-07'))
NOTICE:  (20m,2010-03-23 12:20:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:00:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:25:00-07'))
NOTICE:  (20m,2010-03-23 12:20:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:30:00-07'))
NOTICE:  (20m,2010-03-23 12:20:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:35:00-07'))
NOTICE:  (20m,2010-03-23 12:40:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:40:00-07'))
NOTICE:  (20m,2010-03-23 12:40:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:45:00-07'))
NOTICE:  (20m,2010-03-23 12:40:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:50:00-07'))
NOTICE:  (20m,2010-03-23 12:40:00-07,<NULL>)
NOTICE:  (30m,2010-03-23 12:30:00-07,<NULL>)
NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:55:00-07'))


> http://labs.omniti.com/trac/reconnoiter/browser/trunk/sql/sprocs/stratcon.rollup_metric_numeric.sql#L45
> 
> There is a limit of 12 windows that it will rollup on each invocation.  It works like this:
> 
> BEGIN;
> for each rollup period (5m, 20m, 30m, 1h, 4h, 6h, 1d)
>  for each dirty window in period (limit 12)
>    rollup
> COMMIT;
> 
> After that it should disconnect, freeing the backend and thus releasing all the work memory.  Clearly something isn't working right if you have 6GB of usage.
> 
> On Jul 6, 2010, at 12:44 AM, Ask Bjørn Hansen wrote:
> 
>> 
>> On May 20, 2010, at 6:37, Theo Schlossnagle wrote:
>> 
>> Hi,
>> 
>> I just got a chance to look at this again over the last couple days with a box with more memory.   After processing 3-4 hours, the rollup script still makes the postmaster use several GBs of memory (and it slows down insanely for some reason, which effectively stops the memory usage at ~6GB).
>> 
>> Where's the logic that limits it to one hour?  It seems to not work for me; so I'd like to try to make it a shorter interval like you suggested.
>> 
>> 
>> - ask
>> 
>>> 
>>> PROCESS:
>>> As data enters the system it marks in a table that a given window is dirty... so if data for 08:28:14.452374 comes in, it will insert it into the archive table and mark the 08:25:00 five minute window as dirty. The rollup job looks are the dirty marks and rolls the data into the 5m rollups and marks the 30 minute window dirty (in this case 08:20:00), the 30m window is processed (reading 5m data to constitute it) marking the 6h window dirty (06:00:00)... and so on.  NOTE: I don't remember the actual rollup intervals off the top of my head.
>>> 
>>> The job that does this limits itself to 12 rollups (12 x 5m = 60m) or one our of data.  That is specifically so that if it fails you only rollback an hour of data rollups and not "all outstanding rollups").
>>> 
>>> You could change that number in the pl/pgsql function to a number greater than or equal to 2.  However, I think you should be able to tune your way out of that.  I asked a DBA here and they said to walk through https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>>> 
>>> On May 20, 2010, at 3:29 AM, Ask Bjørn Hansen wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Some months ago postgresql started using too much memory for the box it's running on (while we're testing it's on a small 2GB virtual machine) when running the rollup jobs.   I just now got a chance to have a proper look at what it's doing.  The rollup job is moving along, with an entry like the following for every 5 minutes of all the time it hasn't been running:
>>>> 
>>>> NOTICE:  v_sql was (SELECT * FROM stratcon.window_robust_derive('2010-03-23 12:30:00-07'))
>>>> 
>>>> Now the trouble is that postgresql is slowly eating up all the memory on the box.  After processing just three hours worth of data it's using about 1GB memory and steadily growing.   If I abort the rollup job, the transaction is aborted too and I get to start over as far as I can tell.
>>>> 
>>>> Is all this intended?  Is there a workaround?   Any advice on how to explain to postgresql that it'll just have to live with whatever memory is available?   All I could find was advice about adjusting shared_buffers to be lower, so I set it really low.  I think everything else are default settings we got from the postgresql.org supplied RPM:
>>>> 
>>>> 	http://tmp.askask.com/2010/05/postgresql.conf
>>>> 
>>>> (I don't care much about the performance; I just need it to not use more memory than is in the box -- at this rate it'd break even if pgsql had 32GB memory to play with if we get more than a couple days behind on doing the rollups).
>>>> 
>>>> 
>>>> - ask
>>>> _______







More information about the Reconnoiter-users mailing list