Changeset 183

Timestamp:
09/08/10 18:47:07 (4 years ago)
Author:
depesz
Message:

documentation for new option (not yet implemented) for omnipitr-restore

Files:

  • trunk/omnipitr/doc/omnipitr-restore.pod

    r170 r183

environment variable.

=item --error-pgcontroldata (-ep)

Sets the handler for pg_controldata errors. Possible values:

=over

=item * ignore - warn in the logs, but do nothing else; after some time,
recheck whether pg_controldata works

=item * hang - enter an infinite loop, waiting for admin interaction, without
finishing recovery

=item * break - break recovery and return an error status to PostgreSQL
(default)

=back

Please check the L<ERRORS> section below for more details.

=back
     
processing - to send xlog segments to some permanent storage place.

=head2 ERRORS
=head3 pg_controldata

Sometimes, for a reason that is not yet known, pg_controldata fails (it prints
nothing and exits with status -1).
     326 
In such situations, omnipitr-restore also died with an error, but this had one
bad consequence: it made PostgreSQL assume that it should stop recovery, go
directly into standalone mode, and start accepting connections.

Because this behaviour is not ideal, there is now a switch to change what
happens on such an error.

It has 3 possible values.
     335 
=head4 break

"Break" means break recovery and return an error to PostgreSQL. This is the
default behaviour, to keep backward compatibility.
     340 
=head4 ignore

With this error handling, omnipitr-restore will simply ignore all errors from
pg_controldata - i.e. it will log information about the error, but it will not
change anything else - it will still try to work on the next wal segments, and
after 5 minutes it will retry pg_controldata.

This is the most transparent setting, but also somewhat dangerous. If there is
an intermittent problem that makes pg_controldata fail only some of the time,
it might simply get overlooked - replication will appear to work just fine.

And this can mean that the real source of the error propagates and does more
harm.
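The retry behaviour described above can be sketched as a small shell loop. This is an illustrative mock, not OmniPITR source: check_controldata stands in for a real pg_controldata invocation and is rigged to succeed after a few attempts so the loop terminates.

```shell
#!/bin/sh
# Sketch of the "ignore" handling loop (assumed behaviour, not OmniPITR code).
# check_controldata is a mock standing in for running pg_controldata.
attempts=0
check_controldata() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]    # pretend pg_controldata succeeds on the 3rd try
}
until check_controldata; do
    echo "pg_controldata failed (attempt $attempts); retrying" >&2
    sleep 1    # OmniPITR would wait 5 minutes (300 seconds) here
done
echo "pg_controldata ok after $attempts attempts"
```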
     354 
=head4 hang

With "hang" error handling, in case of pg_controldata failure, omnipitr-restore
will basically hang - i.e. enter an infinite loop, not doing anything.

Of course, before entering the infinite loop, it will log information about the
problem.

While it might seem pointless, this has two benefits:

=over

=item * It will not cause the slave server to become standalone

=item * It is easily detectable, as long as you're checking the size of the wal
archive directory, wal replication lag, or any other metric about replication -
as replication will be 100% stopped.

=back
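For instance, the detection mentioned above can be approximated by counting how many segments are piling up in the wal archive directory. A minimal sketch; the threshold is an assumption, and a temporary demo directory is created so the script is self-contained:

```shell
#!/bin/sh
# Sketch: flag a possibly-stalled restore when too many WAL segments pile up
# in the archive directory. Threshold and paths are illustrative assumptions;
# a temporary demo directory with 5 fake segments keeps this self-contained.
WAL_ARCHIVE=$(mktemp -d)
for i in 1 2 3 4 5; do
    touch "$WAL_ARCHIVE/0000000100000000000000$i"
done

THRESHOLD=3
count=$(ls "$WAL_ARCHIVE" | wc -l | tr -d ' ')
if [ "$count" -gt "$THRESHOLD" ]; then
    status="WARNING: $count segments waiting; replication may be stalled"
else
    status="OK: $count segments waiting"
fi
echo "$status"
rm -r "$WAL_ARCHIVE"
```

In a real setup this would point at the actual archive directory and run from cron or a monitoring agent.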
     374 
To recover from a hung recovery, you have to restart PostgreSQL, i.e. run this
sequence of commands (or something similar, depending on what you're using to
start/stop your PostgreSQL):

    pg_ctl -D /path/to/pgdata -m fast stop
    pg_ctl -D /path/to/pgdata start

Of course, when the next pg_controldata error happens, it will hang again.
=head2 EXAMPLES