Changeset 183

09/08/10 18:47:07 (5 years ago)

documentation for new option (not yet implemented) for omnipitr-restore



  • trunk/omnipitr/doc/omnipitr-restore.pod

diff r170 → r183

environment variable.
=item --error-pgcontroldata (-ep)

Sets the handler for pg_controldata errors. Possible values:

=item * ignore - warn in the logs, but do nothing else; after some time, recheck
whether pg_controldata works

=item * hang - enter an infinite loop, waiting for admin interaction, but not
finishing recovery

=item * break - break recovery, and return an error status to PostgreSQL

Please check the L<ERRORS> section below for more details.
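As a rough sketch of how this would be used (not part of this changeset), the
switch is passed wherever omnipitr-restore is invoked, typically as PostgreSQL's
restore_command; the binary path below and any other arguments your setup needs
are placeholders, only --error-pgcontroldata and its values come from this
documentation:

    # hypothetical recovery.conf entry - adjust the path and add the rest of
    # your usual omnipitr-restore arguments
    restore_command = '/opt/omnipitr/bin/omnipitr-restore --error-pgcontroldata hang %f %p'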
processing - to send xlog segments to some permanent storage place.

=head2 ERRORS

=head3 pg_controldata
Sometimes, for reasons not yet known, pg_controldata fails (it doesn't print
anything, and exits with status -1).
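To see what such a failure looks like, pg_controldata can be run by hand and its
exit status inspected - this is a plain PostgreSQL binary, not part of omnipitr,
and the data directory path is a placeholder:

    pg_controldata /path/to/pgdata
    echo "exit status: $?"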
In such a situation, omnipitr-restore also died with an error, but this had one
bad consequence: it made PostgreSQL assume that it should stop recovery, go
directly into standalone mode, and start accepting connections.

Because this behaviour is not ideal, there is now a switch to change what
happens in case of such an error.

It has 3 possible values.
=head4 break

"Break" means break recovery, and return an error to PostgreSQL. This is the
default behaviour, kept for backward compatibility.
=head4 ignore

With this error handling, omnipitr-restore will simply ignore all errors from
pg_controldata - i.e. it will print information about the error to the logs, but
it will not change anything - it will still try to work on the next wal
segments, and after 5 minutes it will retry pg_controldata.
This is the most transparent setting, but also somewhat dangerous. It means that
if an intermittent problem causes pg_controldata to fail only some of the time,
it might simply get overlooked - replication will keep working just fine.

And this can mean that the real source of the error can propagate and do more
damage.
=head4 hang

With "hang" error handling, in case of pg_controldata failure, omnipitr-restore
will basically hang - i.e. enter an infinite loop, not doing anything.

Of course, before entering the infinite loop, it will log information about the
problem.

While it might seem pointless, it has two benefits:
=item * It will not cause the slave server to become standalone

=item * It is easily detectable, as long as you're checking the size of the wal
archive directory, wal replication lag, or any other replication metric - since
replication will be 100% stopped. (A monitoring sketch follows this list.)
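As a rough illustration of the second point - not part of omnipitr, with the
archive path and threshold as placeholders - a periodic check could alert when
wal segments keep piling up because recovery is hanging:

    # hypothetical monitoring snippet: segments accumulate in the archive
    # directory while recovery is hung
    count=$(find /mnt/wal_archive -type f | wc -l)
    if [ $count -gt 100 ]; then
        echo "WAL archive backlog: $count segments - recovery may be hung" >&2
    fi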
To recover from a hung recovery, you have to restart PostgreSQL, i.e. run this
sequence of commands (or something similar, depending on what you're using to
start/stop your PostgreSQL):

    pg_ctl -D /path/to/pgdata -m fast stop
    pg_ctl -D /path/to/pgdata start

Of course, when the next pg_controldata error happens, it will hang again.
=head2 EXAMPLES