root/trunk/omnipitr/doc/omnipitr-restore.pod

Revision 223, 11.7 kB (checked in by depesz, 3 years ago)

Add a way in which you can use omnipitr-restore on 9.0 streaming replication slave (-sr option).

=head1 OmniPITR - omnipitr-restore

=head2 USAGE

/some/path/omnipitr/bin/omnipitr-restore [options] %f %p

Options:

=over

=item --data-dir (-D)

Path to the PostgreSQL data directory (defaults to the current working
directory).

=item --source (-s)

Where I<omnipitr-restore> can find WAL segments to use.

Check L<Source specification> for more details.

=item --recovery-delay (-w)

Delay when recovering WAL segments (in seconds).

This is primarily used to keep a window of safety before something like
I<DELETE FROM main_table> is applied on the slave database.

=item --finish-trigger (-f)

Name of file to watch for existence - if it exists, the recovery process will
stop, and the PostgreSQL slave will become fully functional.

Check L<Finishing recovery> section for more details.

=item --remove-unneeded (-r)

Makes I<omnipitr-restore> remove unneeded WAL segments. These are B<not>
simply the segments that were passed to Pg - I<omnipitr-restore> checks the
last redo segment to make sure removal is safe.

=item --remove-at-a-time (-rt)

When removing old segments, remove at most this many segments before
re-entering the loop to check for signals and/or new segment availability for
Postgres.

Defaults to 3.

=item --removal-pause-trigger (-p)

Name of file to watch for existence. If it exists, I<omnipitr-restore> will not
remove unneeded WAL segments regardless of the I<--remove-unneeded> option.
This is to provide a way to make backups on the slave.

=item --pre-removal-processing (-h)

If given, the argument will be treated as a shell command to run whenever a
segment is about to be removed from the archive.

If the hook finishes without errors, the segment will be removed. If there are
errors, removal will be postponed and retried after some time.

One extra parameter will be appended: the name of the segment file to be
processed (prepared in such a way that it is relative to the current working
directory).

The passed segment will always be uncompressed.

=item --temp-dir (-t)

Where to create temporary files (defaults to /tmp or the I<$TMPDIR> environment
variable location). This is only used with --pre-removal-processing.

=item --log (-l)

Name of logfile (actually a template, as it supports %% L<strftime(3)>
markers). Unfortunately, due to the %x usage by PostgreSQL, we cannot use %%
macros directly. Instead, any occurrence of the ^ character in the logfile path
will first be changed to %, and then passed to strftime.

=item --streaming-replication (-sr)

This option should be used if you're setting up a streaming replication slave.
It causes I<omnipitr-restore> to die as soon as it is called for a WAL segment
that is not in the WAL archive.

=item --pid-file

Name of file to use for pidfile. If it is specified, then only one copy of
I<omnipitr-restore> (with this pidfile) can run at the same time.

Trying to run a second copy of I<omnipitr-restore> will result in an error.

=item --verbose (-v)

Log verbosely what is happening.

=item --gzip-path (-gp)

Full path to the gzip program - in case you can't set a proper PATH environment
variable.

=item --bzip2-path (-bp)

Full path to the bzip2 program - in case you can't set a proper PATH
environment variable.

=item --lzma-path (-lp)

Full path to the lzma program - in case you can't set a proper PATH environment
variable.

=item --pgcontroldata-path (-pp)

Full path to the pg_controldata program - in case you can't set a proper PATH
environment variable.

=item --error-pgcontroldata (-ep)

Sets the handler for errors with pg_controldata. Possible options:

=over

=item * ignore - warn in logs, but do nothing else - after some time,
recheck if pg_controldata works

=item * hang - enter an infinite loop, waiting for admin interaction, but not
finishing recovery

=item * break - breaks recovery, and returns error status to PostgreSQL
(default)

=back

Please check the L<ERRORS> section below for more details.

=back
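
The ^ handling described for --log can be pictured with a small shell sketch.
It is only an illustration under stated assumptions: expand_log_template is a
hypothetical helper and the log path is made up. omnipitr-restore performs the
substitution internally, and its own implementation is not shown here.

```shell
# Illustration only: expand_log_template is a hypothetical helper, not part
# of OmniPITR. It mimics the documented rule: every ^ becomes %, and the
# result is then treated as a strftime(3) template.
expand_log_template() {
    # tr swaps each ^ for %; date(1) then expands the strftime markers.
    format=$(printf '%s' "$1" | tr '^' '%')
    date "+$format"
}

# Example path (an assumption); prints a file name based on the current date.
expand_log_template '/var/log/omnipitr/restore-^Y-^m-^d.log'
```

In I<restore_command> the same template would be passed unexpanded, for
example as -l /var/log/omnipitr/restore-^Y-^m-^d.log.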

=head2 DESCRIPTION

The call to I<omnipitr-restore> should be placed in the I<restore_command>
variable in I<recovery.conf>.

Which options should be given depends on the installation, but generally you
will need at least:

=over

=item * --data-dir

The "%p" file path passed by PostgreSQL is relative to I<DATADIR>, so
I<omnipitr-restore> needs to know where it is.

=item * --log

to make sure that information about restore progress is logged somewhere

=item * --source

to specify where to load WAL segments from

=back

If you specify more than one destination, you will also need to specify
I<--state-dir>.

Of course you can provide many --dst-local or many --dst-remote options, or any
mix of these.

Generally omnipitr-restore will try to deliver the WAL segment to all
destinations, and will fail if B<any> of them does not accept the new segment.

Segments will be transferred to destinations in this order:

=over

=item 1. All B<local> destinations, in the order provided on the command line

=item 2. All B<remote> destinations, in the order provided on the command line

=back

If any destination fails, I<omnipitr-restore> will save state (which
destinations it delivered the file to) and return an error to PostgreSQL, which
will cause PostgreSQL to call I<omnipitr-restore> again for the same WAL
segment after some time.

The state directory will be cleared after every successful file send, so it
should stay small in size (expect 1 file of under 500 bytes).

When constructing the command line to put in the I<restore_command> PostgreSQL
GUC, please remember that while providing C<"%p" "%f"> will work,
I<omnipitr-restore> requires only "%p".

=head3 Source specification

If the WAL segments are compressed, you have to prefix the source path with the
compression type followed by an '=' sign.

Allowed compression types:

=over

=item * gzip

Decompresses with the gzip program; the file extension used is .gz

=item * bzip2

Decompresses with the bzip2 program; the file extension used is .bz2

=item * lzma

Decompresses with the lzma program; the file extension used is .lzma

=back
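
The prefix rule above can be sketched in shell. This is an illustration only:
describe_source is a hypothetical helper that shows how a --source value breaks
down into compression type, path and expected file extension. It is not how
omnipitr-restore itself parses the option.

```shell
# Illustration only: describe_source is a hypothetical helper, not part of
# OmniPITR. It splits a --source value into compression type, path, and the
# file extension the documentation associates with that type.
describe_source() {
    case "$1" in
        gzip=*)  echo "gzip ${1#gzip=} .gz" ;;
        bzip2=*) echo "bzip2 ${1#bzip2=} .bz2" ;;
        lzma=*)  echo "lzma ${1#lzma=} .lzma" ;;
        *)       echo "none $1" ;;   # no prefix: segments are not compressed
    esac
}

describe_source 'gzip=/mnt/wal_restore/'   # prints: gzip /mnt/wal_restore/ .gz
describe_source '/mnt/wal_restore/'        # prints: none /mnt/wal_restore/
```

On the I<omnipitr-restore> command line the first form would be written as
-s gzip=/mnt/wal_restore/.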

If you want to pass any extra arguments to the compression program, you can
either:

=over

=item * make a wrapper

Write a program/script named the same way as your actual compression program,
which adds the extra parameters to the call

=item * use environment variables

All of the supported compression programs use environment variables:

=over

=item * gzip - GZIP

=item * bzip2 - BZIP2

=item * lzma - XZ_OPT

=back

For details, please consult the manual of your chosen compression tool.

=back

=head3 Finishing recovery

There are 2 ways I<omnipitr-restore> can finish recovery, and there are 2
separate ways to signal it that it should finish.

First, the finishing procedures:

=over

=item * smart

In this mode I<omnipitr-restore> will feed all available WAL segments to
PostgreSQL (without any I<--recovery-delay> induced delay), and then finish
the restoration process.

=item * immediate

In this mode I<omnipitr-restore> will skip all pending WAL segments, and make
PostgreSQL finish recovery immediately.

This can be useful after a really bad query has been run (think: TRUNCATE
users), when you want to prevent the change from being replicated to the slave.

=back

I<omnipitr-restore> can be signaled into finishing recovery in 2 ways, one
of which is optional.

=over

=item * trigger file

This one is optional. If you use the --finish-trigger switch,
I<omnipitr-restore> will look for this file, and if it exists, it will start
finishing.

If the file exists and contains the string "NOW" (without quotation characters,
but with an optional newline character "\n"), I<omnipitr-restore> will enter
the "immediate finish" procedure. If the content is different, or the file is
empty, it will proceed in smart finish mode.

After OmniPITR finishes recovery and PostgreSQL enters its normal mode of
operation, it's strongly advised to remove this file.

=item * system signal

This one always works, regardless of the --finish-trigger switch. You can send
a system signal (kill) to I<omnipitr-restore> to make it start the finish
recovery procedure.

Only 1 signal is supported:

=over

=item * SIGUSR1

makes the finish I<immediate>

=back

It is currently not possible to force 'smart' finishing by signal, due to
the fact that I<omnipitr-restore> is restarted after every segment.

=back
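
The trigger-file rules can be restated as a short shell sketch. It is an
illustration only: finish_mode is a hypothetical helper that mirrors the
documented behaviour, not code taken from omnipitr-restore, and the trigger
path in the comments is the one used in the EXAMPLES section.

```shell
# Illustration only: finish_mode is a hypothetical helper that mirrors the
# documented trigger-file rules; it is not OmniPITR's own code.
finish_mode() {
    trigger=$1
    if [ ! -e "$trigger" ]; then
        echo "none"          # no trigger file: recovery continues
        return
    fi
    # Command substitution strips trailing newlines, so "NOW" and "NOW\n"
    # are treated the same way.
    content=$(cat "$trigger")
    if [ "$content" = "NOW" ]; then
        echo "immediate"
    else
        echo "smart"         # empty file or any other content
    fi
}

# Typical ways an admin might create the trigger:
#   touch /tmp/finish.trigger          # request smart finish
#   echo NOW > /tmp/finish.trigger     # request immediate finish
```

The signal route needs no file: sending SIGUSR1 with kill -USR1 to the PID of
the running I<omnipitr-restore> process requests an immediate finish.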

=head3 Segment removal

omnipitr-restore will automatically remove segments that are no longer
necessary.

To make this happen, it will periodically run the I<pg_controldata> program and
check the name of the last segment required for redo.

If pre-removal-processing is defined, it will be called before the actual
removal.

omnipitr-restore will remove segments chronologically - oldest segments first.

One useful idea for pre-removal-processing is to use L<omnipitr-archive> for
processing - to send xlog segments to some permanent storage place.
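
As a simpler alternative to calling I<omnipitr-archive>, a pre-removal hook
could just copy each segment aside. The sketch below is an illustration under
assumptions: keep_segment and the /mnt/wal_keep directory are hypothetical. It
relies only on the documented contract that the segment name is passed as an
extra argument and that a non-zero exit status postpones removal.

```shell
# Illustration only: a hypothetical pre-removal hook, not shipped with
# OmniPITR. omnipitr-restore appends the segment file name as the last
# argument; a non-zero exit status postpones removal of that segment.
keep_segment() {
    archive_dir=$1   # assumption: a directory that holds retained segments
    segment=$2       # segment file name supplied by omnipitr-restore
    # cp fails (non-zero status) if the source or destination is missing,
    # which is exactly what should delay the removal.
    cp -- "$segment" "$archive_dir/" || return 1
}

# When saved as a script, the hook body would end with something like:
#   keep_segment /mnt/wal_keep "$1"
```

Such a script would then be named in --pre-removal-processing, and
I<omnipitr-restore> would retry the removal later whenever the copy fails.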

=head2 ERRORS

=head3 pg_controldata

Sometimes, for a not yet known reason, pg_controldata fails (doesn't print
anything, exits with status -1).

In such a situation omnipitr-restore used to die too, with an error, which had
one bad consequence: it made PostgreSQL assume that it should stop recovery, go
directly into standalone mode, and start accepting connections.

Because this outcome is not so good, there is now a switch
(I<--error-pgcontroldata>) to change the behaviour in case of such an error.

It has 3 possible values.

=head4 break

"Break" means break recovery, and return an error to PostgreSQL. This is the
default behaviour, to keep it backward compatible.

=head4 ignore

With this error handling, omnipitr-restore will simply ignore all errors from
pg_controldata - i.e. it will print information about the error to the logs,
but it will not change anything - it will still try to work on the next WAL
segments, and after 5 minutes it will retry running pg_controldata.

This is the most transparent setting, but also kind of dangerous. It means that
if there is a non-permanent problem that makes pg_controldata fail only some of
the time, it might simply get overlooked - replication will work just fine.

And this can mean that the real source of the error can propagate and do more
harm.

=head4 hang

With "hang" error handling, in case of pg_controldata failure, omnipitr-restore
will basically hang - i.e. enter an infinite loop, not doing anything.

Of course, before entering the infinite loop, it will log information about the
problem.

While it might seem pointless, it has two benefits:

=over

=item * It will not cause the slave server to become standalone

=item * It is easily detectable, as long as you're checking the size of the WAL
archive directory, or WAL replication lag, or any other metric about
replication - as replication will be 100% stopped.

=back
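
One way to watch the size of the WAL archive directory is a small check that a
monitoring system could run periodically. This is a sketch under assumptions:
check_wal_backlog, the directory and the threshold are all hypothetical, and a
growing backlog only hints at a hang rather than proving one.

```shell
# Illustration only: a hypothetical monitoring helper, not part of OmniPITR.
# It counts files waiting in the WAL archive directory and fails when the
# count exceeds a threshold, which is one way to notice a hung recovery.
check_wal_backlog() {
    wal_dir=$1
    max_files=$2
    # Count regular files directly inside the directory; tr removes the
    # padding some wc implementations add around the number.
    count=$(find "$wal_dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    echo "$count"
    [ "$count" -le "$max_files" ]
}

# Example call (directory and threshold are assumptions):
#   check_wal_backlog /mnt/wal_restore 100 || echo "WAL backlog too large"
```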

To recover from a hung recovery, you have to restart PostgreSQL, i.e. run this
sequence of commands (or something similar, depending on what you're using to
start/stop your PostgreSQL):

    pg_ctl -D /path/to/pgdata -m fast stop
    pg_ctl -D /path/to/pgdata start

Of course, when the next pg_controldata error happens, it will hang again.

=head2 EXAMPLES

=head3 Minimal setup:

    restore_command='/.../omnipitr-restore -l /var/log/omnipitr/restore.log -s /mnt/wal_restore/ %f %p'

=head3 Minimal setup, but with defined finish trigger and recovery delay (5
minutes):

    restore_command='/.../omnipitr-restore -D /mnt/data/ -l /var/log/omnipitr/restore.log -s /mnt/wal_restore/ -w 300 -f /tmp/finish.trigger %f %p'

=head3 Setup as above, but with pause trigger defined for doing backups-on-slave and removing unneeded segments

    restore_command='/.../omnipitr-restore -D /mnt/data/ -l /var/log/omnipitr/restore.log -s /mnt/wal_restore/ -w 300 -f /tmp/finish.trigger -r -p /tmp/pause.trigger %f %p'
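
With the pause trigger configured as above, a backup job on the slave could
hold segment removal for its duration. The wrapper below is a sketch under
assumptions: with_removal_paused is a hypothetical helper, and the tar command
in the comment stands in for whatever backup procedure you actually use.

```shell
# Illustration only: with_removal_paused is a hypothetical wrapper, not part
# of OmniPITR. It creates the --removal-pause-trigger file, runs the given
# backup command, and removes the file again even if the command fails.
with_removal_paused() {
    pause_file=$1
    shift
    touch "$pause_file" || return 1
    status=0
    "$@" || status=$?    # the backup command and its arguments
    rm -f "$pause_file"
    return "$status"
}

# Example (the tar invocation and target path are assumptions):
#   with_removal_paused /tmp/pause.trigger tar -czf /backup/slave.tar.gz /mnt/data
```

The wrapper returns the exit status of the backup command, so a calling script
can still tell whether the backup itself succeeded.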

=head3 Minimal setup, but with backing up segments to remote server:

    restore_command='/.../omnipitr-restore -l /var/log/omnipitr/restore.log -s /mnt/wal_restore/ -h "/.../omnipitr-archive --force-data-dir -l /var/log/omnipitr/archive.log -dr bzip2=rsync://backup/postgres/xlogs/" %f %p'

=head2 COPYRIGHT

The OmniPITR project is Copyright (c) 2009-2010 OmniTI. All rights reserved.