PostgreSQL Weekly News – February 12 2012
== PostgreSQL Weekly News – February 12 2012 ==
== PostgreSQL Product News ==
pgreplay 1.1.0, a log-file-based statement replayer for PostgreSQL, released.
Postgres Plus Advanced Server v9.1.2 is now Generally Available.
Scalr 2.5 supports PostgreSQL.
Version 1.23.0 of tail_n_mail, a Postgres log watcher program, released:
== PostgreSQL Jobs for February ==
== PostgreSQL Local ==
PGDay NYC will be held April 2, 2012 at Lighthouse International in
New York City.
PGCon 2012 will be held 17-18 May 2012, in Ottawa at the University of
Ottawa. It will be preceded by two days of tutorials on 15-16 May 2012.
PGDay France will be in Lyon on June 7, 2012.
== PostgreSQL in the News ==
Planet PostgreSQL: http://planet.postgresql.org/
PostgreSQL Weekly News is brought to you this week by David Fetter
Submit news and announcements by Sunday at 3:00pm Pacific time.
Please send English language ones to firstname.lastname@example.org, German language
to email@example.com, Italian language to firstname.lastname@example.org. Spanish language
== Reviews ==
== Applied Patches ==
Alvaro Herrera pushed:
– fe-misc.c depends on pg_config_paths.h. Declare this in Makefile to
avoid failures in parallel compiles. Author: Lionel Elie Mamane
Robert Haas pushed:
– Remove dead declaration.
– pg_dump: Reduce dependencies on global variables. Change various
places in the code that are referencing the global Archive object
g_fout to instead reference the Archive object fout which is already
being passed as a parameter. For parallel pg_dump to work, we’re
going to need multiple Archive(Handle) objects, so the real solution
here is to pass down the Archive object to everywhere that it needs
to go, but we might as well pick the low-hanging fruit first.
– pg_dump: Remove global Archive pointer. Instead, everything that
needs the Archive object now gets it as a parameter. This is
necessary infrastructure for parallel pg_dump, but is also amply
justified by the ugliness of the current code (though a lot more
than this is needed to fix that problem).
– pg_dump: Further reduce reliance on global variables. This is
another round of refactoring to make things simpler for parallel
pg_dump. pg_dump.c now issues SQL queries through the relevant
Archive object, rather than relying on the global variable g_conn.
This commit isn’t quite enough to get rid of g_conn entirely, but it
makes a big dent in its utilization and, along the way, manages to
be slightly less code than before.
– Add TIMING option to EXPLAIN, to allow eliminating of timing
overhead. Sometimes it may be useful to get actual row counts out
of EXPLAIN (ANALYZE) without paying the cost of timing every node
entry/exit. With this patch, you can say EXPLAIN (ANALYZE, TIMING
OFF) to get that. Tomas Vondra, reviewed by Eric Theise, with minor
doc changes by me.
– Add a transform function for numeric typmod coercisions. This
enables ALTER TABLE to skip table and index rebuilds when a column
is changed to an unconstrained numeric, or when the scale is
unchanged and the precision does not decrease. Noah Misch, with a
few stylistic changes and a fix for an OID collision by me.
– Add a transform function for varbit typmod coercisions. This
enables ALTER TABLE to skip table and index rebuilds when the new
type is unconstraint varbit, or when the allowable number of bits is
not decreasing. Noah Misch, with review and a fix for an OID
collision by me.
– Support fls(). The immediate impetus for this is that Noah Misch’s
patch to elide unnecessary table and index rebuilds when changing
typmod for temporal types uses it; and this is extracted from that
patch, with some further commentary by me. But it seems logically
separate from the remainder of the patch, so I’m committing it
separately; this is not the first time someone has wanted fls() in
the backend and probably won’t be the last. If we end up using this
in more performance-critical spots it may be worthwhile to add some
architecture-specific optimizations to our src/port version of fls()
– e.g. any x86 platform can implement this using the assembly
instruction BSRL. But performance won’t matter a bit for assessing
typmod changes, so I’m not worried about that right now.
– Fix typos pointed out by Noah Misch.
– Add transform functions for various temporal typmod coercisions.
This enables ALTER TABLE to skip table and index rebuilds in some
cases. Noah Misch, with trivial changes by me.
– Add a comment to AdjustIntervalForTypmod to reduce chance of future
bugs. It’s not entirely evident how the logic here relates to the
interval_transform function, so let’s clue people in that they need
to check that if the rules change.
– Improve interval_transform function to detect a few more cases.
Noah Misch, per a review comment from me.
– Attempt to fix MSVC builds and other fls-related breakage. Thanks
to Andrew Dunstan for bringing this to my attention.
Tom Lane pushed:
– Add locking around WAL-replay modification of shared-memory
variables. Originally, most of this code assumed that no Postgres
backends could be running concurrently with it, and so no locking
could be needed. That assumption fails in Hot Standby. While it’s
still true that Hot Standby backends should never change values like
nextXid, they can examine them, and consistency is important in some
cases such as when computing a snapshot. Therefore, prudence
requires that WAL replay code obtain the relevant locks when
modifying such variables, even though it can examine them without
taking a lock. We were following that coding rule in some places
but not all. This commit applies the coding rule uniformly to all
updates of ShmemVariableCache and MultiXactState fields; a search of
the replay routines did not find any other cases that seemed to be
at risk. In addition, this commit fixes a longstanding thinko in
replay of NEXTOID and checkpoint records: we tried to advance
nextOid only if it was behind the value in the WAL record, but the
comparison would draw the wrong conclusion if OID wraparound had
occurred since the previous value. Better to just unconditionally
assign the new value, since OID assignment shouldn’t be happening
during replay anyway. The additional locking seems to be more in
the nature of future-proofing than fixing any live bug, so I am not
going to back-patch it. The NEXTOID fix will be back-patched
– Avoid problems with OID wraparound during WAL replay. Fix a
longstanding thinko in replay of NEXTOID and checkpoint records: we
tried to advance nextOid only if it was behind the value in the WAL
record, but the comparison would draw the wrong conclusion if OID
wraparound had occurred since the previous value. Better to just
unconditionally assign the new value, since OID assignment shouldn’t
be happening during replay anyway. The consequences of a failure to
update nextOid would be pretty minimal, since we have long had the
code set up to obtain another OID and try again if the generated
value is already in use. But in the worst case there could be
significant performance glitches while such loops iterate through
many already-used OIDs before finding a free one. The odds of a
wraparound happening during WAL replay would be small in a
crash-recovery scenario, and the length of any ensuing
OID-assignment stall quite limited anyway. But neither of these
statements hold true for a replication slave that follows a WAL
stream for a long period; its behavior upon going live could be
almost unboundedly bad. Hence it seems worth back-patching this fix
into all supported branches. Already fixed in HEAD in commit
– Avoid throwing ERROR during WAL replay of DROP TABLESPACE. Although
we will not even issue an XLOG_TBLSPC_DROP WAL record unless removal
of the tablespace’s directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace
by Hot Standby backends, wrong directory permissions on a standby
server, etc etc. The original coding threw ERROR if replay failed
to remove the directories, but that is a serious overreaction.
Throwing an error aborts recovery, and worse means that manual
intervention will be needed to get the database to start again,
since otherwise the same error will recur on subsequent attempts to
replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of
disk space is wasted, so it hardly seems justified to throw an
error. Accordingly, arrange to report such failures as LOG messages
and keep going when a failure occurs during replay. Back-patch to
9.0 where Hot Standby was introduced. In principle such problems
can occur in earlier releases, but Hot Standby increases the odds of
trouble significantly. Given the lack of field reports of such
issues, I’m satisfied with patching back as far as the patch applies
– Fix postmaster to attempt restart after a hot-standby crash. The
postmaster was coded to treat any unexpected exit of the startup
process (i.e., the WAL replay process) as a catastrophic crash, and
not try to restart it. This was OK so long as the startup process
could not have any sibling postmaster children. However, if a
hot-standby backend crashes, we SIGQUIT the startup process along
with everything else, and the resulting exit is hardly “unexpected”.
Treating it as such meant we failed to restart a standby server
after any child crash at all, not only a crash of the WAL replay
process as intended. Adjust that. Back-patch to 9.0 where hot
standby was introduced.
– Mark some more I/O-conversion-invoking functions as stable not
volatile. When written, textanycat, anytextcat, quote_literal, and
quote_nullable were marked volatile, because they could invoke
arbitrary type-specific output functions as part of casting their
anyelement arguments to text. Since then, we have defined a project
policy that I/O functions must not be volatile, as per commit
aab353a60b95aadc00f81da0c6d99bde696c4b75. So these functions can
safely be downgraded to stable. Most of the time this makes no
difference since they’ll get inlined anyway, but as noted by Andrew
Dunstan, there are cases where the volatile marking prevents
optimizations that the planner does before function inlining. (I
think I might have overlooked these functions in the earlier commit
on the grounds that inlining would make it moot, but not so — tgl)
This change results in a change in the expected output of the json
regression tests, because the planner can now flatten a sub-select
that it failed to before. The old output is preferable, but getting
that back will require some as-yet-unfinished work on RowExpr
handling. Marti Raudsepp
– Support min/max index optimizations on boolean columns. Since
bool_and() is equivalent to min(), and bool_or() to max(), we might
as well let them be index-optimized in the same way. The practical
value of this is debatable at best, but it seems nearly cost-free to
enable it. Code-wise, we need only adjust the entries in
pg_aggregate. There is a measurable planning speed penalty for a
query involving one of these aggregates, but it is only a few
percent in simple cases, so that seems acceptable. Marti Raudsepp,
reviewed by Abhijit Menon-Sen
– Check misplaced window functions before checking aggregate/group by
sanity. If somebody puts a window function in WHERE, we should
complain about that in so many words. The previous coding tended to
complain about the window function’s arguments instead, which is
likely to be misleading to users who are unclear on the semantics of
window functions; as seen for example in bug #6440 from Matyas
Novak. Just another example of how “add new code at the end” is
frequently a bad heuristic.
– Fix up dumping conditions for extension configuration tables.
Various filters that were meant to prevent dumping of table data
were not being applied to extension config tables, notably
–exclude-table-data and –no-unlogged-table-data. We also would
bogusly try to dump data from views, sequences, or foreign tables,
should an extension try to claim they were config tables. Fix all
that, and refactor/redocument to try to make this a bit less
fragile. This reverts the implementation, though not the feature,
of commit 7b070e896ca835318c90b02c830a5c4844413b64, which had broken
config-table dumping altogether :-(. It is still the case that the
code will dump config-table data even if –schema is specified.
That behavior was intentional, as per the comments in
getExtensionMembership, so I think it requires some more discussion
before we change it.
– Throw error sooner for unlogged GiST indexes. Throwing an error
only after we’ve built the main index fork is pretty unfriendly when
the table already contains data. Per gripe from Jay Levitt.
– Add ORDER BY to a query to prevent occasional regression test
failures. Per buildfarm, we sometimes get row-ordering variations
in the output. This also makes this query look more like numerous
other ones in the same test file.
– Fix pg_dump for better handling of inherited columns. Revise
pg_dump’s handling of inherited columns, which was last looked at
seriously in 2001, to eliminate several misbehaviors associated with
inherited default expressions and NOT NULL flags. In particular
make sure that a column is printed in a child table’s CREATE TABLE
command if and only if it has attislocal = true; the former behavior
would sometimes cause a column to become marked attislocal when it
was not so marked in the source database. Also, stop relying on
textual comparison of default expressions to decide if they’re
inherited; instead, don’t use default-expression inheritance at all,
but just install the default explicitly at each level of the
hierarchy. This fixes the search-path-related misbehavior recently
exhibited by Chester Young, and also removes some dubious
assumptions about the order in which ALTER TABLE SET DEFAULT
commands would be executed. Back-patch to all supported branches.
– Fix brain fade in previous pg_dump patch. In pre-7.3 databases,
pg_attribute.attislocal doesn’t exist. The easiest way to make sure
the new inheritance logic behaves sanely is to assume it’s TRUE, not
FALSE. This will result in printing child columns even when they’re
not really needed. We could work harder at trying to reconstruct a
value for attislocal, but there is little evidence that anyone still
cares about dumping from such old versions, so just do the minimum
necessary to have a valid dump. I had this correct in the original
draft of the patch, but for some unaccountable reason decided it
wasn’t necessary to change the value. Testing against an old server
– Fix oversight in pg_dump’s handling of extension configuration
tables. If an extension has not been selected to be dumped (perhaps
because of a –schema or –table switch), the contents of its
configuration tables surely should not get dumped either. Per gripe
from Hubert Depesz Lubaczewski.
– Fix I/O-conversion-related memory leaks in plpgsql. Datatype I/O
functions are allowed to leak memory in CurrentMemoryContext, since
they are generally called in short-lived contexts. However, plpgsql
calls such functions for purposes of type conversion, and was
calling them in its procedure context. Therefore, any leaked memory
would not be recovered until the end of the plpgsql function. If
such a conversion was done within a loop, quite a bit of memory
could get consumed. Fix by calling such functions in the transient
“eval_econtext”, and adjust other logic to match. Back-patch to all
supported versions. Andres Freund, Jan Urbański, Tom Lane
Michael Meskes pushed:
– Allow the connection keyword array to carry all seven items in ecpglib.
Heikki Linnakangas pushed:
– When building with LWLOCK_STATS, initialize the stats in
LWLockWaitUntilFree. If LWLockWaitUntilFree was called before the
first LWLockAcquire call, you would either crash because of access
to uninitialized array or account the acquisition incorrectly.
LWLockConditionalAcquire doesn’t have this problem because it
doesn’t update the lwlock stats. In practice, this never happens
because there is no codepath where you would call
LWLockWaitUntilfree before LWLockAcquire after a new process is
launched. But that’s just accidental, there’s no guarantee that
that’s always going to be true in the future. Spotted by Jeff
– Fix typo in comment.
– Rename LWLockWaitUntilFree to LWLockAcquireOrWait.
LWLockAcquireOrWait makes it more clear that the lock is acquired if
– Add new keywords SNAPSHOT and TYPES to the keyword list in gram.y.
These were added to kwlist.h as unreserved keywords in separate
patches, but authors forgot to add them to the corresponding list in
gram.y. Because of that, even though they were supposed to be
unreserved keywords, they could not be used as identifiers.
src/tools/check_keywords.pl is your friend.
Peter Eisentraut pushed:
– createuser: Disable prompting by default. Do not prompt when
options were not specified. Assume –no-createdb, –no-createrole,
–no-superuser by default. Also disable prompting for user name in
dropdb, unless –interactive was specified. reviewed by Josh
– pg_regress: Use target-specific variable instead of overriding make
rule. Use a target-specific variable to add to CPPFLAGS instead of
writing a custom .c -> .o rule. This will ensure that dependency
tracking is used when enabled.
– pg_dump: Add some const qualifiers
– psql: Support zero byte field and record separators. Add new psql
settings and command-line options to support setting the field and
record separators for unaligned output to a zero byte, for easier
interfacing with other shell tools. reviewed by Abhijit Menon-Sen
Bruce Momjian pushed:
– Add opensp as a requirement for building the docs on Debian —
tested on Debian Squeeze.
Magnus Hagander pushed:
– Have pg_receivexlog always send an invalid log position in status
messages. This prevents pg_basebackup and pg_receivexlog from
becoming a synchronous standby in case ‘write’ is used for
synchronous_commit. Fujii Masao
== Rejected Patches (for now) ==
No one was disappointed this week
== Pending Patches ==
Marco Nenciarini sent in another revision of the patch to allow the
elements of arrays to be enforced as foreign keys.
Fujii Masao sent in two revisions of a patch to fix incorrect handling
of the timeout in pg_receivexlog.
Marko Kreen sent in another revision of the patch to create a new
tuple storage in libpq and use same to make dblink more efficient in
Chetan Suttraway sent in another revision of the patch to implement
Fujii Masao sent in a patch to fix an issue where pg_basebackup -x
stream from the standby gets stuck.
Chetan Suttraway sent in a patch to prevent the specification of
conflicting transaction read/write options.
Euler Taveira de Oliveira sent in another revision of the patch to do
xlog location arithmetic.
Shigeru HANADA sent in two more revisions of the patch to add a
Marti Raudsepp sent in another revision of a patch to remove an
optimization barrier involving the volatility of text-any
Alvaro Herrera and Alex Hunsaker traded patches to fix an issue with
missing keywords in make.
Peter Eisentraut sent in a patch to fix some of the missing things in
psql’s SELECT tab completion.
Marti Raudsepp sent in a patch to make TRUNCATE more MVCC-safe.
Peter Geoghegan sent in another revision of the patch for fast-path
ordering, b-tree index creation time.
Alex Hunsaker sent in a patch to fix a bug in PL/Perl in databases
encoded as SQL_ASCII.
Shigeru HANADA sent in another revision of the patch to collect
statistics on CSV files attached via FDW.
Jean-Baptiste Quenot sent in a patch to fix an issue with PL/Python’s
handling of result metadata.
Kevin Grittner sent in a patch to ensure that if a GUC has a check
function, it is run on a RESET just like it is on a SET, to make sure
that the resulting value is valid to set within the context.
Andrew Dunstan sent in two revisions of a patch to fix a case where
auto_explain can produce invalid JSON.
Jeff Janes sent in a WIP patch to to set XLP_FIRST_IS_CONTRECORD, this
being part of the continuing effort to move more work outside
Vik Reykja sent in a patch to optimize referential integrity checks.
Comments are closed.