PostgreSQL Weekly News – August 01 2010

== PostgreSQL Weekly News – August 01 2010 ==

PostgreSQL 9.0 beta 4 is out. Test!
http://www.postgresql.org/developer/beta

Job::Machine 0.011, a reliable Perl Job Queue using PostgreSQL, released.
http://search.cpan.org/dist/Job-Machine/

pgAdmin 1.10.5, a GUI management tool for PostgreSQL, released.
http://www.pgadmin.org/download/

pg_sample 0.01, a utility for exporting a small, sample dataset from a
larger PostgreSQL database, released.
http://github.com/mla/pg_sample

== PostgreSQL Jobs for August ==

http://archives.postgresql.org/pgsql-jobs/2010-08/threads.php

== PostgreSQL Local ==

Reuven Lerner will be teaching a 5-day course in PostgreSQL August
1-5, 2010 at the Hi-Tech College in Herzliya, Israel.
http://www.hi-tech.co.il/college/default.asp?PageID=12&CourseNum=4288

FrOSCon 2010 will take place in St. Augustin, Germany on August 21-22,
2010. The German PostgreSQL User Group got his own devroom and is
looking for talks. More info:
http://andreas.scherbaum.la/blog/archives/711-FrOSCon-2010-PostgreSQL-devroom-Call-for-papers.html

The Call for Papers for West is open until September 5, 2010. Details at:
http://www.postgresqlconference.org/

== PostgreSQL in the News ==

Planet PostgreSQL: http://planet.postgresql.org/

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm Pacific time.
Please send English language ones to david@fetter.org, German language
to pwn@pgug.de, Italian language to pwn@itpug.org. Spanish language
to pwn@arpug.com.ar.

== Reviews ==

KaiGai Kohei reviewed Robert Haas’s patch to add get_whatever_oid
functionality.
== Applied Patches ==

Peter Eisentraut committed:

– In pgsql/doc/src/sgml/xfunc.sgml, add table creation and population
to example. From John Gage.

– In pgsql/doc/src/sgml/perform.sgml, fix grammar. Backpatched to
8.1.

– In pgsql/contrib/xml2/README.xml2, spelling fix.

– Spelling fixes.

– In pgsql/src/bin/psql/common.c, show psql timing output even in
quiet mode. These two settings ought to be independent of each
other.

– In pgsql/contrib/chkpass/chkpass.sql.in, remove wildly obsolete type
attribute externallength from chkpass.

– Fix indentation of verbatim block elements. Block elements with
verbatim formatting (literallayout, programlisting, screen,
synopsis) should be aligned at column 0 independent of the
surrounding SGML, because whitespace is significant, and indenting
them creates erratic whitespace in the output. The CSS stylesheets
already take care of indenting the output. Assorted markup
improvements to go along with it.

– Translation updates for 9.0beta4.

Alvaro Herrera committed:

– In PL/Perl[U], ensure $_SHARED is declared in the main:: namespace,
per bugs #5570 and #5571. Author: Alex Hunsaker

Robert Haas committed:

– In postgresql.conf.sample, use on/off rather than true/false. We
used to be consistent about this, but my recent patch to add a
restart_after_crash GUC failed to follow the existing convention.
Report and patch from Fujii Masao.

– In pgsql/doc/src/sgml/plpgsql.sgml, fix typo in PL/pgsql code
example. Backpatch to 8.4. Marc Cousin. Review by Kevin Grittner.

– Add ssl_cipher() and ssl_version() functions to contrib/sslinfo.
Review by Dave Page.

– In pgsql/src/backend/access/heap/heapam.c, fix possible page
corruption by ALTER TABLE .. SET TABLESPACE. If a zeroed page is
present in the heap, ALTER TABLE .. SET TABLESPACE will set the LSN
and TLI while copying it, which is wrong, and heap_xlog_newpage()
will do the same thing during replay, so the corruption propagates
to any standby. Note, however, that the bug can’t be demonstrated
unless archiving is enabled, since in that case we skip WAL logging
altogether, and the LSN/TLI are not set. Back-patch to 8.0; prior
releases do not have tablespaces. Analysis and patch by Jeff Davis.
Adjustments for back-branches and minor wordsmithing by me.

– In pgsql/contrib/fuzzystrmatch/fuzzystrmatch.c, avoid using
text_to_cstring() in levenshtein functions. Operating directly on
the underlying varlena saves palloc and memcpy overhead, which
testing shows to be significant. Extracted from a larger patch by
Alexander Korotkov.

– Make details of the Numeric representation private to numeric.c.
Review by Tom Lane.

– Make psql distinguish between unique indices and unique constraints.
Josh Kupershmidt. Reviewing and kibitzing by Kevin Grittner and me.

Tom Lane committed:

– Rewrite the rbtree routines so that an RBNode is the first field of
the
struct representing a tree entry, rather than being a separately
allocated
piece of storage. This API is at least as clean as the old one (if
not
more so — there were some bizarre choices in there) and it permits
a
very substantial memory savings, on the order of 2X in ginbulk.c’s
usage.

– Fix an additional set of problems in GIN’s handling of lossy page
pointers. Although the key-combining code claimed to work correctly
if its input contained both lossy and exact pointers for a single
page in a single TID stream, in fact this did not work, and could
not work without pretty fundamental redesign. Modify keyGetItem so
that it will not return such a stream, by handling lossy-pointer
cases a bit more explicitly than we did before. Per followup
investigation of a gripe from Artur Dabrowski. An example of a
query that failed given his data set is select count(*) from
search_tab where (to_tsvector(‘german’, keywords ) @@
to_tsquery(‘german’, ‘ee:* | dd:*’)) and (to_tsvector(‘german’,
keywords ) @@ to_tsquery(‘german’, ‘aa:*’)); Back-patch to 8.4 where
the lossy pointer code was introduced.

– In pgsql/src/backend/tsearch/ts_selfuncs.c, add some knowledge about
prefix matches to tsmatchsel(). It’s not terribly bright, but it
beats assuming that a prefix match behaves identically to an exact
match, which is what the code was doing before :-(. Noted while
experimenting with Artur Dobrowski’s example.

– Fix ANALYZE’s ancient deficiency of not trying to collect stats for
expression indexes when the index column type (the opclass
opckeytype) is different from the expression’s datatype. When
coded, this limitation wasn’t worth worrying about because we had no
intelligence to speak of in stats collection for the datatypes used
by such opclasses. However, now that there’s non-toy estimation
capability for tsvector queries, it amounts to a bug that ANALYZE
fails to do this. The fix changes struct VacAttrStats, and
therefore constitutes an API break for custom typanalyze functions.
Therefore we can’t back-patch it into released branches, but it was
agreed that 9.0 isn’t yet frozen hard enough to make such a change
unacceptable. Ergo, back-patch to 9.0 but no further. The API
break had better be mentioned in 9.0 release notes.

– Back-patch fix for renaming asyncCommitLSN to asyncXactLSN. AIUI
this was supposed to go into 9.0 as well as HEAD.

– Fix an ancient typo that prevented the detection of conflicting
fields when interval input “invalid” was specified together with
other fields. Spotted by Neil Conway with the help of a clang
warning. Although this has been wrong since the interval code was
written more than 10 years ago, it doesn’t affect anything beyond
which error message you get for a wrong input, so not worth
back-patching very far.

– Tweak a couple of macros in the regex code to suppress compiler
warnings from “clang”. The VERR changes make an assignment
unconditional, which is probably easier to read/understand anyway,
and one can hardly argue that it’s worth shaving cycles off the case
of reporting another error when one has already been detected. The
INSIST change limits where that macro can be used, but not in a way
that creates a problem for any existing call.

– In pgsql/src/template/darwin, don’t try to force use of
-no-cpp-precomp on OS X. It’s been five years since Apple shipped a
compiler that needed this switch, and there’s increasing interest in
using other compilers that won’t accept the switch at all. Better
to let anybody who still needs the switch inject it via CPPFLAGS.
Per gripe from Neil Conway.

Also, fix minor memory leaks in code called by ginEntryInsert, in
particular in ginInsertValue and entryFillRoot, as well as
ginEntryInsert itself. These leaks resulted in the GIN index build
context continuing to bloat even after we’d filled it to
maintenance_work_mem and started to dump data out to the index. In
combination these fixes restore the GIN index build code to honoring
the maintenance_work_mem limit about as well as it did in 8.4.
Speed seems on par with 8.4 too, maybe even a bit faster, for a
non-pathological case in which HEAD was formerly slower. Back-patch
to 9.0 so we don’t have a performance regression from 8.4.

– Fix potential failure when hashing the output of a subplan that
produces a pass-by-reference datatype with a nontrivial projection
step. We were using the same memory context for the projection
operation as for the temporary context used by the hashtable
routines in execGrouping.c. However, the hashtable routines feel
free to reset their temp context at any time, which’d lead to
destroying input data that was still needed. Report and diagnosis
by Tao Ma. Back-patch to 8.1, where the problem was introduced by
the changes that allowed us to work with “virtual” tuples instead of
materializing intermediate tuple values everywhere. The earlier
code looks quite similar, but it doesn’t suffer the problem because
the data gets copied into another context as a result of having to
materialize ExecProject’s output tuple.

– In pgsql/src/backend/executor/nodeLockRows.c, fix oversight in new
EvalPlanQual logic: the second loop over the ExecRowMark list in
ExecLockRows() forgot to allow for the possibility that some of the
rowmarks are for child tables that aren’t relevant to the current
row. Per report from Kenichiro Tanaka.

– In pgsql/doc/src/sgml/installation.sgml, work around a documentation
toolchain problem by replacing the “AIX-fixlevels” table with a
<variablelist> carrying the same information. Previously the 9.0
documentation was failing to build as a US-size PDF file. It’s
quite obscure what the real problem is or why this avoids it, but we
need a hack now so we can build docs for beta4. In passing do a bit
of editing in the AIX installation docs, in particular remove a
long-obsolete claim that the regression tests are likely to fail.

– In pgsql/src/backend/commands/tablecmds.c, fix another longstanding
problem in copy_relation_data: it was blithely assuming that a local
char[] array would be aligned on at least a word boundary. There
are architectures on which that is pretty much guaranteed to NOT be
the case … and those arches also don’t like non-aligned memory
accesses, meaning that log_newpage() would crash if it ever got
invoked. Even on Intel-ish machines there’s a potential for a large
performance penalty from doing I/O to an inadequately aligned
buffer. So palloc it instead. Backpatch to 8.0 — 7.4 doesn’t
have this code.

– In pgsql/src/include/catalog/pg_proc.h, clean up some
inconsistencies in the volatility marking of various I/O related
functions. Per today’s discussion, we will henceforth assume that
datatype I/O functions are either stable or immutable, never
volatile. (This implies in particular that domain CHECK constraint
expressions shouldn’t be volatile, since domain_in executes them.)
In turn, functions that execute the I/O functions of arbitrary
datatypes should always be labeled stable. This affects the
labeling of array_to_string, which was unsafely marked immutable,
and record_in, record_out, record_recv, record_send, domain_in,
domain_recv, which were over-conservatively marked volatile. The
array I/O functions were already marked stable, which is correct per
this policy but would have been wrong if we maintained domain_in as
volatile. Back-patch to 9.0, along with an earlier fix to correctly
mark cash_in and cash_out as stable not immutable (since they depend
on lc_monetary). No catversion bump — the implications of this
are not currently severe enough to justify a forced initdb.

– In pgsql/doc/src/sgml/release-9.0.sgml, update release notes for 9.0
beta 4. Back-patch some changes that were made only in HEAD.

– Improved version of patch to protect pg_get_expr() against misuse:
look through join alias Vars to avoid breaking join queries, and
move the test to someplace where it will catch more possible ways of
calling a function. We still ought to throw away the whole thing in
favor of a data-type-based solution, but that’s not feasible in the
back branches. This needs to be back-patched further than 9.0, but
I don’t have time to do so today. Committing now so that the fix
gets into 9.0beta4.

– Rewrite the key-combination logic in GIN’s keyGetItem() and
scanGetItem() routines to make them behave better in the presence of
“lossy” index pointers. The previous coding was outright incorrect
for some cases, as recently reported by Artur Dabrowski: scanGetItem
would fail to return index entries in cases where one index key had
multiple exact pointers on the same page as another key had a lossy
pointer. Also, keyGetItem was extremely inefficient for cases where
a single index key generates multiple “entry” streams, such as an @@
operator with a multiple-clause tsquery. The presence of a lossy
page pointer in any one stream defeated its ability to use the
opclass consistentFn, resulting in probing many heap pages that
didn’t really need to be visited. In Artur’s example case, a query
like WHERE tsvector @@ to_tsquery(‘a & b’) was about 50X slower than
the theoretically equivalent WHERE tsvector @@ to_tsquery(‘a’) AND
tsvector @@ to_tsquery(‘b’) The way that I chose to fix this was to
have GIN call the consistentFn twice with both TRUE and FALSE values
for the in-doubt entry stream, returning a hit if either call
produces TRUE, but not if they both return FALSE. The code handles
this for the case of a single in-doubt entry stream, but punts
(falling back to the stupid behavior) if there’s more than one lossy
reference to the same page. The idea could be scaled up to deal
with multiple lossy references, but I think that would probably be
wasted complexity. At least to judge by Artur’s example, such cases
don’t occur often enough to be worth trying to optimize. Back-patch
to 8.4. 8.3 did not have lossy GIN index pointers, so not subject
to these problems.

– In pgsql/src/backend/tsearch/ts_selfuncs.c, tweak tsmatchsel() so
that it examines the structure of the tsquery whenever possible (ie,
whenever the tsquery is a constant), even when no statistics are
available for the tsvector. For example, foo @@ ‘a & b'::tsquery
can be expected to be more selective than foo @@ ‘a'::tsquery,
whether or not we know anything about foo. We use
DEFAULT_TS_MATCH_SEL as the assumed selectivity of individual query
terms when no stats are available, then combine the terms according
to the query’s AND/OR structure as usual. Per experimentation with
Artur Dabrowski’s example. (The fact that there are no stats
available in that example is a problem in itself, but nonetheless
tsmatchsel should be smarter about the case.) Back-patch to 8.4 to
keep all versions of tsmatchsel() in sync.

Simon Riggs committed:

– Add explicit regression tests for ALTER TABLE lock levels. Use this
to catch a couple of lock level assignments that slipped through
manual testing, per Peter Eisentraut.

– Rename asyncCommitLSN to asyncXactLSN to reflect changed role in
9.0. Transaction aborts now record their LSN to avoid corner case
behaviour in Streaming Replication/Hot Standby, hence change of name
of variables and functions. As pointed out by Fujii Masao.
Cosmetic changes only.

Marc Fournier committed:

– Tag for beta4.

== Rejected Patches (for now) ==

No one was disappointed this week :-)

== Pending Patches ==

Robert Haas sent in a patch to add ssl_version() and ssl_cipher() to
contrib/sslinfo.

KaiGai Kohei sent in two more patches implementing SECURITY LABEL.

Boxuan Zhai sent in two more revisions of the WIP patch for MERGE.

Alexander Korotkov sent in two more revisions of the patch to make
levenshtein multibyte-aware in contrib/fuzzystrmatch.

Fujii Masao sent in another revision of the patch to do synchronous
replication.

Simon Riggs sent in another patch to reduce the needed locking for
ALTER TABLE.

Florian Pflug sent in another revision of the patch to show individual
statement latencies in pgbench output.

Erik Rijkers sent in a patch to fix an oversight in the variadic
function docs.

Robert Haas sent in another revision of the patch to reduce the
on-disk size of NUMERICs.

Zoltan Boszormenyi sent in another WIP patch to add a lock_timeout
GUC.

Henk Enting sent in a patch to help with the intersection of CHECK
constraints and multiple inheritance.

Comments are closed.