PostgreSQL

PostgreSQL 16 What's New - Part 1

Dinesh Kumar Chemuduru

January 16, 2024

Have questions or want a demo?

We’re here to help! Click the button below and we’ll be in touch.

Introduction

PostgreSQL 16 has been released, introducing numerous major features that enhance the database engine's performance and offer new capabilities to developers and administrators. This blog post will explore these important updates.

In the first part, we'll focus on the developer-friendly features of PostgreSQL 16. The following part, set to be published in the coming weeks, will delve into the other major features.

Here, we discuss major enhancements, including the introduction of new SQL/JSON standards, improvements in parallel joins and incremental sorts, the creation of custom collation rules, and other developer-friendly features.

Developer experience

Effective Parallel Joins

In the latest release of PostgreSQL, version 16, one of the major enhancements introduced is the support for FULL OUTER JOINS in parallel hash joins. This is a significant improvement compared to the functionality in earlier versions of the database software.

In previous PostgreSQL versions, the task of performing hash joins was allocated to a single worker. This often led to bottlenecks and inefficiencies, particularly when dealing with large datasets. However, in this current PostgreSQL 16 version, the process of hash join is now distributed and parallelized.

Consider the below quick example, where we are trying to demonstrate the FULL OUTER JOIN, which fetches all the non-matching records from both joining tables.

Copied to clipboard!


db1=> SELECT COUNT(*) FROM goods;
 count
--------
 320000
(1 row)

As shown above, we have a total of 320000 records in the goods table. Now, let's simulate a FULL OUTER JOIN on the same table to fetch all the records twice using the following query.

Copied to clipboard!


db1=> SELECT COUNT(*) FROM goods s FULL OUTER JOIN goods g ON(s.product_name=g.product_name||'x');
 count
--------
 640000
(1 row)

Here, we're using the FULL OUTER JOIN on the same table. However, the join condition compares all the product names with product names plus an 'x' character. This method retrieves all the records twice.

Now, let’s see the execution plan of this query in PostgreSQL 16.

Copied to clipboard!


db1=> EXPLAIN SELECT COUNT(*) FROM goods s FULL OUTER JOIN goods g ON(s.product_name=g.product_name||'x');
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=16040.78..16040.79 rows=1 width=8)
   ->  Gather  (cost=16040.67..16040.78 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=15040.67..15040.68 rows=1 width=8)
               ->  Parallel Hash Full Join  (cost=6701.29..14570.08 rows=188235 width=0)
                     Hash Cond: ((g.product_name || 'x'::text) = s.product_name)
                     ->  Parallel Seq Scan on goods g  (cost=0.00..3612.35 rows=188235 width=6)
                     ->  Parallel Hash  (cost=3612.35..3612.35 rows=188235 width=6)
                           ->  Parallel Seq Scan on goods s  (cost=0.00..3612.35 rows=188235 width=6)
(9 rows)

From the above plan, parallel workers are launched to perform the hash join, introducing the node Parallel Hash Full Join in this release. If we were to execute the same query in previous versions, we wouldn't encounter the Parallel Hash Full Join. Instead, we would see a single Hash Full Join.

In version 16, parallelism extends to support the Parallel Hash Right Join too. Previous versions of PostgreSQL already supported Parallel Hash Left Join. This hash join parallelism significantly boosts performance, leading to improved response times when joining large result sets and retrieving unmatched records.

SQL/JSON

Previous versions of PostgreSQL provide extensive support for storing and handling JSON documents. They also offer a rich set of developer functions for navigating JSON paths and performing CRUD operations on the documents. However, the latest release of PostgreSQL now supports SQL/JSON standard functions. Other database engines, including Oracle and MySQL, support the SQL/JSON standards using functions such as json_array() and json_object(). Similarly, PostgreSQL 16 has introduced these JSON constructors that adhere to the SQL/JSON standards. These JSON constructor functions enable us to easily create JSON objects from other result sets.

Let’s begin with some simple examples, where we try to demonstrate constructing and validating the JSON objects. Now, let's create a JSON object using the json_object constructor.

Copied to clipboard!


db1=> SELECT json_object(product_name VALUE price) FROM goods LIMIT 3;
    json_object
--------------------
 {"Product #1" : 1}
 {"Product #2" : 2}
 {"Product #3" : null}
(3 rows)

Now, let's exclude products with a NULL price by specifying ABSENT ON NULL in json_object.

Copied to clipboard!


db1=> SELECT json_object(product_name VALUE price ABSENT ON NULL) FROM goods LIMIT 3;
    json_object
--------------------
 {"Product #1" : 1}
 {"Product #2" : 2}
 {}
(3 rows)

By using this constructor, we specify an inline validation to the keys like allow only UNIQUE keys and throw exception if there are any duplicate keys found. Consider the below example, where we are constructing the JSON with duplicate keys, which throws exception.

Copied to clipboard!


db1=> SELECT json_object('id':1, 'id':2 WITH UNIQUE KEYS);
ERROR:  duplicate JSON object key value: "id"

By using this json_object constructor, we can also created nested json_objects as like below.

Copied to clipboard!


db1=> SELECT json_object('id':1, 'nested':json_object('id':1));
            json_object
-----------------------------------
 {"id" : 1, "nested" : {"id" : 1}}
(1 row)

The above one is an example to just demonstrate the flexibility of creating nested json objects, by using this new constructors.

The json_array constructor which is introduced, like the json_object, helps to build a JSON array using the provided values.

Copied to clipboard!


db1=> SELECT json_array(json_object('id':1), json_object('id':2));
        json_array
--------------------------
 [{"id" : 1}, {"id" : 2}]
(1 row)

Here is an example where we construct a JSON array using JSON objects. We can also create a JSON array object with all possible type values and exclude NULL values from the list.

Copied to clipboard!


db1=> SELECT json_array('a', ARRAY[1,2,3], now(), json '{"id": 1}', null ABSENT ON null);
                          json_array
--------------------------------------------------------------
 ["a", [1,2,3], "2024-01-12T23:32:16.35455+00:00", {"id": 1}]
(1 row)

In addition to the constructors, PostgreSQL 16 also provides an IS JSON predicate. This predicate checks if a given value is a valid JSON entity. With IS JSON, we can confirm that the input is one of the following: VALUE, SCALAR, OBJECT, or ARRAY. Moreover, we can verify if the JSON has a UNIQUE key set or a non-`UNIQUE` key set.

Consider the following simple JSON document, where we would be performing a few set of validations.

Copied to clipboard!


db1=> SELECT ($$ {"name": "John Doe", "age": 30,"isMarried": true,"children": ["Jane", "Joe"]} $$::JSON IS JSON) as is_json;
 is_json
---------
 t
(1 row)

The above example demonstrates a basic validation where we verify whether the given JSON is valid or not. Since it's a valid object, the output is true.

Now, let's take a closer look at the internal objects, such as age, which is JSON VALUE type.

Copied to clipboard!


db1=> SELECT ($${"name": "John Doe", "age": 30,"isMarried": true,"children": ["Jane", "Joe"]} $$::JSON)->'age' IS JSON VALUE as is_json;
 is_json
---------
 t
(1 row)

Similarly, let’s evaluate the children object which is of type JSON ARRAY.

Copied to clipboard!


db1=> SELECT ($${"name": "John Doe", "age": 30,"isMarried": true,"children": ["Jane", "Joe"]} $$::JSON)->'children' IS JSON ARRAY as is_json;
 is_json
---------
 t
(1 row)

The IS JSON also validates the entire JSON object to determine whether it contains unique keys or not. Below is a quick example demonstrating this behavior.

Copied to clipboard!


db1=> SELECT $${"name": "John Doe", "age": 30,"isMarried": true,"children": ["Jane", "Joe"], "name": "James Doe"}$$::JSON IS JSON WITH UNIQUE KEYS as is_unique;
 is_unique
-----------
 f
(1 row)

In the above example, we have used the name key twice, which caused it to return the value false.

Incremental sort

In PostgreSQL 13, a feature called incremental sort was introduced. This approach to sorting data allows it to be sorted in batches or groups, instead of processing it all at once.

In PostgreSQL 16, this feature was further enhanced to support DISTINCT/Unique sort operations. This means that, when performing data aggregate operations in PostgreSQL 16, we can now get the benefit of having incremental sort.

In general, one query feature gets the major advantage of having the incremental sort. That is, the LIMIT of the data result. Consider the below general query, which gets the top 100 costly products from the goods table.

Copied to clipboard!


db1=> SET enable_incremental_sort TO off;
SET
db1=> EXPLAIN ANALYZE SELECT product_name, price FROM goods ORDER BY product_name DESC, price DESC LIMIT 100;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11803.51..11815.01 rows=100 width=9) (actual time=70.691..72.609 rows=100 loops=1)
   ->  Gather Merge  (cost=11803.51..33450.42 rows=188234 width=9) (actual time=70.690..72.596 rows=100 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=10803.50..11274.09 rows=188234 width=9) (actual time=67.790..67.796 rows=80 loops=2)
               Sort Key: product_name DESC, price DESC
               Sort Method: top-N heapsort  Memory: 32kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 32kB
               ->  Parallel Seq Scan on goods  (cost=0.00..3609.34 rows=188234 width=9) (actual time=0.009..20.435 rows=160000 loops=2)
 Planning Time: 0.063 ms
 Execution Time: 72.634 ms
(11 rows)

As shown above, we explicitly disabled the enable_incremental_sort feature to illustrate the benefits of the incremental_sort. The query above performs a sequential scan on the table, then uses top-N heap sort to retrieve the top 100 costly products. This query takes approximately 72ms.

Now, let's execute the same query with increment_sort enabled and observe the response time.

Copied to clipboard!


db1=> SET enable_incremental_sort TO on;
SET
db1=> EXPLAIN ANALYZE SELECT product_name, price FROM goods ORDER BY product_name DESC, price DESC LIMIT 100;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2.57..10.59 rows=100 width=9) (actual time=0.072..0.129 rows=100 loops=1)
   ->  Incremental Sort  (cost=2.57..25672.45 rows=319997 width=9) (actual time=0.071..0.122 rows=100 loops=1)
         Sort Key: product_name DESC, price DESC
         Presorted Key: product_name
         Full-sort Groups: 4  Sort Methods: top-N heapsort, quicksort  Average Memory: 26kB  Peak Memory: 26kB
         ->  Index Scan Backward using product_idx on goods  (cost=0.42..13469.41 rows=319997 width=9) (actual time=0.022..0.083 rows=129 loops=1)
 Planning Time: 0.075 ms
 Execution Time: 0.145 ms
(8 rows)

From the results, it's evident that the query response time has significantly decreased from 72ms to 0.145ms.

This improvement is due to the index on the product_name column. This column serves as a Presorted Key for incremental sort, which only returned 149 rows to be sorted into 4 groups. This response is far more efficient than the previous query execution, which required a full table scan to retrieve all records before performing the top-N heap sort.

The incremental sort feature is incredibly useful when processing large amounts of data. It eliminates the need to load and sort the entire data set at once by allowing you to process the data in groups. However, in earlier versions of PostgreSQL, this feature was limited to only ORDER BY and WINDOW functions. Now, it has been expanded to support DISTINCT, where PostgreSQL internally performs a sort to eliminate duplicate values from the list.

Consider the following example, where we add the DISTINCT clause to the previous query. Notice how the incremental_sort operation is selected and it creates 100 incremental sort groups to eliminate duplicates from the result. If we increase the dataLIMIT from 100 to 1000then, we would be having 1000 groups in the incremental sorts.

Copied to clipboard!


-db1=> EXPLAIN ANALYZE SELECT DISTINCT on(product_name) * FROM goods ORDER BY product_name DESC, price DESC LIMIT 100;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2.57..267.67 rows=100 width=9) (actual time=0.247..2.452 rows=100 loops=1)
   ->  Unique  (cost=2.57..26472.44 rows=9985 width=9) (actual time=0.246..2.443 rows=100 loops=1)
         ->  Incremental Sort  (cost=2.57..25672.45 rows=319997 width=9) (actual time=0.246..2.184 rows=3169 loops=1)
               Sort Key: product_name DESC, price DESC
               Presorted Key: product_name
               Full-sort Groups: 100  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
               ->  Index Scan Backward using product_idx on goods  (cost=0.42..13469.41 rows=319997 width=9) (actual time=0.027..1.337 rows=3201 loops=1)
 Planning Time: 0.371 ms
 Execution Time: 2.529 ms
(9 rows)

Custom Collation Rules

With the release of PostgreSQL 16, a significant new feature is the ability to define custom collation rules. These rules can be used to customize the way data is sorted in database. This functionality is particularly useful in handling cases where the conventional sorting techniques do not suffice. For instance, consider a scenario where you have a set of words whose pronunciation order is not the same as their alphabetical order.

Traditional sorting methods would fail to consider this nuance. However, with the custom collation rules in PostgreSQL 16, you can impose a specific pronunciation-based sorting order.

This new feature adds another layer of flexibility and control in managing and manipulating your data. To demonstrate this feature, let's consider the following set of English words. Their alphabetical order differs from the order of their pronunciation.

The words "Hour", "Island", "Knot", "Pneumonia", "Psychology", and "Wrist" are alphabetically sorted. However, when we consider their pronunciation, which includes silent letters, a different letter might appear as the first character. If we wish to sort these words based on pronunciation, this is achievable in PostgreSQL 16 through custom collation rules. While previous versions required custom logic to handle such cases, PostgreSQL 16 allows for easy configuration.

Consider the below demonstrations which solves the above problem with collation custom rules.

Copied to clipboard!


db1=> CREATE TABLE words(name text);
CREATE TABLE

db1=> INSERT INTO words VALUES ('Hour'), ('Island'), ('Knot'), ('Pneumonia'), ('Psychology'), ('Wrist');
INSERT 0 6

db1=> SELECT * FROM words ORDER BY name;
    name
------------
 Hour
 Island
 Knot
 Pneumonia
 Psychology
 Wrist
(6 rows)

As seen in the previous output, the words are organized in ascending alphabetical order. If we want to sort these words based on pronunciation, we need to create a custom collation and configure the rules accordingly. Now, let’s create the collation with custom rules.

Copied to clipboard!


db1=> CREATE COLLATION pronounce (provider = icu, locale = 'en', rules = '& Is < Kn < Pn < Ho < Wr < Ps');
CREATE COLLATION

As seen above, in the rules section, we stated that Is is less than Ho. This is due to the general English pronunciation where Island is pronounced as Iland and Hour as our. According to alphabetical order, Iland is considered less than our. Now, let's run the same query with these custom collation rules and see whether the we are able to sort the words based on pronunciation.

From the output, we see that the list of words are now sorted based on pronunciation.

Copied to clipboard!


db1=> SELECT * FROM words ORDER BY name COLLATE pronounce;
    name
------------
 Island
 Knot
 Pneumonia
 Hour
 Wrist
 Psychology
(6 rows)

In addition to the major features mentioned above, PostgreSQL 16 offers a range of additional functionalities designed for developers. These improvements not only simplify the programming process but also offer more adaptable and efficient solutions to different programming challenges.

array_shuffle

This function serves an important purpose: it randomly shuffles the elements within an array. It does this in a completely arbitrary order, guaranteeing that the end result will be unpredictable and indeed, quite random. The beauty of this function lies in its versatility. It can accept any array data type, making it incredibly flexible and adaptable to various use cases.

Once the function has completed its shuffling process, it then returns a new array. This new array contains the same elements as the original one, but with the order of these elements thoroughly randomized. This simple yet powerful function is an essential tool when randomization is required.

Copied to clipboard!


db1=> SELECT array_shuffle(ARRAY[1,NULL,2]);
 array_shuffle
---------------
 {1,2,NULL}
(1 row)

db1=> SELECT array_shuffle(ARRAY[1,NULL,2]);
 array_shuffle
---------------
 {2,NULL,1}
(1 row)

From the above example, it's evident that the array_shuffle() function produces different results for the same input. Due to its versatility, we can use it to shuffle any array. For instance, consider the following array of arrays.

Copied to clipboard!


db1=> SELECT array_shuffle(ARRAY[ARRAY[1,2,3], ARRAY[4,5,6]]);
   array_shuffle
-------------------
 {{4,5,6},{1,2,3}}
(1 row)

db1=> SELECT array_shuffle(ARRAY[ARRAY[1,2,3], ARRAY[4,5,6]]);
   array_shuffle
-------------------
 {{1,2,3},{4,5,6}}
(1 row)

As demonstrated above, the array_shuffle() function generates a random array for the same input. This allows us to randomize the data group that we need to process in the application, if necessary.

array_sample

This particular function operates by randomly selecting a predetermined number of elements from an array. The function accepts two arguments. The first argument that the function accepts is an array. This array can be of any length, and it is from this array that the elements will be selected. The second argument accepted by the function is an integer. This integer represents the number of elements that the function will select from the array. Thus, for instance, if the function is given an array of ten elements and the integer 3, it will randomly select three elements from the array of ten. Consider the example below.

Copied to clipboard!


db1=> SELECT array_sample(ARRAY['head','tail',NULL], 1);
array_sample
-------------
{head}
(1 row)

db1=> SELECT array_sample(ARRAY['head','tail',NULL], 1);
array_sample
-------------
{tail}
(1 row)

As demonstrated above, the array_sample() function generates a different sample value for the same input. This allows us to randomly select a sample value from the array in our dataset.

Since the array_sample() function can accept any array element, we can also use it to randomly select an array from an array of arrays. Consider the following example.

Copied to clipboard!


db1=> SELECT array_sample( ARRAY[ARRAY['trail #1', 'head'], ARRAY['trail #2','tail']], 1);
array_sample
--------------------
{{"trail #1",head}}
(1 row)

db1=> SELECT array_sample( ARRAY[ARRAY['trail #1', 'head'], ARRAY['trail #2','tail']], 1);
array_sample
--------------------
{{"trail #2",tail}}
(1 row)

any_value

The any_value() in PostgreSQL 16 provides a handy utility by returning a single value from a specified group of values. This function doesn't guarantee to return a specific value, rather it arbitrarily picks any value from the group. This can be particularly beneficial in scenarios where you don't need a specific value, but any representative value from a group will suffice.

Consider the following example, which demonstrates this function's behaviour.

Copied to clipboard!


db1=> SELECT any_value(column1) FROM (VALUES (1), (6), (NULL));
 any_value
-----------
         1
(1 row)

From the above output, if you see any_value() returning a non-null value of 1, try to run the same process with different values as shown below.

Copied to clipboard!


db1=> SELECT any_value(column1) FROM (VALUES (NULL), (6), (NULL));
 any_value
-----------
         6
(1 row)

Now, let's execute the operation again with all NULL values and observe the results.

Copied to clipboard!


db1=> SELECT any_value(column1) FROM (VALUES (NULL), (NULL), (NULL));
 any_value
-----------

(1 row)

From the above results, it's evident that any_value() returns a non-null value from the group. This function is similar to the COALESCE function but operates on a group of values. It is particularly useful for replacing aggregate functions such as MIN, MAX, AVG when the application doesn't require the aggregated output value, but instead needs any non-null entry from that group.

Summary

PostgreSQL 16 introduces several new features and improvements. It now supports parallel hash right join, enhancing performance when joining large result sets. SQL/JSON standard functions are now supported, allowing easy creation of JSON objects from other result sets. Incremental sort, introduced in PostgreSQL 13, has been expanded to support DISTINCT/Unique sort operations, significantly reducing response times. Custom collation rules can be defined, offering flexibility in data sorting. New functions include array_shuffle for randomizing array elements, array_sample for randomly selecting array elements, and any_value for returning a single value from a group of values.

PostgreSQL 16 is now supported on Tessell.