Connection Pooling in Pgpool-II
Pgpool-II is a cluster management tool for PostgreSQL that can cache connections to PostgreSQL servers. This blog introduces Pgpool-II connection pooling feature and shows how to configure connection pooling in Pgpool-II.
What is connection pooling?
Establishing and maintaining Database connections are expensive. The reason is that you have to establish a network connection, perform authentication and so on. Database connection pooling is a cache of database connections to keep database connections open so that the connection can be reused when a connection is required by a future request.
Reusing an active connection rather than establishing a new connection each time a connection is requested can improve performance and save system resources.
How does connection pooling work in Pgpool-II?
Pgpool-II caches established connections to the PostgreSQL servers and reuses them whenever a new request with the same properties (i.e. user name, database, protocol version, and other connection parameters if any) comes in.
First, let me explain how connection pooling works in Pgpool-II.- At startup, Pgpool-II parent process preforks num_init_children child processes, and each child process can cache connections up to the configured value of max_pool.
- Pgpool-II waits for connection requests from clients.
- One Pgpool-II child process receives connection request from a client.
- This Pgpool-II child process looks for existing connections which has requested database/user pair in the pool up to max_pool.
- If found, reuse it.
- If not found, the child process opens new connections to PostgreSQL servers and registers it to the pool. If the pool has no empty slot, Pgpool-II closes the oldest connection to PostgreSQL and reuses the slot.
- Do some queries processing until the client sends session close request.
- Pgpool-II closes the connection to client but keeps the connection to PostgreSQL servers for future use.
- Go to #1
Sometimes users ask how to configure Pgpool-II's num_init_children (the number of preforked Pgpool-II child processes) and PostgreSQL's max_connections. Note that Pgpool-II may establish num_init_children * max_pool connections to each PostgreSQL server.
And PostgreSQL allows concurrent connections for non-superusers up to max_connections - superuser_reserved_connections. In addition, canceling a query creates another connection to PostgreSQL.
Default is 0.
Here are some useful configurations of the above parameters.
- If a memory leak is detected in Pgpool-II, you can configure child_life_time parameter to prevent Pgpool-II child processes memory leak. Default value is 300 seconds. If a child process remains idle for more than the time specified in child_life_time, Pgpool-II will terminate the child process and spawn a new child process immediately.
- To prevent the Pgpool-II child processes from being occupied for long time by a lazy clients or broken TCP/IP connection between client and Pgpool-II, you can set client_idle_limit to a non-zero value. Default value is 0.
- To configure lifetime time of idle connections:
minEvictableIdleTimeMillis="60000" (default)
- To validate connections:
testWhileIdle="true"
validationQuery="SELECT 1"
Verify Pgpool-II's connection pooling
Let's verify Pgpool-II's connection pooling.Edit pgpool.conf to enable connection pooling.
db1=> \q
In the result of "SHOW POOL_POOLS"
below, each row represents one connection.
As you can see in the result, the connections (db1/user1) are cached even if client has closed the connection. About "pool_connected", 0 represents this process is waiting for connection and 1 represents it is being used by frontend. "pool_connected" is useful for checking the number of connections from frontend (Formatted output is needed).
$ psql -h /tmp -p 11000 -U pengbo postgres
postgres=# SHOW POOL_POOLS;
pool_pid |...| pool_id | backend_id | database | username | create_time |...| pool_counter | pool_backendpid | pool_connected
----------+---+---------+------------+----------+----------+---------------------+---+--------------+-----------------+----------------
11973 |...| 0 | 0 | postgres | pengbo | 2020-07-29 22:49:13 |...| 1 | 11987 | 1
11973 |...| 0 | 1 | postgres | pengbo | 2020-07-29 22:49:13 |...| 1 | 11988 | 1
11973 |...| 1 | 0 | | | |...| 0 | 0 | 0
11973 |...| 1 | 1 | | | |...| 0 | 0 | 0
11973 |...| 2 | 0 | | | |...| 0 | 0 | 0
11973 |...| 2 | 1 | | | |...| 0 | 0 | 0
11974 |...| 0 | 0 | | | |...| 0 | 0 | 0
11974 |...| 0 | 1 | | | |...| 0 | 0 | 0
11974 |...| 1 | 0 | | | |...| 0 | 0 | 0
11974 |...| 1 | 1 | | | |...| 0 | 0 | 0
11974 |...| 2 | 0 | | | |...| 0 | 0 | 0
11974 |...| 2 | 1 | | | |...| 0 | 0 | 0
11975 |...| 0 | 0 | db1 | user1 | 2020-07-29 22:49:16 |...| 1 | 11990 | 0
11975 |...| 0 | 1 | db1 | user1 | 2020-07-29 22:49:16 |...| 1 | 11991 | 0
11975 |...| 1 | 0 | | | |...| 0 | 0 | 0
11975 |...| 1 | 1 | | | |...| 0 | 0 | 0
11975 |...| 2 | 0 | | | |...| 0 | 0 | 0
Normally we use pg_stat_activity to show the
current connections. However, how can we find out the Pgpool-II process PID
that corresponds to the backend process? In this case, the result of SHOW
POOL_POOLS is very useful to display the Pgpool-II child process PID (pool_pid)
which is connecting to backend process PID (pool_backendpid).
Conclusion
In this post I gave an overview of connection pooling and introduced how to configure connection pooling in Pgpool-II. Connection pooling can enhance performance and save your system resources. I encourage you to try Pgpool-II's connection pooling.
Comments
Post a Comment