Tuesday, October 16, 2012

Connection Pooling in Java

Connection Pooling - what is it and why do we need it?

It's a technique to allow multiple clinets to make use of a cached set of shared and reusable connection objects providing access to a database. Connection Pooling feature is supported only on J2SDK 1.4 and later releases.

Opening/Closing database connections is an expensive process and hence connection pools improve the performance of execution of commands on a database for which we maintain connection objects in the pool. It facilitates reuse of the same connection object to serve a number of client requests. Every time a client request is received, the pool is searched for an available connection object and it's highly likely that it gets a free connection object. Otherwise, either the incoming requests are queued or a new connection object is created and added to the pool (depending upon how many connections are already there in the pool and how many the particular implementation and configuration can support). As soon as a request finishes using a connection object, the object is given back to the pool from where it's assigned to one of the queued requests (based on what scheduling algorithm the particular connection pool implementation follows for serving queued requests). Since most of the requests are served using existing connection objects only so the connection pooling approach brings down the average time required for the users to wait for establishing the connection to the database.

How is it used?

It's normally used in a web-based enterprise application where the application server handles the responsibilities of creating connection objects, adding them to the pool, assigning them to the incoming requests, taking the used connection objects back, returning them back to the pool, etc. When a dynamic web page of the web-based application explicitily creates a connection (using JDBC 2.0 pooling manager interfaces and calling getConnection() method on a PooledConnection object ... I'll discuss both the JDBC 1.0 and JDBC 2.0 approaches in a separate article) to the database and closes it after use then the application server internally gives a connection object from the pool itself on the execution of the statement which tries to create a connection (in this case it's called logical connection) and on execution of the statement which tries to close the connection, the application server simply returns the connection back to pool. Remember you can still use JDBC 1.0 / JDBC 2.0 APIs to obtain physical connections. This of course is used very rarely - probably in the cases where connection to that particular database is needed once in a while and maintaining a pool of connection is not really needed.

How many connections the Pool can handle? Who creates/releases them?

These days it's pretty configurable - the maximum connections, the minimum connections, the maximum number of idle connections, etc. these all parameters can be configured by the server administrator. On start up the server creates a fixed number (the configured minimum) of connection objects and adds them to the pool. Once all of these connection objects are exhausted by serving those many clinet requests then any extra request causes a new connection object to be created, added to the pool, and then to be assigned to server that extra request. This continues till the number of connection objects doesn't reach the configured maximum number of connection objects in the pool. The server keep on checking the number of idle connection objects as well and if it finds that there are more number of idle connection objects than the configured value of that parameter then the server simply closes the extra number of idle connections, which are subsequently garbage collected.

Traditional Connection Pooling vs Managed Connection Pooling

Connection Pooling is an open concept and its certainly not limited to the connection pooling we normally notice in the enterprise application i.e., the one managed by the Application Servers. Any application can use this concept and can manage it the way it wants. Connection Pooling simply means creating, managing, and maintaining connection objects in advance. A traditional application can do it manually, but as we can easily observe that as the scalability and reach of the application grows, it becomes more and more difficult to manage connections without having a defined and robust connection pooling mechanism. Otherwise it'll be extremely difficult to ensure the maintainability and availability of the connections and in turn the application.


JDBC connection pooling


The addition of JDBC connection pooling to your application usually involves little or no code modification but can often provide significant benefits in terms of application performance, concurrency and scalability. Improvements such as these can become especially important when your application is tasked with servicing many concurrent users within the requirements of sub second response time. By adhering to a small number of relatively simple connection pooling best practices your application can quickly and easily take effective advantage of connection pooling.

Software Object Pooling

          There are many scenarios in software architecture where some type of object pooling is employed as a technique to improve application performance. Object pooling is effective for two simple reasons. First, the run time creation of new software objects is often more expensive in terms of performance and memory than the reuse of previously created objects. Second, garbage collection is an expensive process so when we reduce the number of objects to clean up we generally reduce the garbage collection load.
As the saying goes, there is no such thing as a free lunch and this maxim is also true with object pooling. Object pooling does require additional overhead for such tasks as managing the state of the object pool, issuing objects to the application and recycling used objects. Therefore objects that don’t have short lifetimes in your application may not be good choices for object pooling since their low rate of reuse may not warrant the overhead of pooling.
However, objects that do have short lifetimes are often excellent candidates for pooling. In a pooling scenario your application first creates an object pool that can both cache pooled objects and issue objects that are not in use back to the application. For example, pooled objects could be database connections, process threads, server sockets or any other kind of object that may be expensive to create from scratch. As your application first starts asking the pool for objects they will be newly created but when the application has finished with the object it is returned to the pool rather than destroyed. At this point the benefits of object pooling will be realized since, now as the application needs more objects, the pool will be able to issue recycled objects that have previously been returned by the application.

JDBC Connection Pooling

              JDBC connection pooling is conceptually similar to any other form of object pooling. Database connections are often expensive to create because of the overhead of establishing a network connection and initializing a database connection session in the back end database. In turn, connection session initialization often requires time consuming processing to perform user authentication, establish transactional contexts and establish other aspects of the session that are required for subsequent database usage.
Additionally, the database's ongoing management of all of its connection sessions can impose a major limiting factor on the scalability of your application. Valuable database resources such as locks, memory, cursors, transaction logs, statement handles and temporary tables all tend to increase based on the number of concurrent connection sessions.
All in all, JDBC database connections are both expensive to initially create and then maintain over time. Therefore, as we shall see, they are an ideal resource to pool.
If your application runs within a J2EE environment and acquires JDBC connections from an appserver defined datasource then your application is probably already using connection pooling. This fact also illustrates an important characteristic of a best practices pooling implementation -- your application is not even aware it's using it! Your J2EE application simply acquires JDBC connections from the datasource, does some work on the connection then closes the connection. Your application's use of connection pooling is transparent. The characteristics of the connection pool can be tweaked and tuned by your appserver's administrator without the application ever needing to know.
If your application is not J2EE based then you may need to investigate using a standalone connection pool manager. Connection pool implementations are available from JDBC driver vendors and a number of other sources.


JDBC Connection Scope

How should your application manage the life cycle of JDBC connections? Asked another way, this question really asks - what is the scope of the JDBC connection object within your application? Let's consider a servlet that performs JDBC access. One possibility is to define the connection with servlet scope as follows.


import java.sql.*;

public class JDBCServlet extends HttpServlet {

    private Connection connection;

    public void init(ServletConfig c) throws ServletException {
      //Open the connection here
    }

    public void destroy() {
     //Close the connection here
    }

    public void doGet (HttpServletRequest req, HttpServletResponse res) throws ServletException {
      //Use the connection here
      Statement stmt = connection.createStatement();
      ..<do JDBC work>..
  }
}

Using this approach the servlet creates a JDBC connection when it is loaded and destroys it when it is unloaded. The doGet() method has immediate access to the connection since it has servlet scope. However the database connection is kept open for the entire lifetime of the servlet and that the database will have to retain an open connection for every user that is connected to your application. If your application supports a large number of concurrent users its scalability will be severely limited!


Method Scope Connections

To avoid the long life time of the JDBC connection in the above example we can change the connection to have method scope as follows.


public class JDBCServlet extends HttpServlet {

  private Connection getConnection() throws SQLException {
    ..<create a JDBC connection>..
  }

  public void doGet (HttpServletRequest req, HttpServletResponse res) throws ServletException {
    try {
      Connection connection = getConnection();
      ..<do JDBC work>..
      connection.close();
    }
    catch (SQLException sqlException) {
      sqlException.printStackTrace();
    }
  }
}
 

This approach represents a significant improvement over our first example because now the connection's life time is reduced to the time it takes to execute doGet(). The number of connections to the back end database at any instant is reduced to the number of users who are concurrently executing doGet(). However this example will create and destroy a lot more connections than the first example and this could easily become a performance problem.
In order to retain the advantages of a method scoped connection but reduce the performance hit of creating and destroying a large number of connections we now utilize connection pooling to arrive at our finished example that illustrates the best practices of connecting pool usage.


import java.sql.*;
import javax.sql.*;

public class JDBCServlet extends HttpServlet {

  private DataSource datasource;

  public void init(ServletConfig config) throws ServletException {
    try {
      // Look up the JNDI data source only once at init time
      Context envCtx = (Context) new InitialContext().lookup("java:comp/env");
      datasource = (DataSource) envCtx.lookup("jdbc/MyDataSource");
    }
    catch (NamingException e) {
      e.printStackTrace();
    }
  }

  private Connection getConnection() throws SQLException {
    return datasource.getConnection();
  }

  public void doGet (HttpServletRequest req, HttpServletResponse res) throws ServletException {
    Connection connection=null;
    try {
      connection = getConnection();
      ..<do JDBC work>..
    }
    catch (SQLException sqlException) {
      sqlException.printStackTrace();
    }
    finally {
      if (connection != null)
        try {connection.close();} catch (SQLException e) {}
      }
    }
  }
}
 


This approach uses the connection only for the minimum time the servlet requires it and also avoids creating and destroying a large number of physical database connections. The connection best practices that we have used are:
  • A JNDI datasource is used as a factory for connections. The JNDI datasource is instantiated only once in init() since JNDI lookup can also be slow. JNDI should be configured so that the bound datasource implements connecting pooling. Connections issued from the pooling datasource will be returned to the pool when closed.

  • We have moved the connection.close() into a finally block to ensure that the connection is closed even if an exception occurs during the doGet() JDBC processing. This practice is essential when using a connection pool. If a connection is not closed it will never be returned to the connection pool and become available for reuse. A finally block can also guarantee the closure of resources attached to JDBC statements and result sets when unexpected exceptions occur. Just call close() on these objects also.


Connection Pool Tuning

One of the major advantages of using a connection pool is that characteristics of the pool can be changed without affecting the application. If your application confines itself to using generic JDBC you could even point it at a different vendor's database without changing any code! Different pool implementations will provide different settable properties to tune the connection pool. Typical properties include the number of initial connections, the minimum and maximum number of connections that can be present at any time and a mechanism to purge connections that have been idle for a specific period of time.

In general, optimal performance is attained when the pool in its steady state contains just enough connections to service all concurrent connection requests without having to create new physical database connections. If the pooling implementation supports purging idle connections it can optimize its size over time to accommodate varying application loads over the course of a day. For example, scaling up the number of connections cached in the pool during business hours then dynamically reducing the pool size after business hours.

Connection Pooling Metrics

In order to compare the difference between using non pooled connections and connection pooling I built a simple servlet that displays orders in Oracle's sample OE (Order Entry) database schema. The testing configuration consists of Jakarta's JMetric load testing tool, the Tomcat 5.5 servlet container and an Oracle 10g database instance. Tomcat and Oracle were running on separate 512MB machines connected by 100Mbps Ethernet.
The servlet is written to use either pooled or non pooled database connections depending on the query string passed in its URL. So the servlet can be dynamically instructed by the load tester to use (or not use) connection pooling in order to compare throughput in both modes. The servlet creates pooled connections using a Tomcat DBCP connection pool and non pooled connections directly from Oracle's thin JDBC driver. Having acquired a connection, the servlet executes a simple join between the order header and line tables then formats and outputs the results as HTML.


No comments:

Post a Comment