PgBulkInsert
Java library for efficient Bulk Inserts to PostgreSQL using the Binary COPY Protocol.
[MIT License]: https://opensource.org/licenses/MIT [COPY command]: http://www.postgresql.org/docs/current/static/sql-copy.html [PgBulkInsert]: https://github.com/bytefish/PgBulkInsert [Npgsql]: https://github.com/npgsql/npgsql The project is written primarily in Java, distributed under the MIT License license, first published in 2016. Key topics include: bulk-inserts, postgresql.
PgBulkInsert
PgBulkInsert is a Java library for Bulk Inserts to PostgreSQL using the Binary COPY Protocol.
It provides a wrapper around the PostgreSQL COPY command:
The COPY command is a PostgreSQL specific feature, which allows efficient bulk import or export of
data to and from a table. This is a much faster way of getting data in and out of a table than using
INSERT and SELECT.
This project wouldn't be possible without the great Npgsql library, which has a beautiful implementation of the Postgres protocol.
Setup
PgBulkInsert is available in the Central Maven Repository.
You can add the following dependencies to your pom.xml to include PgBulkInsert in your project.
xml<dependency> <groupId>de.bytefish</groupId> <artifactId>pgbulkinsert</artifactId> <version>9.0.1</version> </dependency>
The library is a single file, so you can also copy and paste the file into your project, without taking an additional dependency.
PgBulkInsert 9: Modern, Functional API
PgBulkInsert 9.0.0 comes with a new API, that's functional and type-safe to work with.
Quick Start
The new API separates the What (Structure and Mapping) from the How (Execution and Configuration).
1. Define your Data Model
The library works with Java Records, POJOs, or any other data carrier.
javapublic record UserSession( UUID id, long visits, // Primitive String userAgent, // Nullable String Instant createdAt, // Precise Timestamp double[] latLon, // Array PgPoint location // Geometric Point ) {}
2. Define your Mapping
The PgMapper is the heart of the library. It is stateless and thread-safe.
javaPgMapper<UserSession> mapper = PgMapper.forClass(UserSession.class) .map("id", PostgresTypes.UUID.from(UserSession::id)) // ZERO-ALLOCATION: Direct access to primitives via ToLongFunction .map("visits", PostgresTypes.INT8.primitive(UserSession::visits)) // SAFE STRINGS: Strips invalid \u0000 characters to prevent COPY errors .map("user_agent", PostgresTypes.TEXT.removeNullCharacters().from(UserSession::userAgent)) // TYPE-SAFE TIME: Validated at compile-time to prevent timezone issues .map("created_at", PostgresTypes.TIMESTAMPTZ.instant(UserSession::createdAt)) // GEOMETRY: Native support for Postgres geometric types .map("location", PostgresTypes.POINT.from(UserSession::location));
3. Configure and Execute
The PgBulkWriter handles execution details like buffer sizes and stream management.
javaPgBulkWriter<UserSession> writer = new PgBulkWriter<>(mapper) .withBufferSize(256 * 1024); // 256 KB Buffer try (Connection conn = dataSource.getConnection()) { writer.saveAll(conn, "public.user_sessions", sessionList); }
Streaming and Lazy Evaluation
If you are working with a Java Stream (e.g., from a file, a reactive source, or another database), you
can pass it directly:
javaStream<UserSession> massiveStream = getMassiveStreamFromSource(); try (Connection conn = dataSource.getConnection()) { // Uses the stream's iterator to pull data lazily writer.saveAll(conn, "public.user_sessions", massiveStream); }
This approach ensures that records are transformed and written to the PostgreSQL wire format on-the-fly,
keeping your application's memory usage constant regardless of the total number of rows. You can also
pass a Collection<T> or an Iterable<T> to the method.
Mastering the Fluent API
The API is designed around so called PostgresTypes. This class serves as your single
entry point for all PostgreSQL data types.
The Power of PostgresTypes
Instead of a generic map() method that tries to guess your intent, the API uses a "Type-First"
approach. When you type PostgresTypes.INT4., your IDE will offer specific choices:
.primitive(ToIntFunction<T>): High-performance path. No objects created on the heap..boxed(Function<T, Integer>): Use this for nullable database columns or if your POJO uses Integer..from(Function<T, Integer>): Standard object mapping.
Eliminating Timezone Confusion
One of the most common bugs is the confusion between timestamp and timestamptz.
The API solves this:
PostgresTypes.TIMESTAMPonly allows.localDateTime().PostgresTypes.TIMESTAMPTZallows.instant(),.zonedDateTime(), or.offsetDateTime().
Advanced Type Mapping
N-Dimensional Arrays (Matrices & Tensors)
java// Mapping a 2D Matrix (Collection of Collections) .map("data_matrix", PostgresTypes.array2D(PostgresTypes.INT4).from(MyEntity::getMatrix)) // Mapping a 3D Tensor .map("data_tensor", PostgresTypes.array3D(PostgresTypes.FLOAT8).from(MyEntity::getTensor))
Ranges
java// Mapping an integer range .map("age_limit", PostgresTypes.INT4RANGE.from(MyEntity::getAgeRange)) // Mapping a timestamp range .map("validity", PostgresTypes.TSRANGE.from(MyEntity::getValidityPeriod))
Geometric Types
Native support for all PostgreSQL geometric types using dedicated helper records:
POINT:PostgresTypes.POINTCIRCLE:PostgresTypes.CIRCLEPOLYGON:PostgresTypes.POLYGONPATH/LSEG/BOX/LINE
java.map("area", PostgresTypes.POLYGON.from(MyEntity::getBoundaries))
Supported PostgreSQL Types
- Numeric Types
- smallint
- integer
- bigint
- real
- double precision
- numeric
- Date/Time Types
- timestamp
- timestamptz
- date
- time
- interval
- Character Types
- text
- JSON Types
- jsonb
- Boolean Type
- boolean
- Binary Data Types
- bytea
- Network Address Types
- inet (IPv4, IPv6)
- macaddr
- UUID Type
- uuid
- Array Type
- One-Dimensional Arrays
- Range Type
- int4range
- int8range
- numrange
- tsrange
- tstzrange
- daterange
- hstore
- hstore
- Geometric Types
- point
- line
- lseg
- box
- path
- polygon
- circle
Contributors
Showing top 12 contributors by commit count.
