Overview

The Parquet Destination Component is an SSIS Data Flow component for generating Apache Parquet files.
- The component metadata is either retrieved automatically from a sample Parquet file or specified manually in JSON format.
- The generated Parquet file can contain nested arrays of objects following the composite records pattern, where the fields for the arrays are fed via separate inputs; a minimal sketch follows this list.
- The generated Parquet content can be written to a file or stored in a variable.
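For orientation, here is a minimal sketch of the same nested shape (an array of objects per row) built with the pyarrow Python library rather than with the component itself; all field names below are hypothetical:

```python
# Sketch using pyarrow, not the SSIS component. The field names
# ("order_id", "items", "sku", "qty") are illustrative only.
import pyarrow as pa
import pyarrow.parquet as pq

# "items" is a list of structs, i.e. a nested array of objects per row.
schema = pa.schema([
    ("order_id", pa.int64()),
    ("items", pa.list_(pa.struct([("sku", pa.string()), ("qty", pa.int32())]))),
])

table = pa.table(
    {
        "order_id": [1, 2],
        "items": [
            [{"sku": "A-100", "qty": 3}],
            [{"sku": "B-200", "qty": 1}, {"sku": "C-300", "qty": 5}],
        ],
    },
    schema=schema,
)
pq.write_table(table, "orders.parquet")
```

In the component, the equivalent result is achieved by feeding the array fields through a separate input, as described in the composite records article.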
Quick Start
In this section we will show you how to set up a Parquet Destination component.
- Double-click on the component on the canvas.
- Once the component editor opens, select the destination where the generated Parquet data will be stored, then provide a sample Parquet file or write the schema directly into the Schema text editor. You can also change the size of the row groups into which the Parquet file will be divided internally.
- When you click on the Mapping tab, the component prepares the inputs and external columns by analyzing the schema in the Schema text editor. Please note that the Parquet Destination can have multiple inputs (see the article about composite records), whose columns you can see here. The data in these inputs can be produced by upstream source and transformation components (e.g., a Query Transformation can be used to retrieve the necessary data from a SQL Server database).
- Click OK to close the component editor.
Congratulations! You have successfully configured the Parquet Destination component.
Parameters
Configuration
Use the parameters below to configure the component.
Indicates the destination of the Parquet data. The following options are available:
- File: The Parquet data will be stored in a file. Select an existing File Connection Manager or create a new one.
- Variable: The Parquet data will be stored in a variable. Select a variable or create a new one.

A variable to which the Parquet data will be written.
Represents the maximum number of rows in a Parquet row group. A row group is a logical horizontal partitioning of the data into rows; it holds serialized (and compressed) arrays of column entries.
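To make the effect of this parameter concrete, here is a sketch using pyarrow: writing 1,000,000 rows with a maximum of 100,000 rows per row group produces a file containing 10 row groups.

```python
# Sketch with pyarrow: row_group_size caps the number of rows per row group.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": list(range(1_000_000))})
pq.write_table(table, "data.parquet", row_group_size=100_000)

print(pq.ParquetFile("data.parquet").num_row_groups)  # 10
```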
Specify the compression method applied to columns in the destination file. The following options are available:
- None: No compression is applied to the data.
- Snappy: A fast algorithm developed by Google, optimized for speed over compression ratio. Used in real-time processing systems where low latency is critical.
- Gzip: Uses the DEFLATE algorithm and is one of the most widely supported formats. General-purpose compression, especially for files and web traffic.
- LZ4: Designed for extremely fast compression and decompression with minimal CPU usage. Used in high-throughput systems such as streaming or databases.
- Zstd: Zstandard is a modern algorithm developed by Facebook, offering a strong balance between speed and compression ratio. Used in systems needing both efficient storage and good performance.
- Lz4Raw: The raw/block version of LZ4 compression, without headers or framing metadata.
Specify the compression level applied to columns in the destination file. The following options are available:
- Optimal: Prioritizes reducing file size as much as possible, even if the process takes longer.
- Fastest: Prioritizes speed over file size, producing compressed data quickly but with less efficiency.
- NoCompression: The data is stored as-is, without any size reduction.
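The same trade-off can be sketched with pyarrow; note that, unlike the component's named levels, pyarrow expresses the level as a number, and the exact codec names below are pyarrow's, not the component's:

```python
# Sketch with pyarrow; codec names it accepts include "none", "snappy",
# "gzip", "lz4", and "zstd".
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"city": ["Paris", "Paris", "Sofia"], "temp": [21.5, 22.0, 19.3]})

pq.write_table(table, "fast.parquet", compression="snappy")   # favors speed
pq.write_table(table, "small.parquet", compression="zstd",
               compression_level=19)                          # favors size
```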
Specifies the encoding that should be used in the destination file.

Dictionary Encoding stores repeated values as references to a dictionary of unique values. It is very effective when a column has many repeated or low-cardinality values (e.g., country codes, boolean flags).
Delta Binary Packed Encoding stores differences between consecutive values rather than the full values. It works best on numeric sequences or sorted data where values change gradually (e.g., timestamps, sequential IDs).
When both are enabled, Parquet can first apply dictionary encoding to reduce repeated values, followed by delta-binary packing of the numeric sequences within the dictionary. This can maximize compression for data with the right characteristics. Either dictionary or delta-binary encoding can also be chosen individually, depending on the data. When no special encoding is applied, data is stored in raw form, which may be faster to write but results in larger files.
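As a rough illustration of when each encoding pays off, here is a sketch using pyarrow (the column names are hypothetical, and pyarrow's per-column encoding API differs from the component's single setting):

```python
# Sketch with pyarrow; column names are illustrative.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "country": ["BG", "BG", "FR", "BG"],   # low cardinality: dictionary-friendly
    "event_ts": [1000, 1001, 1003, 1007],  # gradually increasing: delta-friendly
})

# Dictionary encoding only for the low-cardinality column:
pq.write_table(table, "dict.parquet", use_dictionary=["country"])

# Per-column encodings require dictionary encoding to be turned off in pyarrow:
pq.write_table(table, "delta.parquet", use_dictionary=False,
               column_encoding={"event_ts": "DELTA_BINARY_PACKED",
                                "country": "PLAIN"})
```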
A JSON string representing the schema of the Parquet file.
A new schema format was introduced in release 2.3. For more information, see Components Metadata Schema.
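The component's exact JSON layout is defined in the Components Metadata Schema article; purely for intuition, the following pyarrow sketch shows the kind of field/type structure such a schema describes (the field names are hypothetical):

```python
# Sketch only: a Parquet schema expressed with pyarrow, analogous in spirit
# to what the component's JSON schema string describes.
import pyarrow as pa

schema = pa.schema([
    ("order_id", pa.int64()),
    ("customer", pa.string()),
    ("amount", pa.decimal128(18, 2)),
])
print(schema)
```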
What's New
- New: Support for stream output.
- Fixed: Errors when creating Parquet file in a dynamic data flow (Thank you, Sam!).
- New: Automatic schema generation when attaching to upstream component.
- New: Support for dynamic data flow.
- Fixed: Incorrect lower-case headers (Thank you, Romain).
- Fixed: Missing record in each batch of records (Thank you, Romain).
- New: Introduced component.
Related documentation
COZYROC SSIS+ Components Suite is free for testing in your development environment.
A licensed version can be deployed on-premises, on Azure-SSIS IR and on COZYROC Cloud.