XLSX Parquet Exporter

Export worksheet data to Parquet and/or NDJSON for data lake and data warehouse pipelines

Export tabular Excel data to analytics-friendly formats.

  • Parquet for columnar warehouse pipelines
  • NDJSON for streaming ingestion and logs
  • Auto schema inference with type casting
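The page does not document how schema inference works internally. As a rough illustration only, the sketch below picks the narrowest type that fits every non-empty value in a column and then casts values to it. The function names, the type labels, and the widening rules are assumptions, not the tool's actual behavior.

```python
from datetime import date, datetime


def infer_column_type(values):
    """Return a type label that fits every non-empty value (hypothetical rules)."""
    seen = set()
    for v in values:
        if v is None or v == "":
            continue  # empty cells do not influence the inferred type
        if isinstance(v, bool):  # check bool first: bool is a subclass of int
            seen.add("boolean")
        elif isinstance(v, int):
            seen.add("int64")
        elif isinstance(v, float):
            seen.add("double")
        elif isinstance(v, (date, datetime)):
            seen.add("timestamp")
        else:
            seen.add("string")
    if not seen:
        return "string"  # assumption: an all-empty column falls back to string
    if seen == {"int64", "double"}:
        return "double"  # mixed integers and floats widen to double
    if len(seen) == 1:
        return seen.pop()
    return "string"  # any other mix falls back to string


def cast_value(value, type_name):
    """Cast one cell to the inferred column type; empty cells become None."""
    if value is None or value == "":
        return None
    if type_name == "double":
        return float(value)
    if type_name == "string":
        return str(value)
    return value
```

With these rules, a column holding `1` and `2.5` is inferred as `double`, and a column mixing numbers and text is exported as `string`.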

Example Results

1 example

Export Worksheet to Parquet and NDJSON

Generate Parquet and NDJSON together for warehouse and streaming pipelines

Output file: xlsx-parquet-exporter-example1.zip
Input parameters:
{ "excelFile": "/public/samples/xlsx/workbook-sales.xlsx", "outputMode": "both" }

Maximum file size: 100MB
Supported formats: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-excel

Key Facts

Category: Format Conversion
Input Types: file, text, number, select, checkbox
Output Type: file
Sample Coverage: 4
API Ready: Yes

Overview

The XLSX Parquet Exporter converts tabular Excel data into Parquet and NDJSON, two formats that data lake and data warehouse pipelines can ingest directly.

When to Use

  • Preparing Excel datasets for ingestion into columnar data warehouses like BigQuery, Snowflake, or Redshift.
  • Converting static spreadsheet logs into NDJSON for streaming ingestion into real-time analytics platforms.
  • Standardizing messy Excel files into structured, schema-inferred formats for automated ETL workflows.

How It Works

  • Upload your Excel file and specify the target sheet and header row location.
  • Select your preferred output format: Parquet for columnar storage, NDJSON for streaming, or both.
  • Enable field name sanitization and null conversion to ensure data compatibility with downstream database schemas.
  • Download the processed file or ZIP archive ready for your data pipeline.
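To make the NDJSON step concrete, here is a minimal sketch of how worksheet rows could be turned into NDJSON once the header row is known. It assumes the rows have already been read into Python lists and that the header row is 1-based; `rows_to_ndjson` is a hypothetical helper, not part of the tool.

```python
import json


def rows_to_ndjson(rows, header_row=1):
    """Turn worksheet rows into NDJSON text: one JSON object per line.

    `header_row` is assumed to be 1-based; rows above it are skipped.
    """
    header = [str(name) for name in rows[header_row - 1]]
    lines = []
    for row in rows[header_row:]:
        record = dict(zip(header, row))  # pair each header with its cell value
        lines.append(json.dumps(record))
    return "".join(line + "\n" for line in lines)


rows = [
    ["Sales report"],        # title row above the real header
    ["region", "amount"],    # header row (row 2)
    ["EMEA", 1200],
    ["APAC", 950],
]
print(rows_to_ndjson(rows, header_row=2), end="")
# {"region": "EMEA", "amount": 1200}
# {"region": "APAC", "amount": 950}
```

Each output line is a complete JSON document, which is what lets streaming consumers process the file record by record.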

Use Cases

  • Automating the migration of monthly financial reports from Excel into a centralized data lake.
  • Converting user-submitted survey data from spreadsheets into NDJSON for ingestion into a NoSQL database.
  • Standardizing inconsistent spreadsheet headers for reliable loading into a production data warehouse.

Examples

1. Warehouse Data Migration

Role: Data Engineer
Background: A team maintains sales records in Excel that need to be loaded into a cloud data warehouse for BI reporting.
Problem: CSV imports often fail due to schema mismatches and lack of native type support.
How to Use: Upload the sales workbook, select 'Parquet' as the output mode, and enable 'Sanitize Field Names'.
Example Config: outputMode: 'parquet', useSanitizedFieldNames: true
Outcome: A schema-ready Parquet file that maps directly to the warehouse table structure without manual data cleaning.

2. Streaming Log Ingestion

Role: Backend Developer
Background: Operational logs are tracked in a shared spreadsheet and need to be ingested into an ELK stack.
Problem: The logs must be in NDJSON format to be processed by the streaming pipeline.
How to Use: Upload the log sheet and set the output mode to 'NDJSON'.
Example Config: outputMode: 'ndjson', nullForEmpty: true
Outcome: A clean NDJSON file ready for immediate ingestion into the streaming pipeline.

FAQ

Why use Parquet over CSV for data warehouses?

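Parquet stores data column by column and embeds the schema (column names and types) in the file. Compared with a row-based text format like CSV, this typically gives better compression and lets query engines read only the columns a query needs.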

What happens to empty cells in my Excel file?

If 'Convert Empty to Null' is enabled, the tool automatically maps blank cells to null values, preventing schema errors during ingestion.
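As an illustration of what null conversion means for an NDJSON record, the sketch below replaces blank values with `None`, which serializes to JSON `null`. Treating whitespace-only strings as empty is an assumption made here; the page does not say how the tool handles them.

```python
import json


def empty_to_null(record):
    """Replace blank cell values with None (serialized as JSON null).

    Assumption: None and whitespace-only strings both count as empty.
    """
    return {
        key: None
        if value is None or (isinstance(value, str) and value.strip() == "")
        else value
        for key, value in record.items()
    }


print(json.dumps(empty_to_null({"order_id": 17, "coupon": ""})))
# {"order_id": 17, "coupon": null}
```

Note that `0` and `False` are left untouched: only missing or blank cells become null.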

Can I export multiple sheets at once?

The tool processes one sheet at a time. You can specify the sheet name to target the exact data you need to export.

What does 'Sanitize Field Names' do?

It automatically cleans column headers by removing special characters and spaces, ensuring they meet strict naming conventions required by most SQL databases.
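The exact sanitization rules are not documented on this page. The sketch below shows one plausible approach: replace runs of non-alphanumeric characters with underscores, lowercase the result, and de-duplicate collisions. The lowercase and underscore conventions, the `column` fallback, and both function names are assumptions for illustration.

```python
import re


def sanitize_field_name(header):
    """Reduce a header to letters, digits, and underscores (hypothetical rules)."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", header.strip()).strip("_").lower()
    if not name:
        name = "column"      # fallback for headers with no usable characters
    if name[0].isdigit():
        name = "_" + name    # many SQL dialects reject a leading digit
    return name


def sanitize_headers(headers):
    """Sanitize every header and suffix repeats so field names stay unique."""
    seen = {}
    result = []
    for header in headers:
        name = sanitize_field_name(header)
        count = seen.get(name, 0)
        seen[name] = count + 1
        result.append(name if count == 0 else f"{name}_{count + 1}")
    return result


print(sanitize_headers(["Order Date (UTC)", "Total $", "Total %"]))
# ['order_date_utc', 'total', 'total_2']
```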

Is there a limit to the file size?

The tool supports files up to 100MB, which is sufficient for most standard analytical datasets.

API Documentation

Request Endpoint

POST /en/api/tools/xlsx-parquet-exporter

Request Parameters

Parameter Name          Type                    Required  Description
excelFile               file (upload required)  Yes       -
sheetName               text                    No        -
headerRow               number                  No        -
outputMode              select                  No        -
useSanitizedFieldNames  checkbox                No        -
nullForEmpty            checkbox                No        -

File-type parameters must be uploaded first: send the file to POST /upload/xlsx-parquet-exporter to obtain a filePath, then pass that filePath as the value of the corresponding file field (here, excelFile).
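A minimal sketch of the second step, building the export request once a filePath is available, might look like the following. The host (taken from the MCP URL on this page), the JSON request body, and the helper name `build_export_request` are assumptions; the upload step is omitted because its request format is not documented here.

```python
import json
import urllib.request

# Assumption: the API is served from the same host as the MCP endpoint.
BASE_URL = "https://elysiatools.com"


def build_export_request(file_path, output_mode="both", sheet_name=None,
                         header_row=None, use_sanitized_field_names=None,
                         null_for_empty=None):
    """Build (but do not send) the POST request for the exporter endpoint.

    `file_path` is the filePath returned by the upload endpoint.
    Sending the parameters as a JSON body is an assumption.
    """
    payload = {"excelFile": file_path, "outputMode": output_mode}
    optional = {
        "sheetName": sheet_name,
        "headerRow": header_row,
        "useSanitizedFieldNames": use_sanitized_field_names,
        "nullForEmpty": null_for_empty,
    }
    # Only include optional parameters the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        BASE_URL + "/en/api/tools/xlsx-parquet-exporter",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`; it is left unsent here so the sketch has no network side effects.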

Response Format

{
  "filePath": "/public/processing/randomid.ext",
  "fileName": "output.ext",
  "contentType": "application/octet-stream",
  "size": 1024,
  "metadata": {
    "key": "value"
  },
  "error": "Error message (optional)",
  "message": "Notification message (optional)"
}
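Since the response carries an optional "error" field alongside the file details, a client will usually want to check it before using "filePath". The sketch below shows one way to do that; `parse_tool_response` is a hypothetical helper, and treating any non-empty "error" as a failure is an assumption.

```python
import json


def parse_tool_response(body):
    """Parse the JSON response text and return the file details.

    Raises RuntimeError when the response contains a non-empty "error".
    """
    data = json.loads(body)
    if data.get("error"):
        raise RuntimeError(data["error"])
    return {
        "file_path": data["filePath"],
        "file_name": data["fileName"],
        "content_type": data.get("contentType"),
        "size": data.get("size"),
        "metadata": data.get("metadata", {}),
    }
```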

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-xlsx-parquet-exporter": {
      "name": "xlsx-parquet-exporter",
      "description": "Export worksheet data to Parquet and/or NDJSON for data lake and data warehouse pipelines",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=xlsx-parquet-exporter",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

You can chain up to 20 tools in one URL by listing their IDs comma-separated in toolId, e.g.: `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp`.
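If the chained URL is generated programmatically, the 20-tool limit can be enforced before the URL is written into the configuration. This is a small sketch; `build_mcp_url` is a hypothetical helper name.

```python
MCP_SSE_URL = "https://elysiatools.com/mcp/sse"
MAX_TOOLS = 20  # limit stated on this page


def build_mcp_url(tool_ids):
    """Join tool IDs into a single MCP SSE URL, enforcing the 20-tool limit."""
    if not tool_ids:
        raise ValueError("at least one tool ID is required")
    if len(tool_ids) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools can be chained")
    return f"{MCP_SSE_URL}?toolId={','.join(tool_ids)}"


print(build_mcp_url(["png-to-webp", "jpg-to-webp", "gif-to-webp"]))
# https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp
```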

Supports URL file links or Base64 encoding for file parameters.

If you encounter any issues, please contact us at [email protected]