Key Facts
- Category: Conversion & Encoding
- Input Types: text, textarea, number, select, checkbox
- Output Type: file
- Sample Coverage: 4
- API Ready: Yes
Overview
The XLSX S3 Batch Processor allows you to efficiently clean, filter, and transform multiple Excel files stored in S3-compatible object storage, streamlining your data preparation workflows without manual downloads.
When to Use
- When you need to perform bulk data cleaning or filtering across multiple XLSX files stored in an S3 bucket.
- When you need to convert large sets of Excel files into standardized CSV or JSON formats for downstream processing.
- When you want to automate the extraction of specific data subsets from cloud-hosted spreadsheets and save the results back to your storage.
How It Works
- Connect to your S3-compatible storage using your credentials and specify the object keys for the files you wish to process.
- Define your cleaning preferences, such as trimming whitespace and removing empty rows, to ensure data consistency.
- Apply optional filters by column and operator to isolate specific records, then select your preferred output format (XLSX, CSV, or JSON).
- Optionally enable the upload-back feature to automatically save your processed files directly to your S3 bucket with a custom prefix.
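The cleaning and filtering steps above can be sketched in miniature. This is an illustrative, in-memory model only; the function names, operator keys, and row representation here are assumptions for the sake of the example, not the tool's actual API.

```python
# Minimal sketch of the clean-then-filter pipeline described above.
# All names (clean_rows, apply_filter, OPERATORS) are illustrative.

OPERATORS = {
    "equals": lambda cell, value: str(cell) == value,
    "contains": lambda cell, value: value in str(cell),
    "greater_than": lambda cell, value: float(cell) > float(value),
}

def clean_rows(rows, trim_whitespace=True, remove_empty_rows=True):
    """Apply the cleaning preferences from step 2."""
    cleaned = []
    for row in rows:
        if trim_whitespace:
            row = {k: v.strip() if isinstance(v, str) else v
                   for k, v in row.items()}
        if remove_empty_rows and not any(str(v).strip() for v in row.values()):
            continue  # drop rows where every cell is blank
        cleaned.append(row)
    return cleaned

def apply_filter(rows, column, operator, value):
    """Apply the column/operator/value filter from step 3."""
    op = OPERATORS[operator]
    return [r for r in rows if column in r and op(r[column], value)]

rows = [
    {"invoice": "1001", "status": " paid "},
    {"invoice": "", "status": "   "},          # empty row, removed
    {"invoice": "1002", "status": "open"},
]
cleaned = clean_rows(rows)
paid = apply_filter(cleaned, "status", "equals", "paid")
```

In this sketch, cleaning trims the stray whitespace around `" paid "` before the equality filter runs, which is why the cleaning step is applied first.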
Use Cases
Examples
1. Batch Filtering Paid Invoices
- Persona: Data Analyst
- Background: A company stores thousands of individual invoice files in an S3 bucket, and the analyst needs to extract only the 'paid' invoices for a quarterly audit.
- Problem: Manually downloading and filtering each file is inefficient and prone to error.
- How to Use: Input the list of invoice object keys, set the filter column to 'status', the operator to 'equals', and the value to 'paid'.
- Example Config: `filterColumn: status, filterOperator: equals, filterValue: paid, outputFormat: csv`
- Outcome: The tool generates a consolidated CSV file containing only the paid invoice records, ready for audit.
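The effect of this example config can be approximated with Python's standard `csv` module. The inline sample data below stands in for the real invoice objects in S3; it is illustrative only.

```python
import csv
import io

# Stand-in for one invoice file; in the real workflow this content
# would come from an S3 object. filterColumn=status, filterOperator=equals,
# filterValue=paid, outputFormat=csv.
raw = "invoice_id,status\n1001,paid\n1002,open\n1003,paid\n"

reader = csv.DictReader(io.StringIO(raw))
paid_rows = [row for row in reader if row["status"] == "paid"]

# Write the filtered records back out as CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["invoice_id", "status"])
writer.writeheader()
writer.writerows(paid_rows)
result_csv = out.getvalue()
```

The resulting CSV contains only the two 'paid' rows; the 'open' invoice is filtered out.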
2. Cleaning and Standardizing Data Exports
- Persona: System Administrator
- Background: Raw data exports from a legacy system often contain inconsistent whitespace and empty rows that break downstream database imports.
- Problem: The data requires consistent cleaning before it can be ingested into the new system.
- How to Use: Configure the tool to process the raw files with 'Trim Whitespace' and 'Remove Empty Rows' enabled, then save the cleaned files back to a 'processed/' folder.
- Example Config: `trimWhitespace: true, removeEmptyRows: true, uploadBack: true, outputPrefix: processed/`
- Outcome: Cleaned, standardized files are automatically saved back to the S3 bucket, ready for immediate database import.
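One way to picture the `outputPrefix` option is as a key-naming rule. The sketch below assumes the tool writes each result under the prefix plus the original filename; the actual naming scheme is not documented here, so treat this as an assumption.

```python
import posixpath

def output_key(source_key: str, prefix: str = "processed/") -> str:
    # Assumption: processed objects are named prefix + original filename.
    # The tool's real naming scheme may differ (e.g. preserving subfolders).
    return prefix + posixpath.basename(source_key)

key = output_key("exports/2024/raw_dump.xlsx")
```

Under this assumption, `exports/2024/raw_dump.xlsx` would be written back as `processed/raw_dump.xlsx` in the same bucket.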
Try with Samples
csv, xlsx, xls
FAQ
Can I process files from non-AWS S3-compatible storage?
Yes, you can provide a custom S3 endpoint URL to connect to other S3-compatible storage services.
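As a rough illustration, a connection to a non-AWS provider might be configured like the fragment below. The field names are hypothetical placeholders, not the tool's documented parameter names.

```yaml
# Hypothetical connection settings for an S3-compatible provider;
# field names are illustrative only.
s3Endpoint: https://objects.example-provider.com
accessKeyId: YOUR_ACCESS_KEY
secretAccessKey: YOUR_SECRET_KEY
bucket: my-data-bucket
```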
What output formats are supported?
The tool supports exporting processed data as XLSX, CSV, or JSON files.
Does this tool modify my original files?
No, the tool reads your original files and creates new processed versions. If you enable the upload-back feature, the new files are saved as separate objects.
How do I filter data within the files?
You can specify a column name, select a comparison operator (like equals, greater than, or contains), and provide a filter value to extract only the rows that meet your criteria.
Is it possible to process multiple files at once?
Yes, you can provide a list of object keys separated by newlines or commas to process multiple files in a single batch operation.
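Splitting such a pasted list into individual keys is straightforward; the helper below is a hypothetical sketch of that input handling, not the tool's actual parser.

```python
import re

def parse_object_keys(raw: str) -> list[str]:
    # Hypothetical helper: accept keys separated by commas and/or
    # newlines, trim surrounding whitespace, and drop blank entries.
    return [key.strip() for key in re.split(r"[,\n]+", raw) if key.strip()]

keys = parse_object_keys("invoices/a.xlsx, invoices/b.xlsx\ninvoices/c.xlsx")
```

Mixed separators and extra whitespace are tolerated, so users can paste key lists in whichever form is convenient.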