Example usage of ParquetWriter.openStream?

I saw that @asmuth implemented it here: https://github.com/ironSource/parquetjs/blob/master/lib/writer.js#L52, but I’m wondering if someone has a simple example of how to use it.

I’ve got it mostly working, but I had to implement a close method on a PassThrough stream, so I suspect I might be doing it incorrectly.
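For readers hitting the same wall, here is a minimal sketch of the kind of shim described above. It assumes the writer calls `write(buffer, callback)` and then `close(callback)` on the stream it is given, which matches `fs.WriteStream` but not `PassThrough`. The names `closablePassThrough`, `writeAndClose`, and `demo` are hypothetical, and `writeAndClose` is only a stand-in that imitates that calling pattern; it is not parquetjs.

```javascript
const { PassThrough } = require("node:stream");

// Hypothetical helper: a PassThrough that also exposes close(callback),
// so code written against fs.WriteStream can finish the stream.
function closablePassThrough() {
  const stream = new PassThrough();
  stream.close = (callback) => {
    // end() flushes pending writes; the callback runs once the
    // writable side has finished.
    stream.end(callback);
  };
  return stream;
}

// Stand-in for a writer (NOT parquetjs): writes each chunk in order,
// then calls close(callback) on the output stream.
function writeAndClose(out, chunks) {
  return new Promise((resolve, reject) => {
    const next = (i) => {
      if (i === chunks.length) {
        out.close((err) => (err ? reject(err) : resolve()));
        return;
      }
      out.write(chunks[i], (err) => (err ? reject(err) : next(i + 1)));
    };
    next(0);
  });
}

// Collects everything that comes out of the readable side.
async function demo() {
  const out = closablePassThrough();
  const received = [];
  out.on("data", (chunk) => received.push(chunk));
  const ended = new Promise((resolve) => out.on("end", resolve));

  await writeAndClose(out, [
    Buffer.from("PAR1"),
    Buffer.from("rows"),
    Buffer.from("PAR1"),
  ]);
  await ended; // wait until the readable side has been fully drained
  return Buffer.concat(received).toString();
}
```

Mapping `close` onto `end` keeps the shim to a few lines. Whether it is sufficient for a real writer depends on how that writer treats errors and backpressure, which this sketch does not cover.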

I’d be happy to submit a PR with some nice doc as well once I have it all figured out since I imagine this is a common enough use case.

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 11
  • Comments: 6

Top GitHub Comments

7 reactions
ali-habibzadeh commented, Jan 29, 2020

An easier way I have found is to skip ParquetWriter.openStream and use the undocumented ParquetTransformer class instead. This class extends Node’s Transform stream, so it can be used as a step in a pipe chain.

import { createReadStream, createWriteStream } from "fs";
import { ParquetSchema, ParquetTransformer } from "parquetjs";
// stream-json has some nice streaming tools for working with JSON
import * as StreamArray from "stream-json/streamers/StreamArray";

const reader = createReadStream("data-json.json"); // contains JSON Array
const destination = createWriteStream("countries.parquet");

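// StreamArray emits { key, value } objects, one per array element,
// so the record fields are declared under a nested "value" field.
const schema = new ParquetSchema({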
  value: {
    fields: {
      Country: { type: "UTF8" },
      Indicator: { type: "UTF8" },
      Value: { type: "FLOAT" },
      Year: { type: "INT64" }
    }
  }
});

reader
  .pipe(StreamArray.withParser())
  .pipe(new ParquetTransformer(schema))
  .pipe(destination);
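
A chain of `.pipe()` calls does not forward errors between stages or signal when the output is complete. The sketch below shows the same shape using `pipeline` from `node:stream/promises`, which resolves when the last stage finishes and rejects if any stage fails. `RecordEncoder` and `encodeAll` are hypothetical names: `RecordEncoder` is a stand-in for ParquetTransformer that writes one JSON line per record instead of Parquet, so the example runs without any dependencies.

```javascript
const { Readable, Transform, Writable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

// Hypothetical stand-in for ParquetTransformer: takes objects on the
// writable side and emits bytes on the readable side.
class RecordEncoder extends Transform {
  constructor() {
    super({ writableObjectMode: true });
  }

  _transform(record, _encoding, callback) {
    // Records arrive as { key, value }, the shape StreamArray emits.
    callback(null, Buffer.from(JSON.stringify(record.value) + "\n"));
  }
}

// Runs the records through the encoder and returns the encoded text.
async function encodeAll(records) {
  const chunks = [];
  const sink = new Writable({
    write(chunk, _encoding, callback) {
      chunks.push(chunk);
      callback();
    },
  });

  // Resolves after the sink has finished; rejects on any stage error.
  await pipeline(Readable.from(records), new RecordEncoder(), sink);
  return Buffer.concat(chunks).toString();
}
```

In the example above, the three stages would be the StreamArray parser, `new ParquetTransformer(schema)`, and the file write stream, passed to `pipeline` in that order after the reader.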

5 reactions
dogenius01 commented, Dec 18, 2018

I have a similar issue. How do I write a Parquet file using a stream?
