Reading timestamp with type INT64 in parquet file
See original GitHub issueI am using AWS EMR 5.28.0 which has Presto 0.227 (Somehow, it works in AWS Athena.)
I have an AWS DMS job that migrates the AWS RDS data to S3. While migrating it also converts to Parquet. When setting target endpoint in S3, I have tried all combinations of parameters
parquetTimestampInMillisecond and parquetVersion, but none works.
Ideally, when I set AWS parquetTimestampInMillisecond to true, I should be able to read any timestamp column from Presto.
But, I get the following error:
The column columnName is declared as type timestamp, but the Parquet file declares the column as type INT64
Using parquet-tools, I see following output for the timestamp column in my parquet file in S3:
columnName: INT64 UNCOMPRESSED DO:0 FPO:99321 SZ:24381/24381/1.00 VC:3041 ENC:PLAIN,RLE ST:[min: 2019-11-26T04:21:44.403, max: 2019-11-26T04:22:45.932, num_nulls: 0]
When I use following definition to create table, it works
CREATE TABLE hive.bigdata.tableName (
columnName bigint
)
WITH (
external_location = 's3://location',
format = 'PARQUET'
)
But I have to create a view to read column as timestamp,
CREATE VIEW hive.bigdata.tableName2 AS SELECT
cast(date_format(from_unixtime(columnName/1000), '%Y-%m-%d %h:%i:%s') as timestamp) as columnName
FROM hive.bigdata.tableName
Is there any direct way to create a table from the INT64 column? or any way to tell AWS DMS to transfer in supported data type.
Issue Analytics
- State:
- Created 4 years ago
- Comments:10 (6 by maintainers)
Top Related StackOverflow Question
@findepi there may be something more going on here but the error message mentioned is from a Facebook Presto specific check (linked above) that is relatively new so doesn’t exist in PrestoSQL. There are some other pending pull requests like allowing statistics pruning for timestamps that are int64 so I guess I’d be surprised if it’s broken. It is possible it’s something like AWS DMS creates the type as int64 but doesn’t tag it as a logical type of TIMESTAMP_MILLIS or something along those lines but the Parquet tools metadata there shows it rendering correctly the min/max stats for int64 in timestamp format (I think that means logical type is applied correctly).
@prakharjain-vogo would it be possible to test locally with the latest PrestoSQL (I know this is not on EMR unfortunately)
Going to close as this was an issue in PrestoDB that got fixed in the previously mentioned linked issue. @prakharjain-vogo feel free to reopen if this is still an issue with PrestoSQL.