'No FileSystem for scheme: gs' Error

I’m running a PySpark application on a standalone Spark cluster. I can see that the GCS connector (gcs-connector-hadoop2-2.0.1.jar) is added successfully when I submit the application:

21/01/25 11:46:25 INFO SparkContext: Added JAR /opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar at spark://pyspark-1.c.master.internal:43981/jars/gcs-connector-hadoop2-2.0.1.jar with timestamp 1611575185463

but the gs:// prefix isn’t recognized:

py4j.protocol.Py4JJavaError: An error occurred while calling o41.json.
: java.io.IOException: No FileSystem for scheme: gs
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
	at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:477)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:834)

This is my configuration:

{
  "spark.jars": "/opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar",
  "google.cloud.auth.service.account.enable": "true",
  "google.cloud.auth.service.account.json.keyfile": "/path/to/service/account/file.json"
}

I then tried adding the following setting: "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem", but I still get an error, this time because the class cannot be found:

py4j.protocol.Py4JJavaError: An error occurred while calling o45.json.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
	at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:477)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	... 25 more
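
Before looking at the classpath, it can help to rule out a damaged or incomplete jar. The sketch below only checks whether the class named in the ClassNotFoundException exists as an entry inside the jar file; it says nothing about whether the driver JVM can actually see that jar. The helper name `jar_contains_class` is hypothetical, and the entry path assumes the connector keeps this class under its original, unshaded package name.

```python
import zipfile

# Class from the ClassNotFoundException above, written as a path inside the jar.
# Assumption: the connector jar does not relocate this class.
GCS_FS_ENTRY = "com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem.class"


def jar_contains_class(jar_path, entry=GCS_FS_ENTRY):
    """Return True if the jar (a zip archive) lists the given class entry."""
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example call, using the jar path from the log line above:
# jar_contains_class("/opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar")
```

If this returns True and the error persists, the jar itself is probably fine and the driver's classpath is the more likely culprit.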

Hope you could help, thank you!

Top GitHub Comments

michalbarer1 commented, Jan 31, 2021

@medb Thank you. What fixed it for me was adding --driver-class-path to the spark-submit command.
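
For anyone landing here, a minimal sketch of what such a command might look like, assembled in Python so it can be inspected before running. The comment above does not give the exact command, so everything here is an assumption: the helper name `build_spark_submit`, the application file `app.py`, and the choice to pass the Hadoop settings through Spark's `spark.hadoop.` prefix. A likely reason this helps is that a jar listed only in `spark.jars` may not be on the driver JVM's own classpath when Hadoop resolves `fs.gs.impl`.

```python
import shlex


def build_spark_submit(app, connector_jar, keyfile):
    """Assemble a spark-submit argument list (illustrative sketch only)."""
    # Hadoop properties passed via Spark's "spark.hadoop." prefix (assumption:
    # this is how the settings from the question reach the Hadoop configuration).
    conf = {
        "spark.hadoop.fs.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.google.cloud.auth.service.account.json.keyfile": keyfile,
    }
    cmd = [
        "spark-submit",
        "--driver-class-path", connector_jar,  # puts the jar on the driver classpath
        "--jars", connector_jar,               # also ships the jar with the job
    ]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


cmd = build_spark_submit(
    "app.py",  # hypothetical application file
    "/opt/spark/jars/gcs-connector-hadoop2-2.0.1.jar",
    "/path/to/service/account/file.json",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run`. Paths are taken from the question and should be adjusted to the actual cluster layout.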

medb commented, Jan 26, 2021

Also, please take a look at https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/180, maybe this information will be useful to solve your issue.
