Spark-NLP on Windows: Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize. : java.lang.NoClassDefFoundError: org/json4s/package$MappingException
I'm having difficulty getting through the installation process. I'm following these guides, though I have found that I get furthest along with the latest versions of the packages:
- https://github.com/JohnSnowLabs/spark-nlp/discussions/1022
- https://nlp.johnsnowlabs.com/docs/en/install#windows-support
OS:
- Windows 10
- Anaconda3 (Navigator 2.1.2)
- Python 3.7.12

Spark libraries:
- Spark NLP version: 3.4.1
- Apache Spark version: 3.2.1

JDK:
- openjdk version "1.8.0_322"
- OpenJDK Runtime Environment (Temurin) (build 1.8.0_322-b06)
- OpenJDK 64-Bit Server VM (Temurin) (build 25.322-b06, mixed mode)
The following code runs:
import os
os.environ["PATH"] = os.environ["JAVA_HOME"] + "bin;" + os.environ["PATH"]
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import *
import pyspark.sql.functions as F
from sparknlp.annotator import *
spark = sparknlp.start(spark32=True)
The following code throws an error:
from sparknlp.pretrained import PretrainedPipeline
pipeline_dl = PretrainedPipeline("explain_document_dl", lang="en")
The error is:
explain_document_dl download started this may take some time.
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15016\2007619137.py in <module>
1 from sparknlp.pretrained import PretrainedPipeline
----> 2 pipeline_dl = PretrainedPipeline("explain_document_dl", lang="en")
~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc, parse_embeddings, disk_location)
139 def __init__(self, name, lang='en', remote_loc=None, parse_embeddings=False, disk_location=None):
140 if not disk_location:
--> 141 self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
142 else:
143 self.model = PipelineModel.load(disk_location)
~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
70 def downloadPipeline(name, language, remote_loc=None):
71 print(name + " download started this may take some time.")
---> 72 file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
73 if file_size == "-1":
74 print("Can not find the model to download please check the name!")
~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\internal.py in __init__(self, name, language, remote_loc)
230 def __init__(self, name, language, remote_loc):
231 super(_GetResourceSize, self).__init__(
--> 232 "com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)
233
234
~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\internal.py in __init__(self, java_obj, *args)
163 super(ExtendedJavaWrapper, self).__init__(java_obj)
164 self.sc = SparkContext._active_spark_context
--> 165 self._java_obj = self.new_java_obj(java_obj, *args)
166 self.java_obj = self._java_obj
167
~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\internal.py in new_java_obj(self, java_class, *args)
173
174 def new_java_obj(self, java_class, *args):
--> 175 return self._new_java_obj(java_class, *args)
176
177 def new_java_array(self, pylist, java_class):
~\Anaconda3\envs\Spark-NLP\lib\site-packages\pyspark\ml\wrapper.py in _new_java_obj(java_class, *args)
64 java_obj = getattr(java_obj, name)
65 java_args = [_py2java(sc, arg) for arg in args]
---> 66 return java_obj(*java_args)
67
68 @staticmethod
~\Anaconda3\envs\Spark-NLP\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1320 answer = self.gateway_client.send_command(command)
1321 return_value = get_return_value(
-> 1322 answer, self.gateway_client, self.target_id, self.name)
1323
1324 for temp_arg in temp_args:
~\Anaconda3\envs\Spark-NLP\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
~\Anaconda3\envs\Spark-NLP\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: java.lang.NoClassDefFoundError: org/json4s/package$MappingException
at org.json4s.ext.EnumNameSerializer.deserialize(EnumSerializer.scala:53)
at org.json4s.Formats$$anonfun$customDeserializer$1.applyOrElse(Formats.scala:66)
at org.json4s.Formats$$anonfun$customDeserializer$1.applyOrElse(Formats.scala:66)
at scala.collection.TraversableOnce.collectFirst(TraversableOnce.scala:180)
at scala.collection.TraversableOnce.collectFirst$(TraversableOnce.scala:167)
at scala.collection.AbstractTraversable.collectFirst(Traversable.scala:108)
at org.json4s.Formats$.customDeserializer(Formats.scala:66)
at org.json4s.Extraction$.customOrElse(Extraction.scala:775)
at org.json4s.Extraction$.extract(Extraction.scala:454)
at org.json4s.Extraction$.extract(Extraction.scala:56)
at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:22)
at com.johnsnowlabs.util.JsonParser$.parseObject(JsonParser.scala:28)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.parseJson(ResourceMetadata.scala:109)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$$anonfun$readResources$1.applyOrElse(ResourceMetadata.scala:138)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$$anonfun$readResources$1.applyOrElse(ResourceMetadata.scala:137)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at scala.collection.Iterator$$anon$13.next(Iterator.scala:593)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toList(TraversableOnce.scala:350)
at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:350)
at scala.collection.AbstractIterator.toList(Iterator.scala:1431)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.readResources(ResourceMetadata.scala:137)
at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.readResources(ResourceMetadata.scala:132)
at com.johnsnowlabs.client.aws.AWSGateway.getMetadata(AWSGateway.scala:78)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:62)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:68)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:145)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:445)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:584)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.json4s.package$MappingException
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 51 more
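A NoClassDefFoundError on org/json4s/package$MappingException typically indicates that the Spark NLP jar on the classpath was compiled against a different json4s version than the one the running Spark ships, i.e. an artifact mismatch between PySpark 3.2.1 and a jar built for Spark 3.0/3.1. A session pinned explicitly to the Spark-3.2 artifact would look roughly like this (a sketch of what sparknlp.start(spark32=True) does in the 3.x releases; the memory setting is illustrative):

```python
# Start the session with the Spark-3.2-specific Spark NLP artifact pinned
# explicitly via spark.jars.packages (3.x-era coordinates; from 4.0.0 a
# single spark-nlp artifact covers all supported Spark versions).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Spark NLP")
    .master("local[*]")
    .config("spark.driver.memory", "8G")
    .config("spark.jars.packages",
            "com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.1")
    .getOrCreate()
)
```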
Issue Analytics
- Created 2 years ago
- Comments: 11 (5 by maintainers)
You are still on PySpark 3.2:

Apache Spark version: 3.2.1

For that you need to use either sparknlp.start(spark32=True) or the spark-nlp-spark32 Maven package name. (You didn't share how you construct/start the SparkSession.) No need to downgrade anything; the right package name will fix this. Keep PySpark on 3.2.1 and the spark-nlp-spark32 Maven package on 3.4.1; the PyPI spark-nlp package is the same for everything.
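The package-per-Spark-version rule above can be sketched as a lookup (these artifact names are my reconstruction of the historical 3.x-era coordinates; from 4.0.0 onward a single spark-nlp artifact covers all Spark versions):

```python
# Map a PySpark version to the Spark NLP 3.x-era Maven artifact it needed.
def sparknlp_artifact(pyspark_version: str) -> str:
    major, minor = (int(p) for p in pyspark_version.split(".")[:2])
    if (major, minor) >= (3, 2):
        return "com.johnsnowlabs.nlp:spark-nlp-spark32_2.12"
    if (major, minor) >= (3, 0):
        return "com.johnsnowlabs.nlp:spark-nlp_2.12"
    return "com.johnsnowlabs.nlp:spark-nlp-spark24_2.11"

print(sparknlp_artifact("3.2.1"))  # the artifact for this issue's setup
```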
@vpisanoVH this issue is old; it belongs to the time we had different Maven packages for different PySpark versions. From the 4.0.0 release we no longer need that, so if you can, please open a new issue (bug report) and fill in the template, and we can follow up there.