Spark-NLP on Windows: Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize. : java.lang.NoClassDefFoundError: org/json4s/package$MappingException

See original GitHub issue

I’m having difficulty getting through the installation process. I’m following these guides, though I’ve found I get furthest with the latest versions of the packages:

OS:

Windows 10
Anaconda3 (Navigator 2.1.2)
Python 3.7.12

Spark libraries:

Spark NLP version 3.4.1
Apache Spark version: 3.2.1

JDK:

openjdk version "1.8.0_322"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_322-b06)
OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)

The following code runs:

import os
os.environ["PATH"] = os.environ["JAVA_HOME"] + "bin;" + os.environ["PATH"]

import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import *
import pyspark.sql.functions as F
from sparknlp.annotator import *

spark = sparknlp.start(spark32=True)
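As an aside, the `PATH` line above only produces a valid entry if `JAVA_HOME` happens to end with a path separator. A more robust sketch using `os.path.join` and `os.pathsep` (the `env` dict below is a hypothetical stand-in for `os.environ`):

```python
import os

# Prepend JAVA_HOME\bin to PATH without relying on JAVA_HOME ending in
# a backslash: os.path.join inserts the directory separator, and
# os.pathsep picks ";" on Windows (":" elsewhere).
def prepend_java_bin(environ):
    java_bin = os.path.join(environ["JAVA_HOME"], "bin")
    environ["PATH"] = java_bin + os.pathsep + environ["PATH"]
    return environ

# Hypothetical values for illustration; use os.environ in practice.
env = {"JAVA_HOME": r"C:\jdk8u322", "PATH": r"C:\Windows\system32"}
prepend_java_bin(env)
```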

The following code throws an error:

from sparknlp.pretrained import PretrainedPipeline
pipeline_dl = PretrainedPipeline("explain_document_dl", lang="en")

The error is:

explain_document_dl download started this may take some time.
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15016\2007619137.py in <module>
      1 from sparknlp.pretrained import PretrainedPipeline
----> 2 pipeline_dl = PretrainedPipeline("explain_document_dl", lang="en")

~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc, parse_embeddings, disk_location)
    139     def __init__(self, name, lang='en', remote_loc=None, parse_embeddings=False, disk_location=None):
    140         if not disk_location:
--> 141             self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
    142         else:
    143             self.model = PipelineModel.load(disk_location)

~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
     70     def downloadPipeline(name, language, remote_loc=None):
     71         print(name + " download started this may take some time.")
---> 72         file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
     73         if file_size == "-1":
     74             print("Can not find the model to download please check the name!")

~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\internal.py in __init__(self, name, language, remote_loc)
    230     def __init__(self, name, language, remote_loc):
    231         super(_GetResourceSize, self).__init__(
--> 232             "com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)
    233 
    234 

~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\internal.py in __init__(self, java_obj, *args)
    163         super(ExtendedJavaWrapper, self).__init__(java_obj)
    164         self.sc = SparkContext._active_spark_context
--> 165         self._java_obj = self.new_java_obj(java_obj, *args)
    166         self.java_obj = self._java_obj
    167 

~\Anaconda3\envs\Spark-NLP\lib\site-packages\sparknlp\internal.py in new_java_obj(self, java_class, *args)
    173 
    174     def new_java_obj(self, java_class, *args):
--> 175         return self._new_java_obj(java_class, *args)
    176 
    177     def new_java_array(self, pylist, java_class):

~\Anaconda3\envs\Spark-NLP\lib\site-packages\pyspark\ml\wrapper.py in _new_java_obj(java_class, *args)
     64             java_obj = getattr(java_obj, name)
     65         java_args = [_py2java(sc, arg) for arg in args]
---> 66         return java_obj(*java_args)
     67 
     68     @staticmethod

~\Anaconda3\envs\Spark-NLP\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
   1320         answer = self.gateway_client.send_command(command)
   1321         return_value = get_return_value(
-> 1322             answer, self.gateway_client, self.target_id, self.name)
   1323 
   1324         for temp_arg in temp_args:

~\Anaconda3\envs\Spark-NLP\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
    109     def deco(*a, **kw):
    110         try:
--> 111             return f(*a, **kw)
    112         except py4j.protocol.Py4JJavaError as e:
    113             converted = convert_exception(e.java_exception)

~\Anaconda3\envs\Spark-NLP\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: java.lang.NoClassDefFoundError: org/json4s/package$MappingException
	at org.json4s.ext.EnumNameSerializer.deserialize(EnumSerializer.scala:53)
	at org.json4s.Formats$$anonfun$customDeserializer$1.applyOrElse(Formats.scala:66)
	at org.json4s.Formats$$anonfun$customDeserializer$1.applyOrElse(Formats.scala:66)
	at scala.collection.TraversableOnce.collectFirst(TraversableOnce.scala:180)
	at scala.collection.TraversableOnce.collectFirst$(TraversableOnce.scala:167)
	at scala.collection.AbstractTraversable.collectFirst(Traversable.scala:108)
	at org.json4s.Formats$.customDeserializer(Formats.scala:66)
	at org.json4s.Extraction$.customOrElse(Extraction.scala:775)
	at org.json4s.Extraction$.extract(Extraction.scala:454)
	at org.json4s.Extraction$.extract(Extraction.scala:56)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:22)
	at com.johnsnowlabs.util.JsonParser$.parseObject(JsonParser.scala:28)
	at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.parseJson(ResourceMetadata.scala:109)
	at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$$anonfun$readResources$1.applyOrElse(ResourceMetadata.scala:138)
	at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$$anonfun$readResources$1.applyOrElse(ResourceMetadata.scala:137)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at scala.collection.Iterator$$anon$13.next(Iterator.scala:593)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
	at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toList(TraversableOnce.scala:350)
	at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:350)
	at scala.collection.AbstractIterator.toList(Iterator.scala:1431)
	at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.readResources(ResourceMetadata.scala:137)
	at com.johnsnowlabs.nlp.pretrained.ResourceMetadata$.readResources(ResourceMetadata.scala:132)
	at com.johnsnowlabs.client.aws.AWSGateway.getMetadata(AWSGateway.scala:78)
	at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:62)
	at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:68)
	at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:145)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:445)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:584)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.json4s.package$MappingException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 51 more

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

2 reactions
maziyarpanahi commented on Mar 10, 2022

You are still on pyspark 3.2:

Apache Spark version: 3.2.1

For that you need to use either sparknlp.start(spark32=True) or the spark-nlp-spark32 Maven package name. (You didn’t share how you construct/start the SparkSession.) No need to downgrade anything; the right package name will fix this. Keep pyspark on 3.2.1 and the spark-nlp-spark32 Maven package on 3.4.1; the PyPI spark-nlp package is the same for everything.
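Under the pre-4.0 scheme the maintainer describes, the Maven artifact must be chosen to match the PySpark line. A minimal sketch of that selection logic (the helper name and the version mapping are illustrative approximations, not part of the Spark-NLP API):

```python
def sparknlp_maven_coordinate(pyspark_version, sparknlp_version="3.4.1"):
    """Pick the Spark-NLP Maven coordinate matching a PySpark version
    under the pre-4.0.0 naming scheme (illustrative sketch)."""
    major, minor = (int(x) for x in pyspark_version.split(".")[:2])
    if (major, minor) >= (3, 2):
        artifact = "spark-nlp-spark32"  # Spark 3.2.x needs the spark32 build
    else:
        artifact = "spark-nlp"          # default build for Spark 3.0/3.1
    return "com.johnsnowlabs.nlp:{}_2.12:{}".format(artifact, sparknlp_version)

# For the versions in this issue (PySpark 3.2.1, Spark NLP 3.4.1):
coord = sparknlp_maven_coordinate("3.2.1")
# -> "com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.1"
```

That coordinate would go into spark.jars.packages when building the session by hand; calling sparknlp.start(spark32=True) resolves the same artifact for you.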

0 reactions
maziyarpanahi commented on Aug 20, 2022

@vpisanoVH this issue is old; it belongs to the time when we had different Maven packages for different PySpark versions. Since the 4.0.0 release we no longer need that, so please open a new issue (bug report) and fill in the template, and we can follow up there.
