TypeError: 'JavaPackage' object is not callable when running pydeequ

Describe the bug I’ve got an exception when I try to run pydeequ: “TypeError: ‘JavaPackage’ object is not callable”.

To Reproduce Steps to reproduce the behavior:

  1. pip install pydeequ==0.1.5
  2. Code:
from pyspark.sql import SparkSession, Row
import pydeequ

spark = (SparkSession
    .builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())

df = spark.sparkContext.parallelize([
            Row(a="foo", b=1, c=5),
            Row(a="bar", b=2, c=6),
            Row(a="baz", b=3, c=None)]).toDF()

from pydeequ.analyzers import *

analysisResult = AnalysisRunner(spark) \
                    .onData(df) \
                    .addAnalyzer(Size()) \
                    .addAnalyzer(Completeness("b")) \
                    .run()
                    
analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
analysisResult_df.show()
  3. Execute the code above
  4. See error: TypeError: 'JavaPackage' object is not callable
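A quick way to narrow this error down is to check whether the Deequ classes are visible to the JVM at all. The sketch below relies on the usual py4j behavior where a dotted name that cannot be resolved to a class comes back as a `JavaPackage` placeholder, which is what then fails when called. The helper names are made up for illustration, `_jvm` is a private PySpark attribute, and `com.amazon.deequ.analyzers.Size` is assumed to be the class behind `Size()`.

```python
def is_unresolved_java_package(obj):
    """Return True if a py4j lookup fell back to a JavaPackage placeholder."""
    # Compare by type name so this helper needs no py4j import.
    return type(obj).__name__ == "JavaPackage"


def deequ_is_on_classpath(spark):
    """Best-effort check that the Deequ jar was loaded into the driver JVM.

    Assumption: `spark` is an active SparkSession and the Scala class
    com.amazon.deequ.analyzers.Size is part of the Deequ jar.
    """
    size_cls = spark.sparkContext._jvm.com.amazon.deequ.analyzers.Size
    return not is_unresolved_java_package(size_cls)
```

If `deequ_is_on_classpath(spark)` returns False, the jar most likely never reached the cluster, and changing the Python code alone will not help.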

Expected behavior I was expecting the results of the analyzer.

Desktop (please complete the following information):

  • Apache Spark 3.0.0
  • Scala 2.12
  • Pydeequ = 0.1.5

Additional context I’m running it on a Databricks cluster.

Thank you for your help.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 9
  • Comments: 28 (6 by maintainers)

Top GitHub Comments

SerenaLin2020 commented, Dec 7, 2020 (5 reactions)

Experiencing the same issue. Solved by using pyspark --jars /path-to-the-jar/deequ-1.0.5.jar. More info: Python version: 3.7.9, Spark version: 2.4.7, Scala version: 2.13.4.
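For scripts that build their own session instead of starting the `pyspark` shell, the same idea can be expressed through the `spark.jars` setting, which takes a comma-separated list of jar paths. This is only a sketch: the helper names are invented for illustration, and the setting only takes effect if no JVM is running yet, so it may not apply on a cluster that already provides a session.

```python
def local_jar_conf(jar_paths):
    """Build the Spark setting that mirrors `pyspark --jars ...`."""
    # spark.jars expects a single comma-separated string of jar paths.
    return {"spark.jars": ",".join(jar_paths)}


def build_session(jar_paths):
    """Create (or reuse) a SparkSession with the given jars attached."""
    # Imported here so local_jar_conf works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in local_jar_conf(jar_paths).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would look like `spark = build_session(["/path-to-the-jar/deequ-1.0.5.jar"])`, with the placeholder path from the comment above replaced by the real location of the jar.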

vinura commented, Jul 28, 2021 (3 reactions)

I installed the following Maven package directly instead of using pydeequ.deequ_maven_coord:

com.amazon.deequ:deequ:1.1.0_spark-3.0-scala-2.12

You need to check whether there is an exact match for your cluster's Spark and Scala versions, and add it as a Maven package on the Databricks cluster. @anusha610, if you are running it locally (using dbconnect), create the spark object as follows:

spark = (SparkSession
    .builder
    .config("spark.jars.packages", "com.amazon.deequ:deequ:1.1.0_spark-3.0-scala-2.12")
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())
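The coordinate above encodes the Deequ, Spark and Scala versions. As a rough sketch, a string of that shape can be assembled from the cluster's versions. The function name is hypothetical, and the naming pattern is inferred only from the single coordinate quoted in this comment, so the result should be checked against what is actually published on Maven Central before use.

```python
def guess_deequ_coord(deequ_version, spark_version, scala_version):
    """Assemble a Deequ Maven coordinate of the shape used in this thread.

    Assumption: artifacts follow the pattern
    <deequ>_spark-<major.minor>-scala-<scala>. Not every combination exists.
    """
    # Keep only major.minor of the Spark version, e.g. "3.0.0" -> "3.0".
    spark_major_minor = ".".join(spark_version.split(".")[:2])
    return (
        f"com.amazon.deequ:deequ:{deequ_version}"
        f"_spark-{spark_major_minor}-scala-{scala_version}"
    )


# Versions reported in the original issue: Spark 3.0.0, Scala 2.12.
print(guess_deequ_coord("1.1.0", "3.0.0", "2.12"))
```

The resulting string can be passed to `spark.jars.packages` in place of the literal, which makes a mismatch with the cluster's Spark or Scala version easier to spot.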

