Pandas doesn't recognize Pyarrow as a Parquet engine even though it's installed
Code Sample, a copy-pastable example if possible
In [6]: pd.io.parquet.get_engine('auto')
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-6-77cb1d6c8933> in <module>
----> 1 pd.io.parquet.get_engine('auto')
~/miniconda3/lib/python3.6/site-packages/pandas/io/parquet.py in get_engine(engine)
30 pass
31
---> 32 raise ImportError("Unable to find a usable engine; "
33 "tried using: 'pyarrow', 'fastparquet'.\n"
34 "pyarrow or fastparquet is required for parquet "
ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support
Problem description
Pandas doesn’t recognize Pyarrow as a Parquet engine even though it’s installed: the output of pd.show_versions() below reports pyarrow 0.12.0.
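The traceback above shows a `pass` immediately before the final `raise`, which suggests that `get_engine('auto')` swallows whatever error each candidate engine raised on import. A minimal sketch for surfacing the underlying error, assuming that behaviour: it imports each candidate module directly and reports the exception instead of discarding it. The helper name `probe_engines` and the list of candidate modules are illustrative choices, not part of pandas.

```python
import importlib


def probe_engines(candidates=("pyarrow", "pyarrow.parquet", "fastparquet")):
    """Import each candidate module and record why it failed, if it did.

    Returns a dict mapping module name to None (import succeeded) or to a
    string describing the exception that was raised.
    """
    results = {}
    for name in candidates:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:
            # Assumption: pandas hides import failures here. Any other
            # exception type is recorded as well, since a broken binary
            # install can fail in other ways.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in probe_engines().items():
        print(name, "->", "OK" if error is None else error)
```

If `pyarrow` imports but `pyarrow.parquet` does not, the message printed for the latter is likely the real cause that the generic "Unable to find a usable engine" error is hiding.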
Expected Output
In [2]: pd.io.parquet.get_engine('auto')
Out[2]: <pandas.io.parquet.PyArrowImpl at 0x119c78f28>
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: 3.9.3
pip: 18.1
setuptools: 40.5.0
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.12.0
xarray: None
IPython: 7.1.1
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.5
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
TLDR: I got it working by uninstalling via conda and installing with pip. So it appears that there’s something off about that specific conda version. Sorry for the noise.
Details below for others.
I didn’t have multiple versions of pyarrow installed.
I uninstalled via conda, verified I didn’t have pyarrow from pip, reinstalled via conda, and got the same error:
And then
I got it working by uninstalling via conda and installing with pip:
So it appears that there’s something off about that specific conda version.
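When a conda install and a pip install behave differently, it can help to confirm which interpreter is running and which file each package would be loaded from. The following is a rough sketch using only the standard library; `locate_module` is a hypothetical helper name, and the package list is just the set of names mentioned in this issue.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from.

    Returns None when the module cannot be found on the current sys.path.
    The module itself is not imported, so a broken install is still located.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name in ("pandas", "pyarrow", "fastparquet"):
        print(name, "->", locate_module(name))
```

If the reported paths point into different environments, pandas and pyarrow are not being loaded from the same installation.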
See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html#increased-minimum-versions-for-dependencies: as of pandas 1.0.0 you need pyarrow >= 0.13.
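A quick way to check an installed version against that minimum is sketched below. It assumes a plain dotted version string and compares only the first three numeric components; `version_tuple` and `meets_minimum` are illustrative helpers, not pandas functions, and the `"0.13"` threshold is taken from the comment above.

```python
import re


def version_tuple(version, width=3):
    """Turn '0.12.0' into (0, 12, 0), padding or truncating to `width` parts.

    Parsing stops at the first component with no leading digits, so a
    suffix such as '.dev0' is ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts = parts[:width]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import pyarrow
    except ImportError as exc:
        print("pyarrow is not importable:", exc)
    else:
        # Assumed threshold: the minimum quoted in the comment above.
        print(pyarrow.__version__, meets_minimum(pyarrow.__version__, "0.13"))
```

With the versions reported in this issue, `meets_minimum("0.12.0", "0.13")` is false, so pyarrow 0.12.0 would not satisfy that minimum.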