ValueError: cannot handle a non-unique multi-index!

dask_error.txt

This might be related to Dask not supporting multi-indexes. My code was randomly failing, which at first made me assume there was a problem in the input data. With the versions

dask: 1.2.2, numpy: 1.16.3, pandas: 0.24.2

the minimal example below fails. Is there a way to make this error message more intuitive, or to make this operation work?

import numpy as np
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({'ind_a': np.arange(100), 'ind_b': 1, 'var': 'whatever'})
df = dd.from_pandas(df, npartitions=90)
    
# Only fails when grouping with two variables.
df['nr'] = df.groupby(['ind_a', 'ind_b']).cumcount()
len(df)
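
For comparison, here is a minimal sketch of the same operation in plain pandas, without Dask. It only assumes numpy and pandas are installed, and the name `pdf` is made up for illustration. Every (ind_a, ind_b) pair is unique in this data, so the counter is 0 on every row, which is presumably the result the Dask version would be expected to return.

```python
import numpy as np
import pandas as pd

# Same data as the minimal example, kept as a plain pandas DataFrame.
pdf = pd.DataFrame({'ind_a': np.arange(100), 'ind_b': 1, 'var': 'whatever'})

# Grouping on two columns and numbering the rows within each group.
pdf['nr'] = pdf.groupby(['ind_a', 'ind_b']).cumcount()

# Each (ind_a, ind_b) group holds exactly one row here,
# so every row gets the within-group position 0.
print(pdf['nr'].unique())
```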

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 23 (12 by maintainers)

Top GitHub Comments

1 reaction
mprorock commented, Oct 21, 2021

Definitely still seeing this issue. As a workaround, I was able to build a single column by concatenating the two values being grouped on, then group on that column instead, which gets around the problem.

old code:

ddf['NEW_VALUE'] = ddf.groupby(['GROUP_KEY_1', 'GROUP_KEY_2'])['value'].cumsum()

new code to work around the issue:

ddf['GROUPER'] = ddf['GROUP_KEY_1'].astype(str) + ddf['GROUP_KEY_2'].astype(str)
ddf['NEW_VALUE'] = ddf.groupby(['GROUPER'])['value'].cumsum()
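
One caveat with this kind of workaround, sketched here in plain pandas so it runs without Dask: joining the two keys as strings with nothing in between can merge groups that should stay apart (1 and 23 versus 12 and 3 both become "123"). The data below and the "|" separator are assumptions made for illustration, not part of the original comment. The separator only helps if it never occurs in the key values themselves.

```python
import pandas as pd

# Hypothetical data where the two keys collide once concatenated.
df = pd.DataFrame({
    "GROUP_KEY_1": [1, 12, 1],
    "GROUP_KEY_2": [23, 3, 23],
    "value": [10, 20, 30],
})

# Concatenation without a separator: every row becomes "123".
plain = df["GROUP_KEY_1"].astype(str) + df["GROUP_KEY_2"].astype(str)

# Concatenation with a separator (assumed not to appear in the keys).
safe = df["GROUP_KEY_1"].astype(str) + "|" + df["GROUP_KEY_2"].astype(str)

# Reference result from grouping on both columns directly in pandas.
expected = df.groupby(["GROUP_KEY_1", "GROUP_KEY_2"])["value"].cumsum()

with_plain = df.groupby(plain)["value"].cumsum()
with_safe = df.groupby(safe)["value"].cumsum()

print(expected.tolist())    # two-key grouping
print(with_plain.tolist())  # groups merged by the collision
print(with_safe.tolist())   # matches the two-key grouping
```

The same column construction should carry over to a Dask DataFrame, since it only uses `astype(str)` and string addition as in the comment above.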

1 reaction
marberi commented, Jul 9, 2020

@jakirkham

In a new environment with dask: 2.20.0, numpy: 1.18.5, pandas: 1.0.5, I can confirm the error is still there. When we tested this last year, we could localize where the problem came from, but we did not determine the best way to fix it.
