TypeError: sum() got an unexpected keyword argument 'skipna'


Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(index=np.arange(10), columns=np.arange(5), dtype=float)

df = 
    0   1   2   3   4
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN

df.groupby(pd.Series(['a', 'a', 'b', 'b', 'b']), axis=1).agg('sum', skipna=True)

Problem description

The above call to agg gives

KeyError: 'a'

This happens because the code at https://github.com/pandas-dev/pandas/blob/67ee16a283b826b4adb07e5ca64a5206d242acaf/pandas/core/groupby/groupby.py#L1376 tries to look up a new column name ('a') in the original DataFrame.

It only occurs when _cython_agg_general cannot be used, e.g. when the keyword argument skipna is passed to agg. Without the skipna argument, the expected output below is produced.
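A minimal sketch of one way to get the expected output without passing skipna to agg: group the rows of the transposed frame and transpose back, which also avoids the axis=1 argument. The names labels, grouped and grouped_nan are illustrative and not part of the original report. Because sum skips NaN by default, the plain call already behaves like skipna=True; min_count=1 is an existing GroupBy.sum parameter that returns NaN for groups with no valid values.

```python
import numpy as np
import pandas as pd

# Same all-NaN frame as in the report.
df = pd.DataFrame(index=np.arange(10), columns=np.arange(5), dtype=float)

# Column labels to group by; the Series index (0..4) lines up with df's columns.
labels = pd.Series(['a', 'a', 'b', 'b', 'b'])

# Group the transposed frame by row, sum, and transpose back.
# NaN values are skipped by default, so every group sums to 0.0.
grouped = df.T.groupby(labels).sum().T

# With min_count=1, a group that has no valid values yields NaN instead of 0.0.
grouped_nan = df.T.groupby(labels).sum(min_count=1).T

print(grouped)
print(grouped_nan)
```

This only sidesteps the failing code path; it does not make agg accept skipna.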

Expected Output

df = 
     a    b
0  0.0  0.0
1  0.0  0.0
2  0.0  0.0
3  0.0  0.0
4  0.0  0.0
5  0.0  0.0
6  0.0  0.0
7  0.0  0.0
8  0.0  0.0
9  0.0  0.0

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.8.final.0
python-bits      : 64
OS               : Windows
OS-release       : 7
machine          : AMD64
processor        : Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : en
LOCALE           : None.None

pandas           : 0.25.1
numpy            : 1.16.4
pytz             : 2018.9
dateutil         : 2.8.0
pip              : 19.2.3
setuptools       : 40.8.0
Cython           : 0.29.7
pytest           : 4.3.1
hypothesis       : None
sphinx           : 2.1.2
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10
IPython          : 7.8.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.1
numexpr          : 2.6.9
odfpy            : None
openpyxl         : 2.6.1
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : 1.3.1
tables           : 3.5.2
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 14 (6 by maintainers)

Top GitHub Comments

5 reactions
the-moose-machine commented, Jun 8, 2020

I confirm the same. All NaN values are picked up as 0. This is useless when manipulating data for academic research. For instance,

>>> a = pd.DataFrame([np.NaN, np.NaN])
>>> a.sum()
0    0.0

This results in 0

Meanwhile, when doing the same with skipna=False:

>>> a.sum(skipna=False)
0   NaN

This results in NaN, which is the desired output when calculating means. However, when attempting the same within the groupby function:

>>> a[2] = ['a','a']
>>> a.groupby(2).agg({0:sum})
     0
2
a  0.0

the sum always returns 0 and there is no option to change how NaN values are handled.

These 0 values skew the means and standard deviations, resulting in wrong figures. When working with a huge amount of data, we realised that the results of our study did not make sense. On further investigation, I discovered this bug within pandas. I fear that several others may have unknowingly reported inaccurate figures when manipulating data with pandas data frames. So this bug is very much relevant.
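As a hedged illustration of the behaviour described in this comment, the sketch below shows two ways to keep NaN from silently becoming 0 in a grouped sum. The frame and the column names key and value are made up for the example. The lambda simply calls Series.sum(skipna=False) on each group, and min_count=1 is an existing GroupBy.sum parameter; neither is the skipna support requested in the issue.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'a' is all NaN, 'b' has no NaN, 'c' is mixed.
frame = pd.DataFrame({
    "key": ["a", "a", "b", "b", "c", "c"],
    "value": [np.nan, np.nan, 1.0, 2.0, 1.0, np.nan],
})

# Any NaN in a group makes that group's sum NaN, like Series.sum(skipna=False).
nan_aware = frame.groupby("key")["value"].agg(lambda s: s.sum(skipna=False))

# NaN only when a group has no valid values at all; other NaNs are still skipped.
at_least_one = frame.groupby("key")["value"].sum(min_count=1)

print(nan_aware)
print(at_least_one)
```

The two results differ for the mixed group: the lambda version propagates the NaN, while min_count=1 still sums the valid values.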

1 reaction
jorisvandenbossche commented, Nov 20, 2020

The underlying issue here is that the skipna keyword is not yet implemented for groupby reductions like groupby(..).sum().

It might be that this keyword was previously ignored and only recently started to raise, but it never actually worked (or was never documented).

The improvement to add skipna to the grouped reductions is covered in https://github.com/pandas-dev/pandas/issues/15675, so I am going to close this as a duplicate of #15675.


