TypeError: sum() got an unexpected keyword argument 'skipna'

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(index=np.arange(10), columns=np.arange(5), dtype=float)

df = 
    0   1   2   3   4
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN

df.groupby(pd.Series(['a', 'a', 'b', 'b', 'b']), axis=1).agg('sum', skipna=True)

Problem description

The above call to agg gives

KeyError: 'a'

This happens because at https://github.com/pandas-dev/pandas/blob/67ee16a283b826b4adb07e5ca64a5206d242acaf/pandas/core/groupby/groupby.py#L1376 we try to access a new column name ('a') in the original DataFrame.

It only occurs when _cython_agg_general cannot be used, e.g., when the keyword argument skipna is given to agg. Without the skipna argument, the expected output below is produced.
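One possible way around the failing call is sketched below. It assumes that grouping the transposed frame is acceptable and that skipna is forwarded to Series.sum through a lambda instead of being passed to agg. The names labels, grouped, with_skip and without_skip are illustrative only; this is not a fix for the code path linked above.

```python
import numpy as np
import pandas as pd

# Same all-NaN frame as in the code sample above.
df = pd.DataFrame(index=np.arange(10), columns=np.arange(5), dtype=float)

# Hypothetical column labels, one per original column.
labels = ['a', 'a', 'b', 'b', 'b']

# Transpose so the columns become rows, then group those rows by label.
grouped = df.T.groupby(labels)

# The lambda receives a Series per group and column, so skipna is
# handled by Series.sum rather than by the groupby machinery.
with_skip = grouped.agg(lambda s: s.sum(skipna=True)).T
without_skip = grouped.agg(lambda s: s.sum(skipna=False)).T
```

With skipna=True the result has the shape of the expected output below (10 rows, columns a and b), while skipna=False keeps the all-NaN groups as NaN.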

Expected Output

df = 
     a    b
0  0.0  0.0
1  0.0  0.0
2  0.0  0.0
3  0.0  0.0
4  0.0  0.0
5  0.0  0.0
6  0.0  0.0
7  0.0  0.0
8  0.0  0.0
9  0.0  0.0

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.8.final.0
python-bits      : 64
OS               : Windows
OS-release       : 7
machine          : AMD64
processor        : Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : en
LOCALE           : None.None

pandas           : 0.25.1
numpy            : 1.16.4
pytz             : 2018.9
dateutil         : 2.8.0
pip              : 19.2.3
setuptools       : 40.8.0
Cython           : 0.29.7
pytest           : 4.3.1
hypothesis       : None
sphinx           : 2.1.2
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10
IPython          : 7.8.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.1
numexpr          : 2.6.9
odfpy            : None
openpyxl         : 2.6.1
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : 1.3.1
tables           : 3.5.2
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None


Top GitHub Comments

5 reactions

the-moose-machine commented, Jun 8, 2020

I confirm the same. All NaN values are picked up as 0. This is useless when manipulating data for academic research. For instance,

>>> a = pd.DataFrame([np.NaN, np.NaN])
>>> a.sum()
0    0.0

This results in 0

Meanwhile, when doing the same with skipna=False:

>>> a.sum(skipna=False)
0   NaN

This results in NaN, which is the desired output when calculating means. However, when attempting the same within the groupby function:

>>> a[2] = ['a','a']
>>> a.groupby(2).agg({0:sum})
     0
2
a  0.0

The sum always returns 0, and there is no skipna option to change this.

These 0 values skew the means and standard deviations, resulting in wrong figures. When working with a huge amount of data, we realised that the results of our study did not make sense. On further investigation, I discovered this bug within pandas. I fear that several others may have unknowingly reported inaccurate figures when manipulating data with pandas data frames. So this bug is very much relevant.
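For readers hitting the same problem, here is a minimal sketch of two possible workarounds. It assumes a small frame with made-up column names (value, group) and made-up data. min_count is an existing parameter of the grouped sum. It only turns groups with no valid values into NaN, so it is not a full replacement for skipna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'a' is all NaN, group 'b' mixes a value and NaN.
a = pd.DataFrame({
    'value': [np.nan, np.nan, 1.0, np.nan],
    'group': ['a', 'a', 'b', 'b'],
})

# Default behaviour: NaN values are skipped, so an all-NaN group sums to 0.
default_sum = a.groupby('group')['value'].sum()

# min_count=1 requires at least one valid value, so the all-NaN group
# becomes NaN, while NaN values in mixed groups are still skipped.
min_count_sum = a.groupby('group')['value'].sum(min_count=1)

# Calling Series.sum(skipna=False) per group propagates any NaN.
strict_sum = a.groupby('group')['value'].agg(lambda s: s.sum(skipna=False))
```

The lambda version is the closest to DataFrame.sum(skipna=False), at the cost of running Python code per group instead of the faster built-in path.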

1 reaction

jorisvandenbossche commented, Nov 20, 2020

The underlying issue here is that the skipna keyword is at the moment not yet implemented for groupby reductions like groupby(..).sum().

It might be that before this keyword was ignored and recently started to raise, but it never actually worked (or was never documented).

The improvement to add skipna to the grouped reductions is covered in https://github.com/pandas-dev/pandas/issues/15675, so going to close this as a duplicate of #15675
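Until the keyword is available, code that has to run on several pandas versions could probe for it at runtime. The sketch below is only an illustration under that assumption. The helper name grouped_sum_no_skip and the sample data are hypothetical. Whether the installed pandas accepts skipna on a grouped sum is not assumed either way, and the fallback relies on the TypeError from the title of this issue.

```python
import numpy as np
import pandas as pd


def grouped_sum_no_skip(series_groupby):
    """Sum each group without skipping NaN (hypothetical helper)."""
    try:
        # Works only if the installed pandas implements skipna here.
        return series_groupby.sum(skipna=False)
    except TypeError:
        # Otherwise apply Series.sum(skipna=False) group by group.
        return series_groupby.agg(lambda s: s.sum(skipna=False))


# Hypothetical data: group 'a' has no NaN, group 'b' contains one.
frame = pd.DataFrame({
    'value': [1.0, 2.0, 3.0, np.nan],
    'group': ['a', 'a', 'b', 'b'],
})
result = grouped_sum_no_skip(frame.groupby('group')['value'])
```

Either branch should give 3.0 for group a and NaN for group b, so callers do not need to know which one ran.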
