TypeError: sum() got an unexpected keyword argument 'skipna'
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
df = pd.DataFrame(index=np.arange(10), columns=np.arange(5), dtype=float)
df =
0 1 2 3 4
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN
df.groupby(pd.Series(['a', 'a', 'b', 'b', 'b']), axis=1).agg('sum', skipna=True)
Problem description
The above call to agg gives
KeyError: 'a'
This is because here:
https://github.com/pandas-dev/pandas/blob/67ee16a283b826b4adb07e5ca64a5206d242acaf/pandas/core/groupby/groupby.py#L1376
we are trying to access a new column name ('a') in the original DataFrame.
It only occurs when _cython_agg_general is not possible, e.g., when the keyword argument skipna is passed to agg. Without the skipna argument, the expected output below is produced.
Expected Output
df =
a b
0 0.0 0.0
1 0.0 0.0
2 0.0 0.0
3 0.0 0.0
4 0.0 0.0
5 0.0 0.0
6 0.0 0.0
7 0.0 0.0
8 0.0 0.0
9 0.0 0.0
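A possible workaround, sketched here under the assumption that transposing the frame is acceptable: group along the default axis instead of axis=1 and pass skipna through a lambda rather than as a keyword to agg. The names keys and result are only illustrative and are not part of the original report.

```python
import numpy as np
import pandas as pd

# Same all-NaN frame as in the report.
df = pd.DataFrame(index=np.arange(10), columns=np.arange(5), dtype=float)

# Group labels for the five columns; the Series index 0..4 matches the column labels.
keys = pd.Series(['a', 'a', 'b', 'b', 'b'])

# Transpose so the columns become rows, group along the default axis,
# forward skipna inside a lambda, then transpose back.
result = df.T.groupby(keys).agg(lambda s: s.sum(skipna=True)).T
print(result)
```

This should give a 10 x 2 frame with columns a and b filled with 0.0, matching the expected output above, because the sum of an all-NaN selection with skipna=True is 0.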
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.4
pytz : 2018.9
dateutil : 2.8.0
pip : 19.2.3
setuptools : 40.8.0
Cython : 0.29.7
pytest : 4.3.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.1
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
Issue Analytics
- State:
- Created 4 years ago
- Reactions:2
- Comments:14 (6 by maintainers)
I confirm the same. All NaN values are picked up as 0. This is useless when manipulating data for academic research. For instance,
This results in 0
Meanwhile, when doing the same with skipna=False:
This results in NaN, which is the desired output when calculating means. However, when attempting the same within the groupby function:
the sum always returns 0 and there is no option to control the skipping of NaN values.
These 0 values skew the means and standard deviations, resulting in wrong figures. When working with a huge amount of data, we realised that the results of our study did not make sense. On further investigation, I discovered this bug within pandas. I fear that several others may have unknowingly reported inaccurate figures when manipulating data with pandas data frames, so this bug is very much relevant.
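For anyone hitting the same behaviour, here is a minimal sketch of the difference described above, plus two ways that may help to get NaN out of a grouped sum. The data and variable names are made up for illustration and are not the snippets from the original comment.

```python
import numpy as np
import pandas as pd

# Plain Series: NaN values are skipped by default, so an all-NaN sum is 0.
s = pd.Series([np.nan, np.nan])
print(s.sum())              # 0.0
print(s.sum(skipna=False))  # nan

# Hypothetical data: group "a" has one missing value, group "b" is all NaN.
df = pd.DataFrame({"key": ["a", "a", "b", "b"],
                   "val": [1.0, np.nan, np.nan, np.nan]})
grouped = df.groupby("key")["val"]

# Default grouped sum: the all-NaN group "b" becomes 0.0.
default_sum = grouped.sum()

# min_count=1 requires at least one valid value, so "b" becomes NaN
# while "a" still sums its single valid value to 1.0.
min_count_sum = grouped.sum(min_count=1)

# Forwarding skipna=False through a lambda: any NaN in a group yields NaN,
# so both "a" and "b" become NaN.
strict_sum = grouped.agg(lambda g: g.sum(skipna=False))

print(default_sum, min_count_sum, strict_sum, sep="\n")
```

Which variant is appropriate depends on whether a group with some missing values should still produce a number (min_count) or should be treated as missing entirely (the lambda). The lambda goes through the slower Python aggregation path, which may matter on large data.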
The underlying issue here is that the skipna keyword is at the moment not yet implemented for groupby reductions like groupby(..).sum(). It might be that before this keyword was ignored and recently started to raise, but it never actually worked (or was never documented).
The improvement to add skipna to the grouped reductions is covered in https://github.com/pandas-dev/pandas/issues/15675, so going to close this as a duplicate of #15675.