Can't index into anndata object with integer obs_names
Trying to index into an AnnData object that has integer obs_names throws an assertion error. I expected either that constructing an object with integer obs_names would not be allowed, or that indexing into it would work.
Here’s a quick example. First I instantiate an AnnData object, give it integers for observation names, then get an error when I try to index into it:
>>> import scanpy.api as sc
>>> import pandas as pd
>>> import numpy as np
>>> adata = sc.datasets.krumsiek11()
Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
... storing 'cell_type' as categorical
>>> adata.obs_names = np.array(range(adata.n_obs))
>>> adata[:, ['Gata2', 'Gata1']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1211, in __getitem__
    return self._getitem_view(index)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1214, in _getitem_view
    oidx, vidx = self._normalize_indices(index)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1190, in _normalize_indices
    obs = _normalize_index(obs, self.obs_names)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 231, in _normalize_index
    'Don’t call _normalize_index with non-categorical/string names'
AssertionError: Don’t call _normalize_index with non-categorical/string names
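The assertion presumably guards against an ambiguity that pandas resolves with two separate accessors: when the labels are themselves integers, an integer key could mean either a position (like .iloc) or a label (like .loc). A minimal pandas-only sketch of that ambiguity; the Series and its values are made up for illustration and are not taken from krumsiek11:

```python
import pandas as pd

# Integer labels deliberately out of order, so "label 0" and "position 0" differ.
s = pd.Series(["a", "b", "c"], index=[2, 1, 0])

by_position = s.iloc[0]  # first row by position -> "a"
by_label = s.loc[0]      # row whose label is 0 -> "c"

print(by_position, by_label)
```

Without separate accessors, an indexer that receives the integer 0 has to pick one of these meanings, which is why integer names are awkward for AnnData.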
This error can be avoided by using a pandas.RangeIndex for the observation names:
>>> adata.obs_names = pd.RangeIndex(stop=adata.n_obs)
>>> adata[:, ['Gata2', 'Gata1']]
View of AnnData object with n_obs × n_vars = 640 × 2
    obs: 'cell_type'
    uns: 'iroot', 'highlights'
However, a RangeIndex is often silently replaced by a plain integer index (Int64Index), for example by sc.pp.normalize_per_cell with copy=True:
>>> adata_norm = sc.pp.normalize_per_cell(adata, copy=True)
>>> adata_norm.obs_names
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
630, 631, 632, 633, 634, 635, 636, 637, 638, 639],
dtype='int64', length=640)
>>> adata_norm[:, ['Gata2', 'Gata1']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1211, in __getitem__
    return self._getitem_view(index)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1214, in _getitem_view
    oidx, vidx = self._normalize_indices(index)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1190, in _normalize_indices
    obs = _normalize_index(obs, self.obs_names)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 231, in _normalize_index
    'Don’t call _normalize_index with non-categorical/string names'
AssertionError: Don’t call _normalize_index with non-categorical/string names
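One way a RangeIndex can quietly turn into a plain integer index is fancy indexing with positions that no longer form an evenly spaced range. The sketch below shows that pandas behavior in isolation; it is an illustration, not a claim about what normalize_per_cell does internally. It then shows a workaround under the assumption that string names are acceptable: converting the names with astype(str), which for AnnData would be adata.obs_names = adata.obs_names.astype(str).

```python
import pandas as pd

# Fancy indexing a RangeIndex with non-evenly spaced positions cannot be
# expressed as a range, so pandas returns a plain integer index
# (Int64Index in older pandas, Index with dtype int64 in newer pandas).
names = pd.RangeIndex(stop=5)
subset = names[[0, 3, 4]]
print(type(subset).__name__, subset.dtype)

# Workaround sketch: turn integer names into strings, so that an integer
# key can only be read as a position. With AnnData this would be
# adata.obs_names = adata.obs_names.astype(str)
string_names = subset.astype(str)
print(list(string_names))
```

String names do not depend on which index type survives later operations, so they sidestep the problem rather than relying on the RangeIndex being preserved.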
Thanks!
Issue Analytics
- Created 5 years ago
- Comments: 11 (9 by maintainers)
Sorry about the late response: yes, your .iloc/.loc analogy is correct. You’ll find behavior similar to anndata’s (without .iloc/.loc) in numpy structured arrays, too. I’ll add your analogy to the docs. Thank you!
And, yes, we should also throw an error for the integer indexing. Leaving this open for now.
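The structured-array analogy fits in a few lines of NumPy: a string key selects a field by name, while an integer key selects a row by position, so the type of the key alone decides how it is interpreted. The field names and values below are made up for illustration.

```python
import numpy as np

# A structured array: each row is a record with named fields.
records = np.array(
    [(1, 2.0), (3, 4.0), (5, 6.0)],
    dtype=[("count", "i8"), ("score", "f8")],
)

field = records["count"]  # a string key selects a field by name
row = records[0]          # an integer key selects a row by position

print(field)  # [1 3 5]
print(row)
```

Because field names must be strings, NumPy never has to guess whether an integer is a name or a position, which is the property integer obs_names take away from AnnData.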
OK! So throwing an error here was itself a bugfix:
https://github.com/theislab/anndata/compare/e83f9dbe0bb0255eb34a8276c2f559434c5cc763...6a6f13ebc9a3db0a3aafec50f25c03943709ff84
I assume that _normalize_index is just not called correctly when there are integer indices. The fix is therefore surely more involved than just removing that line or converting it into a warning: I’m pretty sure that integer indexing on AnnData objects with integer names was already broken before that, and I just added a nicer error message.
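In both tracebacks above, the observation key is the full slice `:`, so no label lookup is needed, yet the assertion about the names still fires. The sketch below is a hypothetical normalizer (normalize_key is an illustrative name, not anndata’s _normalize_index) that checks the names only when a string label actually has to be looked up; integers and slices are always treated as positions.

```python
import pandas as pd


def normalize_key(key, names: pd.Index):
    """Hypothetical sketch: map one indexing key to integer positions.

    Integers and slices are always positional. Strings are looked up as
    labels, and that lookup is refused unless the names are strings, so
    an integer can never be mistaken for a label.
    """
    if isinstance(key, slice):
        # Positional slice: valid regardless of the dtype of the names.
        return list(range(len(names)))[key]
    if isinstance(key, int):
        return key
    if isinstance(key, str):
        if names.inferred_type != "string":
            raise TypeError("label lookup requires string names")
        return names.get_loc(key)
    if isinstance(key, list):
        return [normalize_key(k, names) for k in key]
    raise TypeError(f"unsupported key type: {type(key).__name__}")
```

With this split, indexing like adata_norm[:, ['Gata2', 'Gata1']] would not trip over integer obs_names, while a string lookup against integer names would still fail loudly.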