Can't index into anndata object with integer obs_names


Trying to index into an AnnData object that has integer obs_names throws an assertion error. I expected anndata either to disallow constructing an object with integer obs_names or to allow indexing into one.

Here’s a quick example. First I instantiate an AnnData object, give it integers for observation names, then get an error when I try to index into it:

>>> import scanpy.api as sc
>>> import pandas as pd
>>> import numpy as np
>>> adata = sc.datasets.krumsiek11()
Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
... storing 'cell_type' as categorical
>>> adata.obs_names = np.array(range(adata.n_obs))
>>> adata[:, ['Gata2', 'Gata1']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1211, in __getitem__
    return self._getitem_view(index)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1214, in _getitem_view
    oidx, vidx = self._normalize_indices(index)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1190, in _normalize_indices
    obs = _normalize_index(obs, self.obs_names)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 231, in _normalize_index
    'Don’t call _normalize_index with non-categorical/string names'
AssertionError: Don’t call _normalize_index with non-categorical/string names

The indexing error can be avoided by using a pandas.RangeIndex for the observation names:

>>> adata.obs_names = pd.RangeIndex(stop=adata.n_obs)
>>> adata[:, ['Gata2', 'Gata1']]
View of AnnData object with n_obs × n_vars = 640 × 2 
    obs: 'cell_type'
    uns: 'iroot', 'highlights'

However, range indexes are often silently replaced with integer indexes, and then the error comes back:

>>> adata_norm = sc.pp.normalize_per_cell(adata, copy=True)
>>> adata_norm.obs_names
Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            630, 631, 632, 633, 634, 635, 636, 637, 638, 639],
           dtype='int64', length=640)
>>> adata_norm[:, ['Gata2', 'Gata1']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1211, in __getitem__
    return self._getitem_view(index)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1214, in _getitem_view
    oidx, vidx = self._normalize_indices(index)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 1190, in _normalize_indices
    obs = _normalize_index(obs, self.obs_names)
  File "/usr/local/lib/python3.6/site-packages/anndata/base.py", line 231, in _normalize_index
    'Don’t call _normalize_index with non-categorical/string names'
AssertionError: Don’t call _normalize_index with non-categorical/string names
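
The fallback from a range index to a plain integer index can be reproduced with pandas alone. The sketch below rests on an assumption not confirmed in this thread: that the preprocessing step subsets the observations, and that the subset cannot be expressed as a range. It also shows a possible alternative to the RangeIndex workaround, which is to cast the names to strings; the variable names are illustrative only.

```python
import pandas as pd

# Stand-in for adata.obs_names set to a RangeIndex, as in the workaround above.
names = pd.RangeIndex(stop=6)

# An irregular selection (positions 0, 2, 3) has no constant step,
# so pandas has to materialize it as an ordinary integer index.
subset = names[[0, 2, 3]]
print(type(names).__name__, "->", type(subset).__name__)

# Casting to strings yields labels that cannot be mistaken for positions.
# On an AnnData object the equivalent assignment would be
#     adata.obs_names = adata.obs_names.astype(str)
str_names = subset.astype(str)
print(list(str_names))  # ['0', '2', '3']
```

String names survive subsetting unchanged, so they do not depend on the index keeping its RangeIndex type.
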

Thanks!

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

falexwolf commented, Jul 2, 2018

Sorry about the late response: yes, your .iloc/.loc analogy is correct. You’ll find a similar behavior to anndata (without .iloc/.loc) also in numpy structured arrays. I’ll add your analogy to the docs.

Thank you!

And, yes, we should also throw an error for the integer indexing. Leaving this open for now.
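
For readers unfamiliar with the analogy, here is a small pandas sketch of why integer labels are awkward. pandas keeps the two lookups apart with .loc (by label) and .iloc (by position); a single [] that accepts both kinds of key has no way to tell them apart once the labels are integers too. The Series below is made up for illustration.

```python
import pandas as pd

# Integer labels deliberately out of order, so label and position disagree.
s = pd.Series(["a", "b", "c"], index=[2, 0, 1])

by_label = s.loc[0]      # the row whose label is 0
by_position = s.iloc[0]  # the first row

print(by_label, by_position)  # b a
```

The same key, 0, selects two different rows depending on which accessor interprets it.
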

flying-sheep commented, Jan 8, 2019

OK! So throwing an error here was itself a bugfix:

https://github.com/theislab/anndata/compare/e83f9dbe0bb0255eb34a8276c2f559434c5cc763...6a6f13ebc9a3db0a3aafec50f25c03943709ff84

I assume that _normalize_index is just not called correctly when there are integer indices. The fix is therefore surely more involved than just removing that line or converting it into a warning.

I’m pretty sure that integer indices on anndata with integer names were already broken before that and I just added a nicer error message.
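
To make the ambiguity concrete, here is a minimal, hypothetical sketch of a resolver that treats integer keys as positions and string keys as labels, and that rejects non-string names up front with an actionable message. The function resolve_index is invented for illustration; it is not anndata's _normalize_index and makes no claim about how the real fix should look.

```python
from typing import Sequence, Union


def resolve_index(key: Union[int, str], names: Sequence) -> int:
    """Hypothetical helper: map a key to a position along one axis."""
    if any(not isinstance(name, str) for name in names):
        # With integer names, an integer key could mean a label or a
        # position, so refuse early instead of guessing.
        raise TypeError("names must be strings; convert them with str() first")
    if isinstance(key, int):
        if not -len(names) <= key < len(names):
            raise IndexError(f"position {key} is out of range")
        return key % len(names)  # maps negative positions to positive ones
    try:
        return list(names).index(key)
    except ValueError:
        raise KeyError(f"no name {key!r}") from None


print(resolve_index("b", ["a", "b", "c"]))  # 1
print(resolve_index(-1, ["a", "b", "c"]))   # 2
```

Checking the names before looking at the key is a design choice in this sketch: it reports the problem at the point where the ambiguity arises, whichever axis the caller is indexing.
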
