Does pandera support validating enum.Enum or subclasses of it?
Discussed in https://github.com/unionai-oss/pandera/discussions/907
Originally posted by davidandreoletti, August 8, 2022:

Assuming a pydantic-like class declaration such as:

    class SizeEnum(enum.Enum):
        BIG = "big"
        SMALL = "small"

    class SummaryDFSchema(pandera...):
        size: pandera.Series[SizeEnum]
        name: ...

Currently pandera fails with an exception, seemingly because pandas does not recognize the Enum as a registered custom dtype.

What methods or workarounds could be used to let pandera enforce that the column contains SizeEnum instances (rather than one of their string values, such as "big")?
For string-type enums, I've found a fairly elegant solution is to pass them directly to `pandera.Field` as the `isin=` argument. I don't know if this would work for non-hashable Python objects.

This is a cool idea. It's something I've added in for a specific use case of mine, though I admit all my `Enum` subclasses are strings, so I'm not hitting any obscure cases. There's got to be a better way to do this than what I hacked together, but I added a step to `SchemaModel.__init_subclass__` that updates fields with `Enum` annotations to have `categorical` types, with the categories defined by the `Enum` values.

One argument for using the `categorical` type is that it can handle data types other than strings, for example the first example in the enum docs, a `Color` enum with integer values.
That said, I'm not sure what to do about `Enum` subclasses whose values are not scalars, such as the `Planet` example in the enum docs. In this case, the expected values in the series would be (float, float) tuples that correspond to the value of a `Planet` member. Maybe that is OK.

It seems an important question is whether the expected series values are instances of the `Enum` (i.e. `Color.RED`) or the `Enum` values (`1`). I would think the values, but what are your thoughts on that?