Types and Constraints¶
Types in Python¶
Before we dive into utype, let’s review types and type annotation in Python
Note
If you are familiar with Python type annotation already, please read next chapter directly
In Python, each class ( class
) is a type, and when you use type()
on it's instance, you get the class itself, as shown in
class MyClass:
pass
my_inst = MyClass()
assert type(my_inst) == MyClass
assert isinstance(my_inst, MyClass)
Note
Python is a strongly typed dynamic type language. Strong typing means that each value has a unique type. every type changes of the value must go through explicit type cast. Dynamic typing means that runtime type guarantees are not provided at the language level. For example, functions can accept any type of input parameter
There are some built-in types in Python that we can reference directly without importing, like
int
: integer, such as1
float
: Floating-point numbers, such as3.14
bool
: Boolean value, such asTrue
orFalse
str
: string value, such as'text'
bytes
: Byte string, which stores data as a sequence of binary bytes and can be converted to and from str, such asb'binary'
list
: List, with variable length and elements, such as[1, 2]
tuple
: a tuple whose length is fixed and whose elements cannot be modified, such as(1, 'a')
set
: collection, with unique and unordered internal elements, such as{'b', 'a', 'c'}
dict
: dictionary that provides a mapping of keys to values, such as{'a': 1, 'b': 2}
Some standard libraries also provide common types that we can import directly without installing third-party dependencies, such as
datetime
Date and time library, which providesdatetime
,date
,time
,timedelta
and other types to represent date, time, and durationenum
Provides aEnum
type to represent an enumeration value (a value with a fixed range of values)uuid
Provides theUUID
type to represent a globally unique identifierdecimal
: Type is providedDecimal
to represent the number of decimal points In addition, there are many types provided by the standard library or third-party libraries, here is no need to enumerate all the discussions here
Type annotation syntax¶
Python >= 3.6 introduced a type annotation mechanism that can annotate the type of a variable, such as
name: str = 'test'
age: int = 1
When you add a type annotation to a variable, the IDE will warn you if you try to access a method or operator that is not supported by the type in your code, which can reduce unnecessary bugs during development.
However, it is important to note that Python itself does not provide guarantees for type annotations, which means that the actual value of a variable may not match the declared type, or may be changed by assignment, such as
age: int = 'nonsense'
age += 1 # TypeError: can only concatenate str (not "int") to str
Note
utype
is based on Python type annotation, and turns the annotation into a runtime type guarantee that can be relied on
Nested type¶
Python also supports the annotation of nested types, such as declaring a list type whose elements are strings.
from typing import Any, List, Dict
class Series:
names: List[str] = ['n1', 'n2']
values: List[float] = [0.1, 0.2]
metadata: Dict[str, Any] = {'version': '0.1.1'}
In the example, we can see that typing provided a set of type aliases that support nested operation
List
: declare a list, you need to pass in a type in square brackets that represents the type of its elements.Set
: declare a set, you need to pass in a type in square brackets that represents the type of its elements.Tuple
: declare a tuple. Multiple types can be passed in square brackets to represent the type of the element at the corresponding position in the tuple.Dict
: declare a dictionary, you need to pass two types in square brackets, representing the type of the key and the type of the value.
If you are using Python 3.9 +, you can directly use list
, set
instead of typing aliases to achieve the same effect, such as
from typing import Any
class Series:
names: list[str] = ['n1', 'n2']
values: list[float] = [0.1, 0.2]
metadata: dict[str, Any] = {'version': '0.1.1'}
Special annotations¶
Python also supports some specia type annotations, such as
Union
: useUnion[X, Y]
to indicates that the corresponding value is either of type X or type Y.Optional
: useOptional[X]
indicates that type X or None are allowed, which is a shortcut forUnion[X, None]
from typing import Union, Optional
class Form:
name: str = 'alice'
address: Optional[str] = None
phone_number: Union[str, int] = 12345
If you’re using Python 3.10 +, the Union type can be declared using the or ( |
) operator
class Form:
name: str = 'alice'
address: str | None = None
phone_number: str | int = 12345
Warning
Optional
is not incidating the "required" / "optional" property of a field, this feature is archieved by declaring the default
or required
params in Field configuration
Callable
: declares a callable object, often used to annotate function objects, such asCallable[[int], str]
representing a function that takes an integer as input to output a string.-
Type
: declare the type itself, for examplefrom typing import Type class MyClass: pass class Collection: int_type: Type[int] = int my_type: Type[MyClass] = MyClass
-
Literal
: used to declare a constant or a series of enumeration values, such asfrom typing import Literal class File: fmt: Literal['binary'] = 'binary' mode: Literal['r', 'rb', 'w', 'wb'] = 'rb' opening: Literal[1, True, 'true'] = True
Annotate functions¶
You can also use type annotation for function arguments and return value in the same way, such as
from typing import Dict, Optional
password_dict: Dict[str, str] = {}
# pretend this is a database that store user passwords
def login(username: str, password: str) -> Optioanl[Dict[str, str]]:
if password_dict.get(username) == password:
return {
'username': username,
}
return None
password_dict: dict[str, str] = {}
# pretend this is a database that stores user passwords
def login(username: str, password: str) -> dict[str, str] | None:
if password_dict.get(username) == password:
return {
'username': username,
}
return None
Note
Some special functions requires specific annotations, like generator functions or async generator functions, we will discuss it in Function parsing
ForwardRef string¶
Python supports the use of strings to reference types in the global namespace, which is also called ForwardRef, often used in the following cases
Referencing class itself inside class
aka "Self-reference", such as
class Comment:
content: str
on_comment: 'Comment' = None
comments: List['Comment']
Referencing to a type that has not been defined
This usage is common, such as circular references.
class Article:
title: str
comments: List['Comment']
class Comment:
content: str
on_article: Article = None
Referencing globals from tainted locals
If the name of the type you need to annotate is already occupied in the local namespace, you need to use a string reference to reference it in the global namespace.
from datetime import datetime
class Article:
str: str = 'placeholder' # name <str> is occupied
title: 'str'
datetime: datetime = datetime.now()
created_at: 'datetime'
Warning
In the above case, if title
attribute use title: str
as type annotation, the actual annotated type is a string 'placeholder'
in the locals, which has no meaning
Note that types that use string references must be defined in the global namespace ( globals()
), and you cannot use local variables in functions for reference hints, such as
def not_working():
class Article:
title: str
comments: List['Comment'] # will not work properly
class Comment:
content: str
on_article: Article = None # this will work
Further reading¶
The above documentation lists only the most commonly used syntax and types. If you want to know more about Python type annotation syntax, you can refer to the following documentation.
Constrainted type¶
In utype, all the validations are around types, utype not only supports parsing Python type annotations, but also supports constraints
The concept of a constraint is very simple. For example, if I need a positive number, or a string with a length between 10 and 20, then I can’t express it using primitive types int
str
, but it can be done simply using utype
from utype import Rule
class PositiveInt(int, Rule):
gt = 0
class MyStr(str, Rule):
min_length = 10
max_length = 20
utype supports declarative constraints. including
- Range constraints: constrain the maximum and minimum of the value (
gt
,ge
,lt
,le
) - Length constraints: constrain the length or length range of the value (
length
,max_length
,min_length
) - Constants and enumeration: constrain value to be a constant or in a fixed values range (
const
,enum
) - Regex constraints: constrain value to satisfy certain regular expression (
regex
) - Numeric constraints: constraint the maximum digit length of a numeric value, etc. (
max_digits
,decimal_places
) - Array constraints: constrain element uniqueness, contained types for list values (
unique_items
,contains
)
Note
For more elaborated constraint params, please refer to Rule API References
These built-in constraints can basically cover most use cases, In addition, utype also supports custom constraints and custom verification logic, which we will introduce later.
There are two ways to declare constraints in utype, which we describe below
Rule mixin¶
One of the most common ways is to declare a new class, using the source type and the Rule class as the base class, and declare the constraints you need in the attributes of this class, such as
from utype import Rule
class PositiveInt(int, Rule):
gt = 0
num = PositiveInt('3')
print(type(num))
# > <class 'int'>
- Source Type:
int
- Constrainted type:
PositiveInt
- Constraints:
gt=0
(value greater than zero) - Result type of calling:
int
(source type)
Here is an equation to help understanding:
Constrainted-Type = Source-Type + Rule + Constraint-Attributes
the result of calling a constrainted type is an instance of the source type that satisfy all the constraint validations (otherwise an error is throwed).
The source type does not need to be a base type, but can also be a custom type, such as
import utype
class MonthType(int):
def get_days(self, year: int) -> int:
# you will get 'year' of int type and satisfy those constraints
from calendar import monthrange
return monthrange(year, self)[1]
class Month(MonthType, utype.Rule):
gt = 0
le = 12
mon = Month(b'11')
assert isinstance(mon, MonthType)
print(mon.get_days(2020))
# > 30
@utype.apply
decorator¶
Another way to constrain type is to use a @utype.apply
decorator, declaring the constraints directly for the target type in decorator parameters, as shown in
import utype
@utype.apply(gt=0, le=12)
class Month(int):
@utype.parse
def get_days(self, year: int = utype.Field(ge=2000, le=3000)) -> int:
# you will get 'year' of int type and satisfy those constraints
from calendar import monthrange
return monthrange(year, self)[1]
mon = Month(b'11')
assert isinstance(mon, Month)
print(mon.get_days('2020'))
# > 30
Note
The essence of @utype.apply
is still an internal Rule mixin
isinstance detection
we can use isinstance(obj, t)
in Python to test whether the object obj is an instance of type t (including instances of subclasses of t), but for constrainted types, this behavior actually detects whether the object is an instance of the source type of the constraint type and satisfies the constraint validations, so
from utype import Rule
class PositiveInt(int, Rule):
gt = 0
print(isinstance(1, PositiveInt))
# > True
print(isinstance(-2, PositiveInt))
# > False
print(isinstance(b'3', PositiveInt))
# > False
Built-in constrainted type¶
utype has already declared some common constrainted types that you can import directly from utype.types
, such as
PositiveInt
: a positive number, excluding 0NaturalInt
: natural number, including 0Month
: number of months, 1 to 12Day
: number of days in a month, from 1 to 31Week
: number of weeks in a year, 1 to 53WeekDay
: day of the week, 1 to 7Quater
: quarter of the year, 1 to 4Hour
: hours of a day, 0 to 23Minute
: minutes of an hour, 0 to 59Second
: seconds of a minute, 0 to 59SlugStr
: String format commonly used for post URLs, consisting of lowercase letters, numbers, and hyphens-
EmailStr
: a string that meets the requirements of the email address format
In fact, the declaration of these types is very simple, and you can declare and implement them yourself. utype suggests declaring common constraints as constrainted types, so that you can directly refer to them in type annotation elsewhere.
Nested type¶
We can use nested type annotation such as List[int]
, but it cannot be directly used to convert input data.
utype provides some nested types, using the same syntax as typing, but can be directly used for validation and conversion, such as
import enum
from utype import types, exc
class EnumLevel(str, enum.Enum):
info = 'INFO'
warn = 'WARN'
error = 'ERROR'
level_array = types.Array[EnumLevel]
print(level_array(['INFO', 'WARN']))
# > [<EnumLevel.info: 'INFO'>, <EnumLevel.warn: 'WARN'>]
try:
level_array(['OTHER'])
except exc.ParseError as e:
print(e)
"""
ParseError: 'OTHER' is not a valid EnumLevel
"""
value = ('1', True, b'2.3')
print(types.Array[int](value))
# > [1, 1, 2]
in the example we declared a level_array
nested type, using Array
as the primitive type, where the nested element type is an Enum class, the nested type can be called directly and the data can be parsed and validated
The nested types currently supported by utype are
* types.Array
: Supports declaration of element types and constraints for list, tuple, set, and other sequence structures. The default source type is list
* types.Object
: Supports declaring element types and constraints for dictionaries, mappings. The default source type is dict
You can also inherit these nested types, assign constraints, specify other source types, etc. The usage is similar to Rule (because these nested types also inherit from Rule).
from utype import types
class UniqueTuple(types.Array):
__origin__ = tuple
unique_items = True
unique_tuple = UniqueTuple[int, int, str]
print(unique_tuple(['1', '2', 't']))
# > (1, 2, 't')
try:
unique_tuple(['1', '1', '3'])
except exc.ParseError as e:
print(e)
"""
ConstraintError: Constraint: <unique_items>: True violated: value is not unique
"""
In the example, we declared UniqueTuple
by inheriting the nested type Array
, we use __origin__
to override the default source type, so that it does not conflict with the default source type of the nested type; we also declared a unique_items=True
constraint, which indicates that the elements of the input data must be unique
Note
Currently, utype are not using Generic type for nested type, so IDE cannot hint properly for the inner elements, so utype plans to implement some IDE plugins in the further
Logical operations of type¶
utype supports logical operations on types using Python’s native logical operators for combining more complex type conditions, such as
from utype import Rule, exc
from typing import Literal
class IntWeekDay(int, Rule):
gt = 0
le = 7
weekday = IntWeekDay ^ Literal['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
assert weekday('6') == 6
assert weekday(b'tue') == 'tue'
try:
weekday('8')
except exc.ParseError as e:
print(e)
"""
Constraint: <le>: 7 violated;
Constraint: <enum>: ('mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun') violated
"""
from datetime import date
weekday_or_date = weekday | date
assert weekday_or_date(b'5') == 5
assert weekday_or_date('fri') == 'fri'
assert weekday_or_date('2000-1-1') == date(2000, 1, 1)
In our example, we declared weekday
that concatenates IntWeekDay
with a Literal
enumeration list using the exclusive-or operator, which means that weekday
requires input values to match one of the two types. The logical operators supported by utype are
- Or (
|
): Data needs to match at least one of these types - Xor (
^
): The data must match one of the criteria, not more than one or zero - Not (
~
): Data must not match the corresponding type - And (
&
): Data must match all types simultaneously
Note
utype can support logical nesting at any level, so technically you can use this syntax to declare any logical conditions, but in practice it is not recommended to use overly complex types, which will make development and debugging difficult
Common use case: Reverse Selection and Exclusion
A common use case of logical combination is that a certain type needs to be used, but some values need to be excluded. For example, 0 needs to be excluded as a dividend for float
. In this case, we can first declare the value to be excluded with a constraint, and then negate it and combine it with the source type to get the excluded type we need.
from utype import Rule, exc
class Zero(Rule):
const = 0
Divisor = float & ~Zero
try:
Divisor('0')
except exc.ParseError as e:
print(e)
"""
Negate condition: Zero(const=0) is violated
"""
Using and ( &
) logic will convert the conditions in order. In the example, we declared Divisor
, which is a float type excluding 0. When parsing, we will first convert the input data to float, and then to ~Zero
, which fails if value match Zero, otherwise the parsing is done
You can use enum
constraint to specify excluded enumeration values, such as
from utype import Rule, exc
class Infinity(Rule):
enum = [float("inf"), float("-inf")]
FiniteFloat = float & ~Infinity
assert FiniteFloat(b'3.3') == 3.3
try:
FiniteFloat('inf')
except exc.ParseError as e:
print(e)
"""
Negate condition: Infinity(enum=[inf, -inf]) is violated
"""
Restrictions and version compat¶
To use the logical type functionality provided by utype, at least one of the types participating in the logical operation must be a constrainted type of utype, because Python has limited native support for type logical operations.
Python < 3.10
: does not support any type of logical operation. Using syntax likestr | int
will result in an error.Python >= 3.10
: Supports the use of the or (|
) operator on types to obtain Union types, which can be used for type annotations but cannot be used for type conversions, and does not support the use of other operators, such as Xor (^
) or not (~
).
However, as long as the constrainted type of utype is used, all logical operator operations can be supported in Python >= 3.7
, and logical type can not only be used for type annotation, but also be used to parse data
To make life easier, utype.types
already provide some types corresponding the the primitive types, to be used in logical operations, such as Int
, Str
, Bool
, Float
etc. you can treat them as "primitive types that support logical operations"
from utype.types import Int
any_of1 = Int | bool | str # ok
any_of2 = bool | Int | str # ok
xor_type1 = Int ^ bool ^ str # ok
xor_type2 = bool ^ Int ^ str # ok
xor_type3 = bool ^ str ^ Int # TypeError: unsupported operand type(s) for ^
revert_type = ~int # TypeError: bad operand type for unary ~: 'type'
revert_type = ~Int # ok
print(revert_type | xor_type2)
# > AnyOf(Not(Int(int)), OneOf(bool, Int(int), str))
Warning
In the logic operation expression, the constrainted type must be at least the first or second element, otherwise the first element and the second element will be combined first, which will throw an error directly if such operation is not supported by Python
It should be noted that only logical types that use utype constrainted type can provide the conversion functionalities, you cannot use Python native types directly to parse values after logical operation.
try:
(str | int)('some value')
except TypeError as e:
print(e)
# Python >= 3.10
"""
Cannot instantiate typing.Union
"""
# Python < 3.10
"""
unsupported operand type(s) for |: 'type' and 'type'
"""
Tune type transformation¶
Note
This is an advanced section for the developer who has mastered utype and seeking for customization, if you just get started, you can read the next document directly
Python does not provide a safe and effective way to convert any type value to any other type natively . All type conversion logic is provided by utype through conversion functions. However, each developer may have his/her own preference for the strictness of type conversion method. Therefore, utype’s type conversion mechanism provides multiple adjustable preference parameters, and supports dynamic registration, which we will mainly introduce in this section.
Register type transformer¶
In utype, each type is converted through type transformer functions, but these functions are not fixed, and can be registered flexibly, such as
from utype import Rule, Schema, register_transformer
from typing import Type
class Slug(str, Rule):
regex = r"[a-z0-9]+(?:-[a-z0-9]+)*"
@register_transformer(Slug)
def to_slug(transformer, value, t: Type[Slug]):
str_value = transformer(value, str)
return t('-'.join([''.join(
filter(str.isalnum, v)) for v in str_value.split()]).lower())
class ArticleSchema(Schema):
slug: Slug
print(dict(ArticleSchema(slug=b'My Awesome Article!')))
# > {'slug': 'my-awesome-article'}
The registered transformer function does not affect the behavior of the method of the class __init__
, In the following scenario, the conversion function is called
- Parsing the source type of a constrainted type
- Parsing a field of dataclass with certain type
- Parsing a param of function with certion type
When parsing a field/parameter, if the type of the data is exactly the same as the declared type ( type(data) == t
), the parsing will be skipped directly, otherwise, the transformer function will be resolved for calling
Note
if the transformer function cannot be found for certain type, that type is called an unresolved type, you can tune it's behaviour by the param unresolved_types
of Options - tuning parse
The parameters to register the decorator @register_transformer
are as follows
*classes
pass the type classes to be registeredallow_subclasses
: whether to accept subclasses. The default is True. If True,*classes
subclasses in will also apply the same transformer function if they are not registered.metaclass
: pass in a metaclass to specify the transformer function for all instance types of the metaclass.attr
: pass in an attribute name specifies the transformer function for all classes that have that attribute.detector
: pass in a detection function, and the transformer function is specified for all classes satisfying the detection.priority
: specify the priority of the conversion function. The higher it is, the higher it is used. By default, the later it is registered, the higher its priority is.to
: You can specify theTypeTransformer
subclass where the converter is registered . By default, the converter you register is global. Specifying aTypeTransformer
subclass will only register for this class. You can declare a transformer class of type in Options.
By default, the later the converter is registered, the higher the priority, so it can achieve the effect of “override”.
Warning
You cannot register for nested types, such as Union[int, str]
or List[int]
Compat other libraries¶
With the registration capability of type transformer, utype can be compatible with other class libraries by registering transformer functions.
Compat pydantic
¶
pydantic
is a data parsing and validation library, which also has the ability of type conversion and constraint validation. it's BaseModel is similar to the dataclass (Schema/DataClass) in utype.
Transformer function to compat pydantic
is as follows
from utype import register_transformer
from collections.abc import Mapping
from pydantic import BaseModel
@register_transformer(BaseModel)
def transform_pydantic(transformer, data, cls):
if not transformer.no_explicit_cast and not isinstance(data, Mapping):
data = transformer(data, dict)
return cls(**data)
Compat attrs
¶
attrs
is a library that simplifies class declarations, allowing you to map initialization parameters to attributes without declaring __init__
functions, but does not have the ability to convert type and validate constraints
Transformer function to compat attrs
is as follows
from utype import register_transformer
from collections.abc import Mapping
@register_transformer(attr='__attrs_attrs__')
def transform_attrs(transformer, data, cls):
if not transformer.no_explicit_cast and not isinstance(data, Mapping):
data = transformer(data, dict)
names = [v.name for v in cls.__attrs_attrs__]
data = {k: v for k, v in data.items() if k in names}
return cls(**data)
Compat dataclasses
¶
dataclasses
is a standard library for Python, similar to attrs
, which also provides convenient methods for initializing classes, but also without the ability for type conversion
Transformer function to compat dataclasses
is as follows
from utype import register_transformer
from collections.abc import Mapping
@register_transformer(attr='__dataclass_fields__')
def transform_dataclass(transformer, data, cls):
if not transformer.no_explicit_cast and not isinstance(data, Mapping):
data = transformer(data, dict)
data = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
return cls(**data)