# Options - tuning parse
In utype, Options can be used to tune the parsing behavior of dataclasses and functions. In this document, we will explain its usage in detail.
## Type transform options
Type conversion (transformation) is the most critical part of data parsing, and Options provides several parameters to control its behavior.
### Transforming preferences
`no_explicit_cast`
: disallow explicit type conversion, False by default

"No explicit type conversion" means trying to avoid unexpected type conversions. To implement this, the types are grouped by primitive type:
- `null`: None
- `boolean`: 0, 1, True, False
- `number`: int/float/decimal etc.
- `string`: str/bytes/bytearray and binary data
- `array`: list/tuple/set
- `object`: dict/mapping
When `no_explicit_cast=True`, types within the same group can be converted to each other, while types in different groups cannot. There are some special cases, however: `Decimal` (fixed-point number) still allows conversion from `str`, because conversion from a `float` may be distorted; and types such as `datetime` still support conversion from date strings and timestamps, because there is no more native type to express them.
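A short sketch of these special cases (behavior as described above; the example values are illustrative):

```python
from datetime import datetime
from decimal import Decimal
from utype import type_transform, Options

# Decimal still accepts str input, since converting from float could distort the value
print(type_transform('3.14', Decimal, options=Options(no_explicit_cast=True)))
# > Decimal('3.14')

# datetime still accepts date strings (and timestamps)
print(type_transform('2022-03-04 10:11:12', datetime, options=Options(no_explicit_cast=True)))
# > datetime.datetime(2022, 3, 4, 10, 11, 12)
```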
For example, utype by default allows string-to-list/dict conversions, provided the string matches certain patterns:
```python
from utype import type_transform

print(type_transform('[1,2,3]', list))
# > [1, 2, 3]
print(type_transform('{"value": true}', dict))
# > {'value': True}
```
However, such conversions are not allowed when `no_explicit_cast=True`:

```python
from utype import type_transform, Options

try:
    type_transform('[1,2,3]', list, options=Options(no_explicit_cast=True))
except TypeError:
    pass

try:
    type_transform('{"value": true}', dict, options=Options(no_explicit_cast=True))
except TypeError:
    pass

# tuple and list belong to the same "array" group, so this still works
print(type_transform((1, 2), list, options=Options(no_explicit_cast=True)))
# > [1, 2]
```
`no_data_loss`
: disallow information loss during conversion, False by default
By default, loss of information is allowed in type conversions, such as:
```python
from utype import type_transform

print(type_transform("Some Value", bool))
# > True
print(type_transform(3.1415, int))
# > 3

from datetime import date

print(type_transform('2022-03-04 10:11:12', date))
# > 2022-03-04
```
In these examples, information in the input data is irreversibly compressed or lost during the conversion. If `no_data_loss=True`, such lossy conversions raise an error:
```python
from utype import type_transform, Options

try:
    type_transform(3.1415, int, options=Options(no_data_loss=True))
except TypeError:
    pass
```
Only conversions without information loss are accepted, such as

- `bool`: only accepts `True`, `False`, `0`, `1` and strings that explicitly represent a boolean value, such as `'true'`, `'f'`, `'no'`, etc.
- `int`: does not accept `float` or `Decimal` values with significant decimal places
- `date`: does not accept conversion from `datetime` or from strings containing hour, minute and second parts
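As a sketch of the lossless conversions that remain accepted (illustrative inputs):

```python
from utype import type_transform, Options

# these conversions lose no information, so they still succeed
print(type_transform('true', bool, options=Options(no_data_loss=True)))
# > True
print(type_transform('3', int, options=Options(no_data_loss=True)))
# > 3
```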
Note

The above preferences are only "flags" for transformer functions; Python itself has no mechanism to guarantee these conditions. They are implemented inside the corresponding transformer functions, so if you define or override a type transformer, you should implement these preferences yourself.
### Unknown types
A type for which utype cannot find a matching transformer (including transformers registered by the developer) is called an unknown type. For converting unknown types (when the input data does not already match), utype provides a configuration parameter in Options.
`unresolved_types`
: specifies the behavior for handling unknown types. It takes one of the following values:

- `'ignore'`: skip the conversion and use the input value directly as the result
- `'init'`: attempt to initialize the unknown type with `t(data)`
- `'throw'`: throw an error and do not convert. This is the default.
```python
from utype import Schema, Options

class MyClass:
    def __init__(self, value):
        self.value = value

class MySchema(Schema):
    __options__ = Options(
        unresolved_types='init',
    )

    inst: MyClass = None

data = MySchema(inst=3)
print(data.inst.value)
# > 3
```
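For comparison, a sketch of the `'ignore'` behavior described above, where the input value is used as-is without conversion:

```python
from utype import Schema, Options

class MyClass:
    def __init__(self, value):
        self.value = value

class LooseSchema(Schema):
    __options__ = Options(
        unresolved_types='ignore',
    )

    inst: MyClass = None

data = LooseSchema(inst=3)
print(data.inst)   # the raw input value, not a MyClass instance
# > 3
```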
## Data processing options
Options provides some parameters for functions and dataclasses that tune the input data as a whole, including

`addition`
: controls how parameters beyond the declared fields are handled. Several values can be specified:

- `None`: the default; extra parameters are ignored directly, without being received or processed
- `True`: accept extra parameters as part of the data
- `False`: disallow extra parameters; if the input contains any, an error is thrown
- `<type>`: specify a type that the values of extra parameters will be converted to

Here is an example usage of `addition`:
```python
from utype import Schema, Options, exc

class User(Schema):
    name: str
    level: int = 0

data = {'name': 'Test', 'code': 'XYZ'}

print(dict(User.__from__(data)))   # default: addition=None
# > {'name': 'Test', 'level': 0}

user = User.__from__(data, options=Options(addition=True))
print(dict(user))
# > {'name': 'Test', 'level': 0, 'code': 'XYZ'}

try:
    User.__from__(data, options=Options(addition=False))
except exc.ParseError as e:
    print(e)
    """
    parse item: ['code'] exceeded
    """
```
Note

For functions, you can accept additional params by declaring `**kwargs`, as sketched below, so unless you need to ban additional params with `addition=False`, there is no need to declare `addition` in the function's Options.
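A minimal sketch of that pattern, assuming the `@utype.parse` decorator and a hypothetical `configure` function; values passed through `**kwargs` are converted according to its annotation:

```python
import utype

@utype.parse
def configure(name: str, **kwargs: int):
    # extra keyword values are parsed to int
    return name, kwargs

print(configure('test', timeout='10', retries='3'))
# > ('test', {'timeout': 10, 'retries': 3})
```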
`max_depth`
: limits the maximum depth of nested data. This parameter is primarily used to limit self-referencing or circularly referencing data structures and avoid a recursive stack overflow
```python
from utype import Schema, Options, exc

class Comment(Schema):
    __options__ = Options(max_depth=3)

    content: str
    comment: 'Comment' = None

comment = {'content': 'stuck'}
comment['comment'] = comment

try:
    Comment(**comment)
except exc.ParseError as e:
    print(e)
    """
    parse item: ['comment'] failed:
    parse item: ['comment'] failed:
    parse item: ['comment'] failed: max_depth: 3 exceed: 4
    """
```
In this example, we construct a self-referencing dictionary. If parsing simply followed the dataclass declaration, it would recurse until Python threw a recursion error; by limiting `max_depth`, we control the maximum depth of parsing.
In addition, Options provides limits that control the number of input parameters:

`max_params`
: sets the maximum number of input parameters

`min_params`
: sets the minimum number of input parameters
These two options are often used together with `addition=True` to check the number of input parameters before parsing, so that overly large inputs do not consume parsing resources.
```python
from utype import Schema, Options, exc

class Info(Schema):
    __options__ = Options(
        min_params=2,
        max_params=5,
        addition=True
    )

    version: str

data = {
    'version': 'v1',
    'k1': 1,
    'k2': 2,
    'k3': 3
}

print(len(Info(**data)))
# > 4

try:
    Info(version='v1')
except exc.ParamsLackError as e:
    print(e)
    """
    min params num: 2 lacked: 1
    """

try:
    Info(**data, k4=4, k5=5)
except exc.ParamsExceedError as e:
    print(e)
    """
    max params num: 5 exceed: 6
    """
```
As you can see, when the number of input parameters is less than `min_params`, an `exc.ParamsLackError` is thrown; when it is greater than `max_params`, an `exc.ParamsExceedError` is thrown.
Distinction from length constraints

Although you can also constrain the length of a dictionary using `min_length` and `max_length` from Rule, they are functionally different from `max_params` / `min_params`:

- `max_params` / `min_params` validate the input data before any field parsing begins, to avoid consuming parsing resources on overly large inputs, while `max_length` / `min_length` in a dataclass limit the length of the output data after all fields are parsed
- `max_params` / `min_params` can also restrict the input of function arguments, as sketched below, while `max_length` / `min_length` can only restrict regular types and dataclasses
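A hedged sketch of that function usage, assuming `utype.parse` accepts an `options` argument (the function and field names are illustrative):

```python
import utype
from utype import Options, exc

@utype.parse(options=Options(max_params=3, addition=True))
def tag(name: str, **kwargs):
    return name, kwargs

print(tag('a', x=1, y=2))
# > ('a', {'x': 1, 'y': 2})

try:
    tag('a', x=1, y=2, z=3)   # 4 params exceed max_params=3
except exc.ParamsExceedError as e:
    print(e)
```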
## Error handling
Options provides a series of error-handling options to control the behavior when parsing fails, including

`collect_errors`
: whether to collect all errors, False by default
When `collect_errors=False`, if utype finds invalid data while parsing dataclass or function parameters (data that cannot complete the type conversion or satisfy the constraints), it directly throws the error as an `exc.ParseError`; this is a "fail-fast" strategy.

When `collect_errors=True`, utype continues parsing and collects all the errors it encounters. Once the input data is fully parsed, the errors are combined into an `exc.CollectedParseError` and thrown, so all of the error information for the input can be obtained from this single combined error.
```python
from utype import Schema, Options, Field, exc

class LoginForm(Schema):
    __options__ = Options(
        addition=False,
        collect_errors=True
    )

    username: str = Field(regex='[0-9a-zA-Z]{3,20}')
    password: str = Field(min_length=6, max_length=20)

form = {
    'username': '@attacker',
    'password': '12345',
    'token': 'XXX'
}

try:
    LoginForm(**form)
except exc.CollectedParseError as e:
    print(e)
    """
    parse item: ['username'] failed: Constraint: <regex>: '[0-9a-zA-Z]{3,20}' violated;
    parse item: ['password'] failed: Constraint: <min_length>: 6 violated;
    parse item: ['token'] exceeded
    """
    print(len(e.errors))
    # > 3
```
Note

Of course, `collect_errors=True` adds a slight cost for invalid inputs, so it is recommended to enable it only when debugging, to locate errors more efficiently.
`max_errors`
: in `collect_errors=True` mode, sets a threshold for the number of errors. When the number of collected errors reaches this threshold, collection stops, and the errors gathered so far are merged and thrown immediately.
```python
from utype import Schema, Options, Field, exc

class LoginForm(Schema):
    __options__ = Options(
        addition=False,
        collect_errors=True,
        max_errors=2
    )

    username: str = Field(regex='[0-9a-zA-Z]{3,20}')
    password: str = Field(min_length=6, max_length=20)

form = {
    'username': '@attacker',
    'password': '12345',
    'token': 'XXX'
}

try:
    LoginForm(**form)
except exc.CollectedParseError as e:
    print(e)
    """
    parse item: ['username'] failed: Constraint: <regex>: '[0-9a-zA-Z]{3,20}' violated;
    parse item: ['password'] failed: Constraint: <min_length>: 6 violated;
    """
    print(len(e.errors))
    # > 2
```
### Illegal data processing
Options also provides error-handling options for specific categories of elements:

`invalid_items`
: how to deal with illegal items in a list/set/tuple

`invalid_keys`
: how to deal with illegal keys in a dict/mapping

`invalid_values`
: how to deal with illegal values in a dict/mapping
These configurations all take the same set of values:

- `'throw'`: the default, throw the error directly
- `'exclude'`: strip illegal elements out of the data with a warning, but throw no error
- `'preserve'`: leave illegal elements in the data with a warning, but throw no error
Let’s look at an example in detail.
```python
from utype import Schema, Options, exc
from typing import List, Dict, Tuple

class IndexSchema(Schema):
    __options__ = Options(
        invalid_items='exclude',
        invalid_keys='preserve',
    )

    indexes: List[int]
    info: Dict[Tuple[int, int], int]

data = {
    'indexes': ['1', '-2', '*', 3],
    'info': {
        '2,3': 6,
        '3,4': 12,
        'a,b': '10'
    }
}

index = IndexSchema(**data)
# UserWarning: parse item: [2] failed: could not convert string to float: '*'
# UserWarning: parse item: ['a,b<key>'] failed: could not convert string to float: 'a'

print(index)
# > IndexSchema(indexes=[1, -2, 3], info={(2, 3): 6, (3, 4): 12, 'a,b': 10})
```
We specified `invalid_items='exclude'` in the Options of `IndexSchema`, so illegal elements are stripped from lists: the input `['1', '-2', '*', 3]` is converted to `[1, -2, 3]`.

We also specified `invalid_keys='preserve'`, meaning that dictionary keys which cannot be converted are retained. So in the `'info'` field of the input, the keys that can complete the conversion are converted, and the key that cannot is kept as-is.
Warning

Unless you know what you are doing, do not use `'preserve'` in these error configurations, as it breaks the runtime type-safety guarantee.
## Field behavior options
Options provides options to configure the behavior of all fields at once, including

`ignore_required`
: ignore the required property, making all fields optional

`no_default`
: ignore default values; unprovided fields will not appear in the data

`force_default`
: forcibly specify a default value for all fields

`defer_default`
: forcibly defer the calculation of default values, corresponding to the `defer_default` param in Field

`ignore_constraints`
: skip constraint validation and perform type conversion only

`immutable`
: make all attributes of the dataclass immutable, i.e. they cannot be assigned or deleted
Warning

`no_default`, `defer_default` and `immutable` only apply to dataclasses; they cannot be used in a function's Options.
These options are not enabled by default. Enabling one is equivalent to forcing the corresponding configuration value on every Field, so you can refer to the Field API References for the exact semantics; a brief sketch follows.
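A hedged sketch of two of these switches (the `UserPatch` class and values are illustrative):

```python
from utype import Schema, Options, Field

class UserPatch(Schema):
    __options__ = Options(
        ignore_required=True,      # every field becomes optional
        ignore_constraints=True,   # constraints are skipped, only type conversion applies
    )

    name: str = Field(max_length=5)
    level: int = 0

patch = UserPatch(name='a-very-long-name')   # max_length is not enforced
print(dict(patch))
# > {'name': 'a-very-long-name', 'level': 0}

print(dict(UserPatch()))   # the otherwise-required 'name' can be omitted
# > {'level': 0}
```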
## Field alias options
Options also provides some options for controlling field names and aliases:

`case_insensitive`
: whether to accept input parameters case-insensitively, False by default

`alias_generator`
: a function used to generate output aliases for fields that do not specify `alias`

`alias_from_generator`
: a function used to generate input aliases for fields that do not specify `alias_from`

`ignore_alias_conflicts`
: whether to ignore alias conflicts in the input data, False by default
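For instance, a sketch of `case_insensitive` (the schema and inputs are illustrative):

```python
from utype import Schema, Options

class User(Schema):
    __options__ = Options(case_insensitive=True)

    name: str
    level: int = 0

user = User(NAME='alice', Level='3')   # keys are matched regardless of case
print(dict(user))
# > {'name': 'alice', 'level': 3}
```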
### Case style transformation
Different programming languages and developers may follow different naming styles, so the APIs you provide may need to convert between naming styles.

For example, in Python you typically name fields in lowercase with underscores; if your client needs to receive camelCase data, you would normally have to declare aliases field by field:
```python
from utype import Schema, Field

class ArticleSchema(Schema):
    slug: str
    liked_num: int = Field(alias='likedNum')
    created_at: str = Field(alias='createdAt')
```
But since Options provides the `alias_generator` option, you can specify an output-alias generation function for the entire dataclass instead, such as:
```python
from utype import Schema
from utype.utils.style import AliasGenerator
from datetime import datetime

class ArticleSchema(Schema):
    __options__ = Schema.Options(
        alias_from_generator=[
            AliasGenerator.kebab,
            AliasGenerator.pascal,
        ],
        alias_generator=AliasGenerator.camel
    )

    slug: str
    liked_num: int
    created_at: datetime

data = {
    'Slug': 'my-article',                  # pascal case
    'LikedNum': '3',                       # pascal case
    'created-at': '2022-03-04 10:11:12'    # kebab case
}

article = ArticleSchema(**data)
print(article)
print(dict(article))
# {
#     'slug': 'my-article',
#     'likedNum': 3,
#     'createdAt': datetime.datetime(2022, 3, 4, 10, 11, 12)
# }
```
To make naming-style conversion more convenient, `utype.utils.style.AliasGenerator` provides commonly used alias generation functions for various naming styles:
- `camel`: camelCase naming style
- `pascal`: PascalCase naming style
- `snake`: snake_case naming style, the recommended variable naming style for languages such as Python
- `kebab`: kebab-case naming style
- `cap_snake`: CAP_SNAKE_CASE naming style, often used for constant naming
- `cap_kebab`: CAP-KEBAB-CASE naming style
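Since these generators are plain functions that take a field name and return the alias, their effect can be sketched directly (assumed outputs, for illustration):

```python
from utype.utils.style import AliasGenerator

# each generator maps a field name to the alias in its style
print(AliasGenerator.camel('created_at'))       # createdAt
print(AliasGenerator.pascal('created_at'))      # CreatedAt
print(AliasGenerator.kebab('created_at'))       # created-at
print(AliasGenerator.cap_snake('created_at'))   # CREATED_AT
```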
You only need to pass these functions to `alias_generator` or `alias_from_generator` to get the corresponding conversion capability. In the example above, `alias_from_generator=[AliasGenerator.kebab, AliasGenerator.pascal]` means the input data can be converted from both the kebab-case and PascalCase naming styles, and `alias_generator=AliasGenerator.camel` means the output data is converted to the camelCase style.

As a result, the naming styles used in the example's input data are correctly recognized and accepted, the corresponding type conversions are completed, and the values are output under the target aliases.