NumPy arrays in Pydantic
Recently, I have been developing a FastAPI application which relies on the excellent Pydantic package for data validation and serialization.
Here is a very short intro, in case you are not familiar with Pydantic. If we derive our Python classes from Pydantic’s BaseModel class, we get very helpful things like type validation, type hinting, and JSON serialization for our class. For example:
from typing import List

from pydantic import BaseModel


class Book(BaseModel):
    id: int
    name: str
    author: str
    subject: List[str]


book_data = {'id': 12,
             'name': "What is Life?",
             'author': "Schrodinger, Erwin",
             'subject': ["physics", "biology"]}

my_book = Book(**book_data)

print(my_book.id)
#> 12

print(my_book.model_dump())
"""
{
    'id': 12,
    'name': "What is Life?",
    'author': "Schrodinger, Erwin",
    'subject': ["physics", "biology"]
}
"""
This plays nicely with ORMs as well.
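To make the "type validation" part concrete, here is a minimal sketch that reuses the Book model from above. It assumes Pydantic v2 in its default (lax) mode, where a numeric string is coerced to an int and a non-numeric one is rejected. The variable names are only for illustration.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Book(BaseModel):
    id: int
    name: str
    author: str
    subject: List[str]


# In the default (lax) mode, a numeric string is coerced to an int.
book = Book(id="12", name="What is Life?",
            author="Schrodinger, Erwin", subject=["physics"])
print(type(book.id))

# A value that cannot be read as an int raises a ValidationError.
failed_field = None
try:
    Book(id="twelve", name="What is Life?",
         author="Schrodinger, Erwin", subject=["physics"])
except ValidationError as exc:
    # Each error entry records the location of the offending field.
    failed_field = exc.errors()[0]["loc"]

print(failed_field)
```

The error's 'loc' entry points at the offending field, which is handy when returning validation errors from an API.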
If you are dealing with scientific or numerical data in Python, you will naturally use NumPy arrays. But how do we handle NumPy arrays within a Pydantic BaseModel?
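To see why this needs special handling, here is a small sketch of what I would expect to happen (assuming Pydantic v2) when np.ndarray is used as a plain field type. Pydantic cannot build a schema for the unknown type and raises an error as soon as the class is defined. NaiveModel is a hypothetical name used only for this illustration.

```python
import numpy as np
from pydantic import BaseModel
from pydantic.errors import PydanticSchemaGenerationError

schema_error = None
try:
    # Hypothetical model: np.ndarray is not a type Pydantic knows how to validate.
    class NaiveModel(BaseModel):
        data: np.ndarray
except PydanticSchemaGenerationError as exc:
    # The error is raised at class-definition time, not at instantiation.
    schema_error = exc

print(type(schema_error).__name__)
```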
NumPy array as an ‘Annotated’ type
We can define a custom type for our NumPy arrays using the Annotated type. This wraps NumPy’s original ndarray class. But we need to provide two additional things:
- A function to convert string or list input into a NumPy array (a "before" validator)
- A function to serialize a NumPy array into a list or string (a custom serializer)
Sample code for creating this custom datatype, MyNumPyArray, is given below:
import ast
from typing import Annotated

import numpy as np
from pydantic import BeforeValidator, PlainSerializer


def nd_array_before_validator(x):
    # custom before validation logic
    if isinstance(x, str):
        # parse a string such as "[[1, 2], [3, 4]]" into nested lists
        x = ast.literal_eval(x)
    if isinstance(x, list):
        x = np.array(x)
    return x


def nd_array_serializer(x):
    # custom serialization logic
    return x.tolist()
    # Alternative (string output, needs 'import sys'):
    # return np.array2string(x, separator=',', threshold=sys.maxsize)


MyNumPyArray = Annotated[
    np.ndarray,
    BeforeValidator(nd_array_before_validator),
    PlainSerializer(nd_array_serializer, return_type=list),
]
# Remember to add 'model_config = ConfigDict(arbitrary_types_allowed=True)' to the model class when using MyNumPyArray
Now you can include NumPy arrays in Pydantic classes as shown below:
from pydantic import BaseModel, ConfigDict


class SomeClass(BaseModel):
    """
    Sample class that has a NumPy array field
    """
    name: str
    data: MyNumPyArray

    model_config = ConfigDict(arbitrary_types_allowed=True)


# Testing
sample_data = np.array([[1, 2], [3, 4]])
test_instance = SomeClass(name="Test", data=sample_data)

print(test_instance.model_dump())
#> {'name': 'Test', 'data': [[1, 2], [3, 4]]}
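The test above only passes in a ready-made array. As a further check, here is a self-contained sketch (it repeats the type definition so it runs on its own) that exercises the "before" validator with list and string input and then serializes to JSON. It assumes Pydantic v2. The variable names from_list, from_str and round_trip are only for illustration.

```python
import ast
import json
from typing import Annotated

import numpy as np
from pydantic import BaseModel, BeforeValidator, ConfigDict, PlainSerializer


def nd_array_before_validator(x):
    # Accept a string like "[[1, 2], [3, 4]]" or a plain (nested) list.
    if isinstance(x, str):
        x = ast.literal_eval(x)
    if isinstance(x, list):
        x = np.array(x)
    return x


MyNumPyArray = Annotated[
    np.ndarray,
    BeforeValidator(nd_array_before_validator),
    PlainSerializer(lambda x: x.tolist(), return_type=list),
]


class SomeClass(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    name: str
    data: MyNumPyArray


# Both inputs are converted to np.ndarray before the isinstance check runs.
from_list = SomeClass(name="from list", data=[[1, 2], [3, 4]])
from_str = SomeClass(name="from string", data="[[1, 2], [3, 4]]")
print(type(from_list.data).__name__, from_str.data.shape)

# The custom serializer is also used for JSON output.
round_trip = json.loads(from_str.model_dump_json())
print(round_trip)
```

Note that ast.literal_eval only parses Python literals, so a malformed string raises an exception instead of executing arbitrary code.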
References and notes:
- https://github.com/pydantic/pydantic/issues/7017#issuecomment-1670142686
- You can use pydantic.dataclasses if you only require data validation from Pydantic for NumPy arrays. See, for example, https://stackoverflow.com/questions/70306311/pydantic-initialize-numpy-ndarray
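For the pydantic.dataclasses route mentioned in the last note, a minimal sketch could look like the following. It assumes Pydantic v2, and Measurement is a hypothetical class name. Without a "before" validator, the field is only checked with isinstance, so a plain list is rejected rather than converted.

```python
import numpy as np
from pydantic import ConfigDict, ValidationError
from pydantic.dataclasses import dataclass


# Hypothetical dataclass: validation only, no custom serializer.
@dataclass(config=ConfigDict(arbitrary_types_allowed=True))
class Measurement:
    name: str
    data: np.ndarray


ok = Measurement(name="Test", data=np.array([[1, 2], [3, 4]]))
print(ok.data.shape)

# A nested list is not an np.ndarray, so validation fails.
rejected = False
try:
    Measurement(name="Test", data=[[1, 2], [3, 4]])
except ValidationError:
    rejected = True

print(rejected)
```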