codelike - How to Serialize SQLAlchemy Objects to JSON in Pyramid

So you are working on the backend for a single-page application in Pyramid and need to serialize all kinds of objects to JSON? In this post we'll work our way up from the basic JSON-serialization built into Pyramid to a powerful approach for serializing SQLAlchemy objects.

Serializing Simple Data Structures

Pyramid comes with this great concept called renderers: You simply return data from your view function (other MVC frameworks call this part the controller) and the renderer is responsible for turning the data into reasonable output. The renderer might be a jinja-template that uses the data to render some HTML result. But in this post we are interested in Pyramid's built-in JSON renderer, which already knows how to deal with Python's basic types. The view below:

from pyramid.view import view_config

@view_config(route_name='user_basic', renderer='json')
def get_user_basic(request):
    return {
        "id": 1,
        "name": "Bruce Wayne",
        "super_hero": True,
        "friend_ids": [2, 3, 5, 8]
    }

will automatically be turned into the JSON structure that we would expect (formatted nicely for this post):

{
  "id": 1,
  "name": "Bruce Wayne",
  "super_hero": true,
  "friend_ids": [2, 3, 5, 8]
}

As you can see, the JSON renderer handles dictionaries, lists, strings, integers and booleans out of the box, because it delegates to Python's built-in json library.
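As a quick check outside Pyramid, the standard library alone produces the same kind of output. This is only a minimal sketch, and the variable names are illustrative:

```python
import json

# The same dictionary the view returns.
user = {
    "id": 1,
    "name": "Bruce Wayne",
    "super_hero": True,
    "friend_ids": [2, 3, 5, 8],
}

# json.dumps maps dict -> object, list -> array, True -> true, and so on.
body = json.dumps(user)

# Parsing the string again yields an equal dictionary.
round_trip = json.loads(body)
```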

Serializing Custom Objects

Sooner or later, you will want to serialize an object that the JSON renderer doesn't know how to handle. When I start a new project, the very first case where this happens is usually Python's datetime instances. Because of the additional datetime object, the following won't work... yet:

import datetime

from pyramid.view import view_config

@view_config(route_name='user_custom', renderer='json')
def get_user_custom(request):
    return {
        "id": 1,
        "name": "Bruce Wayne",
        "super_hero": True,
        "friend_ids": [2, 3, 5, 8],
        "created_at": datetime.datetime(2015, 1, 23, 16, 2, 15)
    }

What we'll get so far is an error message:

TypeError: datetime.datetime(2015, 1, 23, 16, 2, 15) is not JSON serializable
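The failure is not specific to Pyramid. It can be reproduced with the json module on its own, as in the small sketch below (the name `payload` is illustrative, and the exact wording of the message differs between Python versions):

```python
import datetime
import json

payload = {"created_at": datetime.datetime(2015, 1, 23, 16, 2, 15)}

try:
    json.dumps(payload)
    error_message = None
except TypeError as exc:
    # json has no built-in representation for datetime objects.
    error_message = str(exc)
```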

One way to tell Pyramid how to serialize an object to JSON is to add a __json__-method to the relevant class. We'll look at that option later. In the Python world, it is generally frowned upon to monkey patch additional attributes to classes from the outside. datetime is an object from the standard library, so we definitely should not extend that with a magic __json__ method. That would be bad style.

For these cases (where we do not want to or cannot modify existing code), we can use Pyramid's add_adapter functionality. Let's use that in a file called jsonexample/util/jsonhelpers.py:

import datetime
from pyramid.renderers import JSON

def custom_json_renderer():
    """
    Return a custom json renderer that can deal with some datetime objects.
    """
    def datetime_adapter(obj, request):
        return obj.isoformat()

    def time_adapter(obj, request):
        return str(obj)

    json_renderer = JSON()
    json_renderer.add_adapter(datetime.datetime, datetime_adapter)
    json_renderer.add_adapter(datetime.time, time_adapter)
    return json_renderer

That way we tell Pyramid to use the given functions datetime_adapter and time_adapter for turning objects of type datetime.datetime or datetime.time into JSON. The only thing missing is to make Pyramid actually use our custom renderer. That happens in our main __init__.py:

...
from .util.jsonhelpers import custom_json_renderer

def main(global_config, **settings):
    ...
    config = Configurator(settings=settings)
    config.add_renderer('json', custom_json_renderer())

    ...
    return config.make_wsgi_app()

And with that, our object is turned into the JSON we want:

{
  "id": 1,
  "name": "Bruce Wayne",
  "super_hero": true,
  "friend_ids": [2, 3, 5, 8],
  "created_at": "2015-01-23T16:02:15"
}
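Conceptually, an adapter is a per-type fallback that only runs when the serializer meets an object it cannot handle itself. The sketch below imitates that idea with the `default` hook of `json.dumps`. It is a simplified stand-in rather than Pyramid's actual implementation, and `ADAPTERS` and `fallback` are hypothetical names:

```python
import datetime
import json

# Hypothetical registry: type -> function returning something serializable.
ADAPTERS = {
    datetime.datetime: lambda obj: obj.isoformat(),
    datetime.time: lambda obj: str(obj),
}


def fallback(obj):
    """Called by json.dumps only for objects it cannot serialize itself."""
    for cls, adapter in ADAPTERS.items():
        if isinstance(obj, cls):
            return adapter(obj)
    raise TypeError(f"{obj!r} is not JSON serializable")


body = json.dumps(
    {"name": "Bruce Wayne",
     "created_at": datetime.datetime(2015, 1, 23, 16, 2, 15)},
    default=fallback,
)
```

Note that a plain datetime.date is not covered by either of the two adapters, so in this sketch it would still raise a TypeError until a third adapter is registered for it.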

Serialization of SQLAlchemy Objects with __json__

If you're working with Pyramid, it is likely that you'll also be using SQLAlchemy (http://www.sqlalchemy.org/). If so, Pyramid will not know out of the box how to serialize SQLAlchemy-mapped objects to JSON. Let's assume we have this model:

from sqlalchemy import Boolean, Column, DateTime, Integer, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    name = Column(Text)
    super_hero = Column(Boolean)
    created_at = Column(DateTime)

coupled with this view:

@view_config(route_name='sqlalchemy_simple', renderer='json')
def get_user_sqlalchemy_simple(request):
    user = DBSession.query(User).filter_by(name="Bruce Wayne").one()
    return user

Once again, we'll get a TypeError: <jsonexample.models.User object at 0x0464A4D0> is not JSON serializable when we call the new view. This time, we will use Pyramid's other option for JSON-serialization: We add a __json__-method to the relevant class that transforms our object into something usable. We can do that for all SQLAlchemy-models by extending the Base class that our models inherit from.

class Base(object):
    def __json__(self, request):
        json_exclude = getattr(self, '__json_exclude__', set())
        return {key: value for key, value in self.__dict__.items()
                # Do not serialize 'private' attributes
                # (SQLAlchemy-internal attributes are among those, too)
                if not key.startswith('_')
                and key not in json_exclude}

Base = declarative_base(cls=Base)

Our new Base class transforms our model-instances into a Python dict by iterating over the internal __dict__ of the model. Note that SQLAlchemy stores its internal data in an attribute called _sa_instance_state. We want to avoid serializing that and other private attributes starting with _, so we exclude those from the result.

Very often there are other specific attributes of your models that should not be serialized to JSON either. Imagine a password-hash field of your User objects or other data that should not be public. For this case the above Base class allows you to exclude certain attributes by adding their names to __json_exclude__. Let's say that created_at is an internal attribute that shouldn't be serialized for our public API. Here's what that looks like:

class User(Base):
    __tablename__ = 'users'
    __json_exclude__ = set(["created_at"])

    id = Column(Integer, primary_key=True)
    name = Column(Text)
    super_hero = Column(Boolean)
    created_at = Column(DateTime)

Our Base class together with our __json_exclude__ gives us the following JSON-result:

{
  "id": 1,
  "name": "Bruce Wayne",
  "super_hero": true
}
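The filtering logic itself does not depend on SQLAlchemy, so it can be tried out on a plain object. In the sketch below, `JsonMixin` and `FakeUser` are hypothetical stand-ins for the Base class and the mapped User model, and the `_sa_instance_state` attribute is set by hand to mimic what SQLAlchemy attaches to instances:

```python
class JsonMixin:
    """Same filtering logic as the Base class above, without SQLAlchemy."""

    def __json__(self, request):
        json_exclude = getattr(self, "__json_exclude__", set())
        return {
            key: value
            for key, value in self.__dict__.items()
            # Skip private attributes and explicitly excluded ones.
            if not key.startswith("_") and key not in json_exclude
        }


class FakeUser(JsonMixin):
    # Hypothetical stand-in for the mapped User class.
    __json_exclude__ = {"created_at"}

    def __init__(self):
        self.id = 1
        self.name = "Bruce Wayne"
        self.super_hero = True
        self.created_at = "2015-01-23T16:02:15"
        # Mimics the private attribute SQLAlchemy attaches to instances.
        self._sa_instance_state = object()


# The request argument is unused by this implementation.
result = FakeUser().__json__(request=None)
```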

Advanced Serialization with Marshmallow

The __json__-approach above is alright for basic cases, but falls short in more demanding situations. Imagine these:

Depending on the chosen route or view method, you want to serialize an object with or without its attached relationships.
Perhaps the client-side should be able to specify with a GET-parameter whether they want the attached relationships or only the central object.
Depending on the currently logged in user, you want to serialize the full SQLAlchemy-mapped object (for admin users) or a reduced set of attributes only (for normal users).

In all three cases, our approach with __json_exclude__ is not enough, because the excluded attributes are hard-coded per class. Cases two and three are even more challenging because the decision which attributes must be serialized happens at runtime. At this point, the marshmallow library comes in very handy. marshmallow allows you to specify a schema for serializing/deserializing objects to/from JSON. We are interested in the serialization part. For this part, you want the 2.0-version of marshmallow, which is still in beta but very much usable.

The marshmallow docs start with an example like this for defining a schema:

from marshmallow import Schema, fields

class ArtistSchema(Schema):
    name = fields.Str()

class AlbumSchema(Schema):
    title = fields.Str()
    release_date = fields.Date()
    artist = fields.Nested(ArtistSchema)

That looks alright, but the parts with fields.Str()/fields.Date()/... would mean we have to duplicate lots of information from our User model above. We have already defined name as a string column for SQLAlchemy, so we don't want to define it again as a fields.Str() field for marshmallow. Fortunately, there's a shorter way to declare a schema with marshmallow. In our case, that would simply be:

from marshmallow import Schema

class UserSchema(Schema):

    class Meta:
        fields = ("id", "name", "super_hero", "created_at")

We can now explicitly include or exclude attributes from serialization, for example:

user = DBSession.query(User).get(1)
full_schema = UserSchema()
result, errors = full_schema.dump(user)
print(result)
# {'created_at': '2015-07-19T18:09:13.875568+00:00',
#  'name': 'Bruce Wayne', 'super_hero': True, 'id': 1}

# You can blacklist certain attributes
reduced_schema = UserSchema(exclude=("id", "created_at"))
result, errors = reduced_schema.dump(user)
print(result)
# {'name': 'Bruce Wayne', 'super_hero': True}

# Or whitelist attributes
other_schema = UserSchema(only=("name", "super_hero"))
result, errors = other_schema.dump(user)
print(result)
# {'name': 'Bruce Wayne', 'super_hero': True}

An initial approach might be to use such a schema directly in the view:

import random

@view_config(route_name='sqlalchemy_marshmallow', renderer='json')
def get_user_sqlalchemy_marshmallow(request):
    user = DBSession.query(User).filter_by(name="Bruce Wayne").one()

    # Now we select the schema and which fields to be included/excluded
    # based on some runtime condition. Imagine a test if the currently
    # logged in user is admin or not.
    if random.randint(0, 1):
        user_schema = UserSchema()
    else:
        user_schema = UserSchema(exclude=("id", "created_at"))

    data, errors = user_schema.dump(user)
    return data

As you can see, we can decide during runtime which attributes we'd like to include or exclude, which is very nice. It's not part of our example, but marshmallow works nicely for nested attributes (e.g. SQLAlchemy-relationships), too. You should read the marshmallow-docs about Nesting Schemas for that.
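The second scenario from the list earlier, letting the client choose attributes through a GET parameter, mostly comes down to turning that parameter into a safe tuple of field names. The helper below is a sketch under assumptions: the parameter name `fields`, the function `requested_fields` and the whitelist `PUBLIC_FIELDS` are all hypothetical, and a plain dict stands in for `request.GET`:

```python
# Hypothetical whitelist of attributes clients are allowed to request.
PUBLIC_FIELDS = ("id", "name", "super_hero")


def requested_fields(params, allowed=PUBLIC_FIELDS):
    """Turn e.g. ?fields=name,super_hero into a tuple of allowed names.

    Unknown or non-public names are dropped. If nothing valid remains,
    all allowed fields are returned.
    """
    raw = params.get("fields", "")
    wanted = [name.strip() for name in raw.split(",") if name.strip()]
    selected = tuple(name for name in allowed if name in wanted)
    return selected or tuple(allowed)


some = requested_fields({"fields": "name, created_at"})
everything = requested_fields({})
```

The resulting tuple could then be handed to the schema in the view, for example as UserSchema(only=some).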

There's still one problem: marshmallow has its own way of rendering all those fields, for example datetime-objects. All the JSON-adapters we defined earlier won't be used by marshmallow. That's bad because then we get different serializations of datetime-objects depending on whether we use marshmallow in a view or not. So how can we force marshmallow not to use its internal type-mappings? That's pretty easy. We override the type mappings and use that schema as the basis of our user-schema instead:

# In util/jsonhelpers.py
from marshmallow import Schema

class RenderSchema(Schema):
    """
    Schema to prevent marshmallow from using its default type mappings.
    """
    TYPE_MAPPING = {}


# In models.py
class UserSchema(RenderSchema):
    ...

Integration of Marshmallow into JSON-Renderer

The marshmallow-approach above is nice and clean, but there are two things that I'm not completely happy about:

You have to do the part with data, errors = schema.dump(some_obj) in every view where you want to use marshmallow.
I really like that Pyramid lets you return domain-objects from your views and the renderer turns it into the final result, because that's very nice for unit-testing: You can directly inspect the returned objects in your unit tests. However, in the marshmallow approach above you don't return your actual SQLAlchemy-objects anymore. Instead you return the data dict created by marshmallow.

Wouldn't it be nice if you could return your SQLAlchemy-objects as usual and the renderer knew how to transform it into JSON based on a marshmallow schema? There is a way, though it's a bit tricky and relies on Pyramid internals. If you use this, be advised that it might break on Pyramid version updates and is not officially supported.

Still, I like the approach for using it in Pyramid views so much that I want to post this, too. First, we directly inherit from Pyramid's JSON-renderer as follows:

from marshmallow import Schema
from pyramid.httpexceptions import HTTPInternalServerError
from pyramid.renderers import JSON

class SchemaJsonRenderer(JSON):
    """
    Extends Pyramid's JSON renderer with marshmallow-serialization.

    When a view-method defines a marshmallow Schema as request.render_schema,
    that schema will be used for serializing the return value.
    """

    def __call__(self, info):
        """
        If a schema is present, replace value with output from schema.dump(..).
        """
        original_render = super().__call__(info)

        def schema_render(value, system):
            request = system.get('request')
            if (request is not None and isinstance(getattr(request, 'render_schema', None), Schema)):
                try:
                    value, errors = request.render_schema.dump(value)
                except Exception:
                    errors = True

                if errors:
                    raise HTTPInternalServerError(body="Serialization failed.")

            return original_render(value, system)

        return schema_render

Second, if you want to use a certain schema for rendering your output, simply attach it to request as request.render_schema in your view:

import random

@view_config(route_name='marshmallow_integrated', renderer='json2')
def get_user_marshmallow_integrated(request):
    user = DBSession.query(User).filter_by(name="Bruce Wayne").one()

    if random.randint(0, 1):
        request.render_schema = UserSchema()
    else:
        request.render_schema = UserSchema(exclude=("id", "created_at"))

    return user

Note that I've defined the SchemaJsonRenderer as renderer json2 in __init__.py, so the code for our example can use both renderers separately. And just like that (yes, I know, it took a while) you can render your SQLAlchemy-objects to JSON simply by attaching a schema as request.render_schema in your view.
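The core of the renderer is a plain wrapper around an existing render function, and that part can be tried out without Pyramid or marshmallow. In the sketch below every name is a hypothetical stand-in: `NameOnlySchema` plays the role of a marshmallow schema, `plain_render` the role of the stock JSON render function, and `SimpleNamespace` objects replace the request and the model instance:

```python
import json
from types import SimpleNamespace


class NameOnlySchema:
    """Hypothetical stand-in for a marshmallow schema."""

    def dump(self, obj):
        return {"name": obj.name}


def plain_render(value, system):
    """Stand-in for the render function of the stock JSON renderer."""
    return json.dumps(value)


def with_schema(original_render):
    """Wrap a render function so that it honours request.render_schema."""
    def schema_render(value, system):
        request = system.get("request")
        schema = getattr(request, "render_schema", None)
        if schema is not None:
            # Replace the domain object with the schema's output.
            value = schema.dump(value)
        return original_render(value, system)
    return schema_render


render = with_schema(plain_render)
user = SimpleNamespace(id=1, name="Bruce Wayne")
request = SimpleNamespace(render_schema=NameOnlySchema())

with_schema_body = render(user, {"request": request})
without_schema_body = render({"id": 1}, {"request": SimpleNamespace()})
```

Because the view still returns the domain object, a unit test can inspect that object directly and check the attached schema separately.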

By the way, you can find the complete source-code of this blog post at: https://github.com/martinstein/json-example

The commits follow the narrative of this post so you can follow step by step if you want. The most up to date commit contains all the approaches mentioned above. They are available at the routes "basic", "custom", "sqlalchemy_simple", "sqlalchemy_marshmallow" and "marshmallow_integrated". When you clone the repository, don't forget to run these commands once:

# instead of the next line, you can also use: pip install -e .
python setup.py develop
initialize_json-example_db development.ini

And then simply run your local server with

pserve development.ini

The source-code of the repository is not necessarily structured in a way that you would/should structure a larger project. For example, I wouldn't normally put all SQLAlchemy-related classes in a single models.py file. However, I've kept the code structure intentionally simple for this example to demonstrate the concepts.

Hope this post could help some of you. Feel free to contact me if you have further questions or suggestions.
