JSON Schema and Schema Validation in Clojure - codecentric AG Blog

You have probably heard of, and may even have used, XML Schema or Document Type Definitions to describe the structure of your XML files, to get autocompletion in your favorite IDE, or to validate your HTML files (at least to some degree with HTML 5). While these have helped us a lot over the last few years, many configuration files and REST-like APIs nowadays expect or return JSON and, as it turns out, schemas are still helpful.

One possible use case for JSON schemas is the validation of user-provided JSON. Developers working in statically typed languages regularly use object mappers to map JSON data structures to classes and thereby validate the structure of the provided data, whereas developers in languages like JavaScript, Ruby and Clojure often use a much simpler approach. In such languages you commonly deserialize JSON to the language’s equivalent data structures, most likely maps and lists, and continue to work with these data structures. Very simple applications will then go ahead and put the user-provided data directly into some database. It should be obvious that doing so is a bad idea, but such things happen all too often (for instance, GitHub’s mass assignment problem was quite similar).
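
To make the risk concrete, here is a minimal sketch in plain Clojure. The names stored-user and request-body and the :admin flag are made up for illustration, and no database is involved.

```clojure
;; Hypothetical record as it might sit in a database.
(def stored-user {:id "42" :name "alice" :admin false})

;; Hypothetical deserialized request body containing a key the client
;; should never be allowed to set.
(def request-body {:name "mallory" :admin true})

;; Naive update: every user-provided key ends up in the record.
(def naive-update (merge stored-user request-body))

;; Defensive update: only explicitly allowed keys are copied over.
(def allowed-keys [:name])
(def safe-update (merge stored-user (select-keys request-body allowed-keys)))

(println (:admin naive-update)) ; prints true
(println (:admin safe-update))  ; prints false
```

Filtering keys by hand works for one map, but it has to be repeated for every endpoint. A schema lets you state the allowed structure once and reject anything else.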

Let us consider a very simple JSON Schema for an image. An image could be represented as a simple object with the four properties id, name, width and height. Additionally, we want the name property to be required, and the user should not be able to define additional properties. The following listing specifies our notion of an image.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "title": "image",
  "description": "Image representation",
  "properties": {
    "id": { "type": "string" },
    "name": { "type": "string" },
    "width": { "type": "integer" },
    "height": { "type": "integer" }
  },
  "required": ["name"],
  "additionalProperties": false
}

JSON schemas look remarkably simple, but let’s take this one apart, starting from the top.

  • The $schema property defines which version of the JSON Schema specification this schema is supposed to comply with. Drafts are published as IETF working documents on json-schema.org and on the IETF website. For the purpose of this blog post the specifications for the JSON Schema core and JSON Schema validation are sufficient (don’t worry, you won’t need to read them).
  • The type property restricts a value to one of the seven defined primitive types. object corresponds to what you typically know as a hash or map. JSON Schema also defines an integer type, which is quite interesting, as this type is not part of the JSON core specification.
  • title and description can be used to document the type and/or to provide additional information to the reader. A JSON validator ignores the values of these two properties.
  • properties is a special property for schemas with type object. It is a recursive data structure in which each key is a valid property name and each value is itself a JSON Schema. In our example we have four very simple properties that only define the properties’ types. It doesn’t need to end here though. You can go crazy and define regular expression rules for strings, minimum and maximum values for numbers, or even define custom types.
  • Through the required property we can define, well, required properties. An object must have at least the required keys to be considered valid.
  • additionalProperties defines whether the user may define more properties than those defined in properties and patternProperties. We set this to false to enforce our object structure. By default the user can define more properties than those listed in our schema.
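
The rules from this list can be illustrated with a small hand-written check in plain Clojure. This is only a sketch of the semantics of the image schema, assuming the JSON has already been deserialized into a map with string keys. The names property-checks, required-keys and image-errors are hypothetical, and a real validator covers far more of the specification.

```clojure
(require '[clojure.set :as set])

;; Mirrors the "properties" section of the image schema:
;; property name -> predicate for the declared type.
(def property-checks
  {"id"     string?
   "name"   string?
   "width"  integer?
   "height" integer?})

;; Mirrors the "required" section of the image schema.
(def required-keys #{"name"})

(defn image-errors
  "Returns a vector of error strings; an empty vector means the map is valid."
  [image]
  (let [present    (set (keys image))
        missing    (set/difference required-keys present)
        ;; "additionalProperties": false forbids keys not listed in the schema.
        additional (set/difference present (set (keys property-checks)))
        mistyped   (for [[k check] property-checks
                         :when (and (contains? image k)
                                    (not (check (get image k))))]
                     k)]
    (vec (concat
          (map #(str "missing required property: " %) (sort missing))
          (map #(str "additional property not allowed: " %) (sort additional))
          (map #(str "wrong type for property: " %) (sort mistyped))))))

(image-errors {"name" "cat.png" "width" 640})
;; => []
(image-errors {"width" 12.5 "author" "bob"})
;; => ["missing required property: name"
;;     "additional property not allowed: author"
;;     "wrong type for property: width"]
```

Writing such checks by hand for every data structure quickly becomes tedious, which is exactly the work a schema validator takes over.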

Another interesting feature is the ability to have references in schemas. You can use this feature to reference a part of your schema or even to reference other files. For instance, let us assume that our image schema resides in a file called image.json and that we want to define a collection of images in a file called collection.json. The following listing shows you how to do this.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "title": "collection",
  "description": "Detailed collection representation",
  "properties": {
    "name": { "type": "string" },
    "description": { "type": "string" },
    "images": {
      "type": "array",
      "items": {
        "$ref": "image.json"
      }
    }
  },
  "required": ["name"],
  "additionalProperties": false
}

The listing contains a new property type that you haven’t seen before: arrays. You can define acceptable item types for arrays. Again, this can be a JSON schema or, in this case, a reference to a JSON file which contains a JSON schema. JSON references are defined in a separate IETF working document.

While JSON Schema is very useful for the documentation of APIs and configuration files, I find the validation of user input to be especially valuable. Validators exist for a variety of languages. In this blog post I am using Clojure and the json-schema-validator library (which is just a plain Java library).

Let us start off simple and create a JsonSchemaFactory. This factory creates JsonSchema instances which are actually responsible for the document validation.

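;; Assumed imports: these package names match json-schema-validator 2.1.x
;; and may differ in other releases of the library.
(import '[com.github.fge.jsonschema.main JsonSchemaFactory]
        '[com.github.fge.jsonschema.load.configuration LoadingConfiguration]
        '[com.github.fge.jsonschema.load.uri URITransformer])

(def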
  ^{:private true
    :doc "An immutable and therefore thread-safe JSON schema factory.
         You can call (.getJsonSchema json-schema-factory )
         to retrieve a JsonSchema instance which can validate JSON."}
  json-schema-factory
  (let [transformer (.. (URITransformer/newBuilder)
                        (setNamespace "resource:/schema/")
                        freeze)
        loading-config (.. (LoadingConfiguration/newBuilder)
                           (setURITransformer transformer)
                           freeze)
        factory (.. (JsonSchemaFactory/newBuilder)
                    (setLoadingConfiguration loading-config)
                    freeze)]
    factory))

As you can see we have to configure the factory in a special way so that it can resolve referenced schema files. You can do so through a URITransformer (JSON references are plain URIs). This transformer will only be consulted for referenced schema files as you will see later on.
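
To see what resolving a reference against a namespace means, the following sketch uses only java.net.URI from the JDK. It illustrates generic URI resolution and is not a description of the library’s internals; resolve-ref is a hypothetical helper name.

```clojure
(import 'java.net.URI)

(defn resolve-ref
  "Resolves a relative JSON reference against a namespace URI."
  [namespace-uri ref]
  (str (.resolve (URI. namespace-uri) ^String ref)))

(resolve-ref "resource:/schema/" "image.json")
;; => "resource:/schema/image.json"
```

With this namespace, the referenced image.json is expected in the same schema directory on the classpath from which the top-level schema files are loaded.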

Next up are some utility functions that we use to load the schema file from the classpath and to convert it to JsonNode instances.

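;; The snippets in this post assume these two dependencies are loaded.
(require '[clojure.java.io :as io])
(import '[com.fasterxml.jackson.databind ObjectMapper])

(def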
  ^{:private true
    :doc "Initialize the object mapper first and keep it private as not all
         of its methods are thread-safe. Optionally configure it here.
         Reader instances are cheap to create."}
  get-object-reader
  (let [object-mapper (ObjectMapper.)]
    (fn [] (.reader object-mapper))))
 
(defn- parse-to-node
  "Parse the given String as JSON. Returns a Jackson JsonNode."
  [data] (.readTree (get-object-reader) data))
 
(defn- get-schema
  "Get the schema file's contents in form of a string. Function only expects
  the schema name, i.e. 'collection' or 'image'."
  [schema-name]
  (slurp (io/resource (str "schema/" schema-name ".json"))))

All three functions are pretty standard. The utility function get-object-reader creates a Jackson ObjectReader instance. We need it and the following function, parse-to-node, because the JsonSchemaFactory’s getJsonSchema method expects a parsed JSON schema. Finally, the function get-schema loads a schema file’s contents from the classpath.
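
One detail worth knowing: io/resource returns nil when nothing is found on the classpath, so a misspelled schema name surfaces as a rather unhelpful error from slurp. A minimal sketch of a stricter variant could look like this; get-schema-or-throw is a hypothetical name and not part of the original example.

```clojure
(require '[clojure.java.io :as io])

(defn get-schema-or-throw
  "Like get-schema, but fails with a descriptive error when the schema
  file is not on the classpath. Hypothetical helper for illustration."
  [schema-name]
  (if-let [resource (io/resource (str "schema/" schema-name ".json"))]
    (slurp resource)
    (throw (ex-info (str "Unknown schema: " schema-name)
                    {:schema-name schema-name}))))
```

Callers can then catch the ExceptionInfo and read the offending name from ex-data.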

(defn validate
  "Validates the given 'data' against the JSON schema. Returns a map
  with a :success key that equals true when the data could be
  validated successfully against the schema. It additionally contains a
  :message key with a human readable validation report."
  [schema-name data]
  (let [parsed-schema (parse-to-node (get-schema schema-name))
        schema (.getJsonSchema json-schema-factory parsed-schema)
        parsed-data (parse-to-node data)
        report (.validate schema parsed-data)]
    {:success (.isSuccess report)
     :message (str report)}))

The real core of our validation logic is the validate function. We use the previously defined functions to retrieve and parse the schema, turn this schema into a JsonSchema instance, parse the user provided data and generate a validation report.
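
Note that validate, as written, throws when data is not well-formed JSON, because Jackson’s readTree raises an exception on parse errors. Callers never see a map with :success false in that case. One way to handle this, sketched here under the assumption that a uniform result map is wanted, is a small wrapper. safe-validate is a hypothetical name, and it takes the validation function as an argument so that the sketch stays free of library dependencies.

```clojure
(defn safe-validate
  "Calls (validate-fn schema-name data) and converts any exception,
  for example a JSON parse error, into a failed validation result."
  [validate-fn schema-name data]
  (try
    (validate-fn schema-name data)
    (catch Exception e
      {:success false
       :message (str "Could not validate input: " (.getMessage e))})))

;; Usage with the validate function defined earlier:
;; (safe-validate validate "image" "{\"name\": \"cat.png\"}")
```

Catching Exception broadly is a deliberate simplification; a production version might want to distinguish malformed input from a missing or broken schema.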

The full example source code for this blog post is available on GitHub.

JSON Schema can be useful for structural validation of user-provided JSON. While pretty expressive, JSON Schema can’t express the full range of semantic validations. For such validation rules, you will still need to fall back to your preferred validation mechanism. In addition to validation, you can use JSON Schemas to describe the structure of your APIs or configuration files. An API description of this kind could be used with tools like Swagger or RAML to document a REST-like API.