An introduction to JSON Schema
Recently I’ve been building an API around the FRESH resume schema, a JSON schema to describe a person’s experiences that can be used to generate a resume.
Now that I’ve built the API, I have a better understanding of the benefits a JSON Schema offers and of the tooling that makes working with JSON Schema so much easier.
Understanding JSON Schema and its applications benefits not only developers but also testers, and even less technical members of the team. It allows static testing of the data the system uses and returns, which gives quick wins when trying to understand how the system functions.
Why JSON schema exists
Before JSON became commonplace, if you were dealing with external data sources like APIs, you’d mostly communicate with them from the back-end code, most likely via XML (and, if you were unfortunate, SOAP). You would then use XSLT to transform that XML into something to render to the user.
As the web evolved into the platform it is now, driven by the need for browsers to offer functionality similar to the apps people were starting to use, there was a push towards asynchronous operations for updating the page.
At the same time as this push to make things more interactive, there was another development which I feel is the main reason JSON is as prevalent as it is now: NodeJS.
JSON, however, isn’t perfect
XML’s Document Type (DTD) & Schema (XSD) Definitions
XML documents can include a DTD (referenced at root level) or an XSD (referenced at node level) that defines the structure and types of data the document holds.
Including these definitions in the document makes the validation rules available to the receiver, so they can validate the document’s compliance with the format before reading it.
Comments
This is something that can lead to heated debates when raised in a particularly opinionated group of people. Regardless of your stance, it’s still something that XML offers and JSON does not.
Comments were left out of JSON because it is intended purely for data serialisation and so has no need for them, which I agree with. The issue comes about when people use JSON to configure tooling, which benefits from being able to annotate certain values.
A NodeJS package, JSON5, adds comments and a bunch of other functionality to make JSON more ‘human friendly’, but I would argue that YAML is a better fit for this need.
JSON Schema to the rescue!
To make it easier to understand the JSON being transmitted between different systems, a project called JSON Schema was created to provide a standardised way of describing the data.
Similar to the way XSD allows the structure and data types to be defined, JSON Schema adds this important information to JSON. It also allows for re-usable object definitions and schema extensions.
Unlike XSD, though, the schema lives outside of the JSON document and there’s no standardised way of adding metadata to the JSON. This means that without prior knowledge, the system receiving the JSON data cannot easily validate the data’s adherence to the schema.
Basics of JSON schema
I’ll preface this section by saying that JSON Schema is still very much in flux. This information is correct at the time of writing, which is against JSON Schema Draft 7.
JSON Schema supports the following data types:
string: for character values
number: for integer and decimal values
integer: for whole numbers only
object: for associative arrays
array: for collections of the other data types
boolean: for true or false values
null: for uninitialised data
Types can be constrained further with the format property. For example, adding "format": "email" to a string specifies that the string should adhere to the standardised email address format.
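To make the type list a little more concrete, here’s a small TypeScript sketch that maps a value produced by JSON.parse onto the JSON Schema type it satisfies. It isn’t from any library: the function names jsonSchemaTypeOf and matchesType and the sample dog data are my own illustration. It shows two things that often trip people up coming from JavaScript. First, null and arrays are both "object" to typeof but are distinct types in JSON Schema. Second, a whole number satisfies both integer and number.

```typescript
// Any value JSON.parse can produce.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns the most specific JSON Schema type a parsed JSON value satisfies.
function jsonSchemaTypeOf(value: Json): string {
  if (value === null) return "null"; // typeof null is "object" in JavaScript
  if (Array.isArray(value)) return "array"; // arrays are objects to typeof as well
  switch (typeof value) {
    case "string":
      return "string";
    case "boolean":
      return "boolean";
    case "number":
      // 3 is an integer (and therefore also a number); 12.5 is only a number
      return Number.isInteger(value) ? "integer" : "number";
    default:
      return "object";
  }
}

// Checks a value against a "type" keyword; integers also satisfy "number".
function matchesType(value: Json, type: string): boolean {
  const actual = jsonSchemaTypeOf(value);
  return actual === type || (type === "number" && actual === "integer");
}

// Hypothetical dog data, purely for illustration.
const parsedDog = JSON.parse(
  '{"name": "Rex", "age": 3, "weight": 12.5, "owners": [], "microchip": null}'
) as { [key: string]: Json };

for (const key of Object.keys(parsedDog)) {
  console.log(`${key}: ${jsonSchemaTypeOf(parsedDog[key])}`);
}
```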
If we take this example object representing a dog:
We’re expecting name to be a string, dob to be a date, and owners to be an array of objects, each of which contains a name (string) and …
This can be represented in JSON Schema as:
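As a rough sketch of such a schema, assuming Draft 7 and only the properties described above (the owner’s second property isn’t reproduced), here it is as a TypeScript object literal. Alongside it is a deliberately tiny, hypothetical validator that understands just type, properties, required and items. It ignores format entirely, and its error messages are my own wording. For real validation you’d reach for a library such as ajv.

```typescript
// The subset of keywords this toy validator understands; anything else
// (including "format") is accepted but ignored.
interface Schema {
  type?: string;
  properties?: { [key: string]: Schema };
  required?: string[];
  items?: Schema;
  [keyword: string]: unknown;
}

// Hypothetical Draft 7 schema for the dog described above.
const dogSchema: Schema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    name: { type: "string" },
    dob: { type: "string", format: "date" },
    owners: {
      type: "array",
      items: {
        type: "object",
        properties: { name: { type: "string" } },
        required: ["name"],
      },
    },
  },
  required: ["name", "dob", "owners"],
};

function isPlainObject(value: unknown): value is { [key: string]: unknown } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function hasType(value: unknown, type: string): boolean {
  switch (type) {
    case "null":
      return value === null;
    case "array":
      return Array.isArray(value);
    case "object":
      return isPlainObject(value);
    case "integer":
      return Number.isInteger(value);
    default:
      return typeof value === type; // "string", "number" and "boolean"
  }
}

// Returns a list of error messages; an empty list means the value is valid.
function validate(schema: Schema, value: unknown, path = "data"): string[] {
  if (schema.type !== undefined && !hasType(value, schema.type)) {
    return [`${path} should be ${schema.type}`];
  }
  const errors: string[] = [];
  if (isPlainObject(value)) {
    for (const key of schema.required ?? []) {
      if (!(key in value)) errors.push(`${path} should have required property '${key}'`);
    }
    for (const [key, subschema] of Object.entries(schema.properties ?? {})) {
      if (key in value) errors.push(...validate(subschema, value[key], `${path}.${key}`));
    }
  }
  const itemSchema = schema.items;
  if (Array.isArray(value) && itemSchema !== undefined) {
    for (let i = 0; i < value.length; i++) {
      errors.push(...validate(itemSchema, value[i], `${path}[${i}]`));
    }
  }
  return errors;
}

console.log(validate(dogSchema, { name: "Rex", dob: "2015-06-01", owners: [{ name: "Sam" }] })); // []
// three errors: missing dob, name not a string, owner without a name
console.log(validate(dogSchema, { name: 42, owners: [{}] }));
```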
More complex use cases
For the majority of cases you can get away with the basics. As the system scales up, though, you may need to allow different data types for the same property (such as null or a string), make re-usable object definitions, or even extend an existing schema to add or change properties.
Handling different data types for the same property
Using our dog example, let’s say we want to make the name of the dog optional by accepting null as well as a string.
We can achieve this using the oneOf keyword, which takes an array of subschemas and accepts a value only if it matches exactly one of them.
There are a number of keywords that perform similar functionality:
anyOf: accepts values that match at least one of the listed subschemas
oneOf: accepts values that match exactly one of the listed subschemas
allOf: accepts only values that match all of the listed subschemas
not: accepts only values that don’t match the given subschema
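One way to picture how these keywords combine is to treat each subschema as a predicate. The TypeScript sketch below is a simplified model, not how any particular validator is implemented. Check and stringOfAtLeast are hypothetical helpers; stringOfAtLeast stands in for a subschema like { "type": "string", "minLength": n }.

```typescript
// Each subschema is modelled as a predicate over a parsed JSON value.
type Check = (value: unknown) => boolean;

const isString: Check = (v) => typeof v === "string";
const isNull: Check = (v) => v === null;
// Hypothetical stand-in for { "type": "string", "minLength": n }.
// (JavaScript's length counts UTF-16 units; JSON Schema counts characters.)
const stringOfAtLeast = (n: number): Check => (v) => typeof v === "string" && v.length >= n;

// anyOf: at least one subschema matches.
const anyOf = (...checks: Check[]): Check => (v) => checks.some((check) => check(v));
// oneOf: exactly one subschema matches.
const oneOf = (...checks: Check[]): Check => (v) => checks.filter((check) => check(v)).length === 1;
// allOf: every subschema matches.
const allOf = (...checks: Check[]): Check => (v) => checks.every((check) => check(v));
// not: takes a single subschema, which must not match.
const not = (check: Check): Check => (v) => !check(v);

console.log(anyOf(isString, stringOfAtLeast(3))("Rex")); // true: both match
console.log(oneOf(isString, stringOfAtLeast(3))("Rex")); // false: both match, not exactly one
console.log(oneOf(isString, stringOfAtLeast(3))("Al")); // true: only isString matches
console.log(allOf(isString, stringOfAtLeast(3))("Al")); // false: "Al" is too short
console.log(not(isNull)("Rex")); // true: "Rex" is not null
```

The second line is the practical difference between anyOf and oneOf: oneOf rejects "Rex" precisely because both subschemas match it.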
More examples of the different keywords are available on the JSON Schema website, but here’s an example of oneOf used to allow both string and null values for the dog object’s name property:
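Here’s what that might look like for the name property, sketched as TypeScript object literals. The oneOf form is the one described above. The second form, a list of types, is standard Draft 7 shorthand that accepts the same values. The passesOneOf and passesTypeList helpers are hypothetical and only understand the two types used here.

```typescript
// Hypothetical fragment of the dog schema: "name" may be a string or null.
const nameWithOneOf = {
  oneOf: [{ type: "string" }, { type: "null" }],
};

// Draft 7 shorthand: a list of types accepts a value matching any of them.
const nameWithTypeList = {
  type: ["string", "null"],
};

// Only the two types used above are handled.
function matchesType(value: unknown, type: string): boolean {
  return type === "null" ? value === null : typeof value === type;
}

// oneOf: exactly one subschema must match.
function passesOneOf(value: unknown, schema: { oneOf: { type: string }[] }): boolean {
  return schema.oneOf.filter((subschema) => matchesType(value, subschema.type)).length === 1;
}

// Type list: any listed type may match.
function passesTypeList(value: unknown, schema: { type: string[] }): boolean {
  return schema.type.some((type) => matchesType(value, type));
}

for (const candidate of ["Rex", null, 42]) {
  console.log(
    JSON.stringify(candidate),
    passesOneOf(candidate, nameWithOneOf),
    passesTypeList(candidate, nameWithTypeList)
  );
}
// "Rex" true true
// null true true
// 42 false false
```

A value can’t be both a string and null, so here oneOf behaves exactly like anyOf. The distinction only matters when subschemas overlap.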
The anyOf, oneOf, allOf and not keywords are great for validating at a structural level. There may be times, though, when you want to validate at a logical level, applying particular formatting rules depending on the value of another property.
The JSON Schema website has a really good example of this. It uses the if, then and else keywords (added in Draft 7) to apply a different postal code regular expression depending on the country.
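Here is a sketch of that idea, loosely modelled on the postal code example from the JSON Schema website. The schema uses if, then and else so that the postal_code pattern depends on country. The validateAddress function is a hypothetical evaluator hard-wired to just this schema, and it skips the enum check. The patterns are simplified illustrations rather than complete postal code rules.

```typescript
// Hypothetical address schema, loosely modelled on the postal code example
// on the JSON Schema website. The patterns are simplified illustrations.
const addressSchema = {
  type: "object",
  properties: {
    country: { enum: ["United States of America", "Canada"] },
    postal_code: { type: "string" },
  },
  if: { properties: { country: { const: "United States of America" } } },
  then: { properties: { postal_code: { pattern: "[0-9]{5}(-[0-9]{4})?" } } },
  else: { properties: { postal_code: { pattern: "[A-Z][0-9][A-Z] [0-9][A-Z][0-9]" } } },
};

interface Address {
  country?: string;
  postal_code: string;
}

// Toy evaluator for the if/then/else part of addressSchema only.
function validateAddress(address: Address): boolean {
  // "properties" ignores keys that are absent, so an address without a
  // country satisfies the "if" subschema and the "then" branch applies.
  const ifPasses =
    address.country === undefined ||
    address.country === addressSchema.if.properties.country.const;
  const branch = ifPasses ? addressSchema.then : addressSchema.else;
  // JSON Schema patterns are not anchored, so this is a search, not a full match.
  return new RegExp(branch.properties.postal_code.pattern).test(address.postal_code);
}

console.log(validateAddress({ country: "United States of America", postal_code: "20500" })); // true
console.log(validateAddress({ country: "Canada", postal_code: "K1M 1M4" })); // true
console.log(validateAddress({ country: "Canada", postal_code: "20500" })); // false
```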
Re-usable object definitions
As the data model grows, it’s likely that some objects will be shared across multiple places. For instance, if we expanded the system the dog object is used in to include a customer list, we’d want to re-use the owner object.
To have only one definition of the owner object, we’d create a definitions object in the document root to hold the definitions. These can then be referenced from the objects later on using the $ref keyword.
In this example, we extract the owner object and use a reference to it in both the dog object and the customer list.
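Here is a sketch of that layout, assuming a hypothetical root object that holds a dog and a customers array, both pointing at #/definitions/owner. The resolveRef helper is my own illustration of how a local $ref is looked up as a JSON Pointer into the same document. It doesn’t handle external files or URL-encoded pointers.

```typescript
// Hypothetical schema: one owner definition shared by the dog and a customer list.
const ownerSchemaDoc = {
  definitions: {
    owner: {
      type: "object",
      properties: { name: { type: "string" } },
      required: ["name"],
    },
  },
  type: "object",
  properties: {
    dog: {
      type: "object",
      properties: {
        owners: { type: "array", items: { $ref: "#/definitions/owner" } },
      },
    },
    customers: { type: "array", items: { $ref: "#/definitions/owner" } },
  },
};

// Resolves a local reference such as "#/definitions/owner" by walking the
// JSON Pointer after the "#" through the root schema.
function resolveRef(root: unknown, ref: string): unknown {
  if (ref.slice(0, 2) !== "#/") throw new Error(`only local refs are handled: ${ref}`);
  const tokens = ref
    .slice(2)
    .split("/")
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~")); // JSON Pointer escapes
  let node = root;
  for (const token of tokens) {
    if (typeof node !== "object" || node === null || !(token in node)) {
      throw new Error(`cannot resolve ${ref}`);
    }
    node = (node as { [key: string]: unknown })[token];
  }
  return node;
}

const viaDog = resolveRef(ownerSchemaDoc, ownerSchemaDoc.properties.dog.properties.owners.items.$ref);
const viaCustomers = resolveRef(ownerSchemaDoc, ownerSchemaDoc.properties.customers.items.$ref);
console.log(viaDog === ownerSchemaDoc.definitions.owner && viaCustomers === viaDog); // true
```

Both references land on the very same definition object, so a change to the owner schema applies everywhere it is used.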
Object definitions can also reference themselves, should you need to build schemas to handle tree-like data structures.
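For the self-referencing case, a common pattern is a node schema whose children are items pointing back at the root with "$ref": "#". The tree schema and the small validate function below are hypothetical. The validator only knows the handful of keywords the tree schema uses, and only the "#" reference.

```typescript
// Hypothetical tree schema: "$ref": "#" points back at the root schema, so
// children can be nested to any depth.
const treeSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    children: { type: "array", items: { $ref: "#" } },
  },
  required: ["name"],
};

type Schema = { [keyword: string]: any };

// Toy validator: handles only the keywords treeSchema uses and only the "#" reference.
function validate(root: Schema, schema: Schema, value: unknown): boolean {
  if (schema.$ref === "#") return validate(root, root, value); // recurse via the root
  if (schema.type === "string") return typeof value === "string";
  if (schema.type === "array") {
    if (!Array.isArray(value)) return false;
    return schema.items === undefined || value.every((item) => validate(root, schema.items, item));
  }
  if (schema.type === "object") {
    if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
    const obj = value as { [key: string]: unknown };
    const required: string[] = schema.required ?? [];
    if (!required.every((key) => key in obj)) return false;
    const properties: Schema = schema.properties ?? {};
    return Object.keys(properties).every(
      (key) => !(key in obj) || validate(root, properties[key], obj[key])
    );
  }
  return true; // anything else is outside this sketch and treated as valid
}

const tree = {
  name: "root",
  children: [{ name: "a", children: [{ name: "a1" }] }, { name: "b" }],
};
console.log(validate(treeSchema, treeSchema, tree)); // true
console.log(validate(treeSchema, treeSchema, { name: "root", children: [{ children: [] }] })); // false
```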
You can also reference object definitions from external files, making it easier to re-use objects across a collection of JSON Schemas.
Extending existing Schema
To reduce duplication across multiple schemas, you may want to create a base object that can then be extended with the relevant properties in subsequent schemas.
This can be achieved by using a $ref to the base object together with the allOf keyword, which enforces adherence to all aspects of both the existing schema and the new schema.
In the following example we create a base animal object and then extend it to create dog and cat objects, each with their own specialised properties.
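Here is a sketch of the animal example, with hypothetical properties (legs, breed and indoor) chosen purely for illustration. The dog and cat schemas each combine a $ref to the base animal definition with their own additions inside allOf, so a value has to satisfy both parts. The isValid function is a toy evaluator for only the keywords used here.

```typescript
type Schema = { [keyword: string]: any };

// Hypothetical schemas: dog and cat each extend the base animal via allOf.
const animalSchemas = {
  definitions: {
    animal: {
      type: "object",
      properties: { name: { type: "string" }, legs: { type: "integer" } },
      required: ["name", "legs"],
    },
    dog: {
      allOf: [
        { $ref: "#/definitions/animal" },
        { properties: { breed: { type: "string" } }, required: ["breed"] },
      ],
    },
    cat: {
      allOf: [
        { $ref: "#/definitions/animal" },
        { properties: { indoor: { type: "boolean" } }, required: ["indoor"] },
      ],
    },
  },
};

// Follows a local pointer such as "#/definitions/animal" (no escaping handled).
function resolve(root: Schema, ref: string): Schema {
  return ref.slice(2).split("/").reduce<any>((node, key) => node[key], root);
}

function hasType(value: unknown, type: string): boolean {
  if (type === "object") return typeof value === "object" && value !== null && !Array.isArray(value);
  if (type === "integer") return Number.isInteger(value);
  return typeof value === type; // "string", "number" and "boolean"
}

// Toy evaluator for $ref, allOf, type, properties and required.
function isValid(root: Schema, schema: Schema, value: unknown): boolean {
  // In Draft 7, keywords alongside "$ref" are ignored, so only the target is checked.
  if (schema.$ref !== undefined) return isValid(root, resolve(root, schema.$ref), value);
  if (schema.allOf !== undefined && !schema.allOf.every((sub: Schema) => isValid(root, sub, value))) {
    return false;
  }
  if (schema.type !== undefined && !hasType(value, schema.type)) return false;
  if (hasType(value, "object")) {
    const obj = value as { [key: string]: unknown };
    const required: string[] = schema.required ?? [];
    if (!required.every((key) => key in obj)) return false;
    const properties: Schema = schema.properties ?? {};
    for (const key of Object.keys(properties)) {
      if (key in obj && !isValid(root, properties[key], obj[key])) return false;
    }
  }
  return true;
}

const { dog, cat } = animalSchemas.definitions;
console.log(isValid(animalSchemas, dog, { name: "Rex", legs: 4, breed: "Beagle" })); // true
console.log(isValid(animalSchemas, dog, { name: "Rex", breed: "Beagle" })); // false: no legs
console.log(isValid(animalSchemas, cat, { name: "Tom", legs: 4, indoor: "yes" })); // false: indoor must be boolean
```

In Draft 7, keywords placed beside a $ref are ignored. That is why each extension goes in its own allOf entry rather than sitting next to the $ref.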
The JSON Schema website hosts a really extensive list of tools and frameworks for working with JSON Schemas. As the project I used JSON Schema in was built with NodeJS, I’ll give a few examples of ajv, a library and CLI tool that I found made working with JSON Schema a lot easier.
I also found https://jsonschema.net/home useful when moving my JSON Schema into a YAML format for use in a Swagger document.
https://www.liquid-technologies.com/online-json-schema-validator is also a great online schema validator.
Run npm install ajv if you plan on using the library in code, or npm install ajv-cli if you want a CLI wrapper.
Validating a JSON document against the Schema with AJV
To validate a JSON document, you’ll need to save the object your system produces (or expects as input) into a .json file, and you’ll need your JSON Schema saved into a .json file too.
You can then run ajv validate -s [PATH TO SCHEMA].json -d [PATH TO DOC].json, which will produce a list of errors should the JSON document not conform to the Schema.
You can also run ajv compile -s [PATH TO SCHEMA].json if you just want to validate the schema.
Migrating a Schema to a newer draft with AJV
As JSON Schema is constantly evolving, it can be hard to keep on top of the changes needed to make a schema compliant with a later draft.
Fortunately AJV has a tool for doing just that: you can migrate your JSON Schema using ajv migrate -s [PATH TO SCHEMA].json -o [PATH TO NEW SCHEMA DOC].json.
This command first verifies that the JSON Schema complies with the draft it says it adheres to, and then produces a new file containing that Schema in a form compliant with the latest draft.
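To give a feel for the kind of changes a migration involves, here’s a simplified TypeScript sketch. It covers three well-known differences between draft-4 and later drafts:
- id was renamed to $id;
- boolean exclusiveMinimum/exclusiveMaximum became numeric bounds;
- the $schema URI changes.

It is not the ajv-cli implementation, and it only recurses into properties and items sub-schemas. migrateDraft4 and the example schema (including its example.com id) are hypothetical.

```typescript
type Schema = { [keyword: string]: unknown };

const DRAFT_07 = "http://json-schema.org/draft-07/schema#";

function isSchemaObject(value: unknown): value is Schema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// draft-04: "exclusiveMinimum": true modifies "minimum".
// draft-06+: "exclusiveMinimum" holds the bound itself.
function migrateExclusive(out: Schema, exclusiveKey: string, boundKey: string): void {
  if (out[exclusiveKey] === true && typeof out[boundKey] === "number") {
    out[exclusiveKey] = out[boundKey];
    delete out[boundKey];
  } else if (out[exclusiveKey] === false) {
    delete out[exclusiveKey]; // false was the draft-04 default, so it can be dropped
  }
}

// Simplified sketch covering only a few draft-04 changes; not the ajv-cli implementation.
function migrateDraft4(schema: Schema): Schema {
  const out: Schema = { ...schema };
  if ("$schema" in out) out.$schema = DRAFT_07;
  if (typeof out.id === "string") {
    out.$id = out.id; // draft-06 renamed "id" to "$id"
    delete out.id;
  }
  migrateExclusive(out, "exclusiveMinimum", "minimum");
  migrateExclusive(out, "exclusiveMaximum", "maximum");
  // "properties" maps property names to sub-schemas, so a property that is
  // merely called "id" is left alone; only its sub-schema is migrated.
  const props = out.properties;
  if (isSchemaObject(props)) {
    const migratedProps: Schema = {};
    for (const key of Object.keys(props)) {
      const sub = props[key];
      migratedProps[key] = isSchemaObject(sub) ? migrateDraft4(sub) : sub;
    }
    out.properties = migratedProps;
  }
  const items = out.items;
  if (isSchemaObject(items)) out.items = migrateDraft4(items);
  return out;
}

// Hypothetical draft-04 schema used only to exercise the sketch.
const draft4Schema: Schema = {
  $schema: "http://json-schema.org/draft-04/schema#",
  id: "http://example.com/dog.json",
  type: "object",
  properties: {
    id: { type: "string" },
    age: { type: "number", minimum: 0, exclusiveMinimum: true },
  },
};

console.log(JSON.stringify(migrateDraft4(draft4Schema), null, 2));
```

The function copies rather than mutates its input, so the original draft-4 schema is still available for comparison after migrating.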
I found this very useful when I started working with FRESH more seriously, as the Schema on GitHub wasn’t compliant with draft-4 (which it claimed to be). Once I made a few changes to make it compliant, I was able to migrate up to draft-7.
Hopefully this post has served as an introduction to how JSON Schemas work and the benefits of using one.
When looking for more information on JSON Schema, I’ve found the https://json-schema.org/understanding-json-schema/index.html website really useful, as it offers examples when explaining the different concepts.