An introduction to JSON Schema
Recently I’ve been building an API around the FRESH resume schema, a JSON schema that describes a person’s experience and can be used to generate a resume.
When I initially started using the schema I wasn’t really aware of what a JSON schema was, viewing it as just another way of bringing types into JavaScript (I’m not a big fan of statically typed languages, so I dismissed it as another fad, similar to TypeScript).
Now that I’ve built the API, I’ve come to better understand the benefits a JSON schema offers and some of the tooling available that makes working with JSON schema so much easier.
Understanding JSON schema and its applications benefits not only developers but also testers and even less technical members of the team: it allows the data the system uses and returns to be checked statically, which offers quick wins when learning how the system functions.
Why JSON schema exists
When I first started out as a web developer in 2010, JavaScript use was still relatively small, used mostly for adding interactivity to elements or building more complex interfaces.
If you were dealing with external data sources like APIs, you’d mostly be communicating with them from the back-end code, most likely via XML (and, if you were unfortunate, SOAP), and then using an XSLT stylesheet to transform that XML into something to render to the user.
As the web evolved into the platform it is now, driven by the need for browsers to offer functionality similar to the apps people were starting to use, there was a push towards asynchronous operations for updating the page.
Alongside this push for interactivity came another development, which I feel is the main reason JSON is as prevalent as it is now: NodeJS.
NodeJS allowed developers to write both their front-end and back-end code in one language, JavaScript, and JavaScript has a means of representing its objects as a string: JavaScript Object Notation (JSON).
Being able to request data from a server in JavaScript on the front-end, access a record and serialise it as JSON on the back-end, and then parse that data into an object ready to work with on the front-end is a big win for a developer. Combined with the fact that JSON tends to be smaller than the equivalent XML, this meant it was adopted rapidly.
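In JavaScript that round trip is just JSON.stringify on one side and JSON.parse on the other. A minimal sketch, using a made-up dog record (the field values are illustrative assumptions); note that the Date comes back as a plain string, because JSON has no date type of its own.

```javascript
// A hypothetical record as the back-end might hold it.
const record = {
  name: "Rex",
  dob: new Date(Date.UTC(2015, 5, 1)), // 1 June 2015 (months are zero-based)
  owners: [{ name: "Sam", email: "sam@example.com" }],
};

// Back-end: serialise the record to a JSON string for the response body.
const body = JSON.stringify(record);

// Front-end: parse the string back into a plain object.
const parsed = JSON.parse(body);

console.log(parsed.owners[0].name); // prints: Sam
console.log(typeof parsed.dob);     // prints: string (JSON has no date type)
console.log(parsed.dob);            // prints: 2015-06-01T00:00:00.000Z
```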
JSON, however, isn’t perfect
While JSON is easier to read, easier to work with when using JavaScript and smaller to send, XML still offers some benefits for which JSON has no standardised equivalent.
XML’s Document Type Definitions (DTD) and XML Schema Definitions (XSD)
XML documents can include a DTD (referenced at root level) or an XSD (referenced at node level) that defines the structure and types of data the document holds.
Because these definitions are referenced from the document itself, the validation rules are available to whoever receives it, so they can check that the document complies with the format before reading it.
Comments
This is something that can lead to heated debate in a particularly opinionated group of people, but regardless of your stance, it’s still something XML offers that JSON does not.
Comments were left out of JSON because it is intended only for data serialisation, where there’s no need for them, and I agree with that reasoning. The issue comes about when people use JSON to configure tooling, which benefits from being able to annotate certain values.
A NodeJS package, JSON5, adds comments and a number of other features to make JSON more ‘human friendly’, but I would argue that YAML is a better fit for this need.
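A quick way to see the limitation is to hand JSON.parse some text containing a comment. The sketch below uses a hypothetical config string. It also shows the $comment keyword that Draft 7 JSON Schema provides for notes inside a schema, which works because it is an ordinary JSON property rather than comment syntax.

```javascript
// Hypothetical config text with a comment, as you might write for a tool.
const configText = `{
  // how many retries before giving up
  "retries": 3
}`;

let parseError = null;
try {
  JSON.parse(configText);
} catch (err) {
  parseError = err; // JSON.parse rejects the comment with a SyntaxError
}
console.log(parseError instanceof SyntaxError); // prints: true

// In a JSON Schema, "$comment" is just another key, so standard JSON parses it.
const schemaText = `{
  "$comment": "retries is capped to keep start-up time low",
  "type": "integer",
  "maximum": 10
}`;
console.log(JSON.parse(schemaText).$comment);
```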
JSON Schema to the rescue!
To make it easier to understand the JSON being transmitted between different systems, the JSON Schema project was created to provide a standardised way of describing that data.
Much as XSD allows the structure and data types of an XML document to be defined, JSON Schema adds this important information to JSON, while also allowing for re-usable object definitions and schema extensions.
Unlike XSD, though, the schema lives outside of the JSON document, and there’s no standardised way of adding metadata (such as a reference to the schema) to the JSON itself, so without prior knowledge the system receiving the JSON data cannot easily validate the data against the schema.
Basics of JSON schema
I’ll preface this section by saying that JSON Schema is still very much in flux, and this information is correct at the time of writing, which means it targets JSON Schema Draft 7.
JSON Schema supports the following data types:
string: for character values
number: for integer and decimal values
integer: for whole numbers only
object: for associative arrays
array: for collections of the other data types
boolean: for true or false values
null: for empty or uninitialised values
These correspond closely to the values you’ll find in JavaScript, and they can be specialised further using the format keyword; for example, adding "format": "email" to a string specifies that the string should be a valid email address.
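One wrinkle when mapping these types onto JavaScript is that typeof reports "object" for both arrays and null. A small sketch of how a parsed value could be mapped to its JSON Schema type name; the function name jsonSchemaType is hypothetical, and it also reports integer, a further type Draft 7 defines for whole numbers.

```javascript
// Map a parsed JSON value to the JSON Schema type name that describes it.
// typeof alone is not enough: it reports "object" for both arrays and null.
function jsonSchemaType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (typeof value === "number") {
    // Whole numbers also satisfy "integer"; every number satisfies "number".
    return Number.isInteger(value) ? "integer" : "number";
  }
  return typeof value; // "string", "boolean" or "object"
}

console.log(jsonSchemaType(JSON.parse("[1, 2]"))); // prints: array
console.log(jsonSchemaType(JSON.parse("null")));   // prints: null
console.log(jsonSchemaType(JSON.parse("2.5")));    // prints: number
console.log(jsonSchemaType(JSON.parse("3")));      // prints: integer
```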
Take, for example, an object representing a dog. We’re expecting name to be a string, dob to be a date, and owners to be an array of objects, each containing a name (a string) and an email (a string, but one that holds an email address).
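This can be represented in JSON Schema by giving each property a type, using format for the date and email values, and describing each owner with an items schema.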
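Here’s a sketch of what the dog object and a matching Draft 7 schema could look like, written as JavaScript object literals with the same structure the .json files would have. The example values, and the choice of which properties are required, are assumptions for illustration rather than the original example.

```javascript
// Hypothetical instance of the dog object described above.
const dog = {
  name: "Rex",
  dob: "2015-06-01",
  owners: [{ name: "Sam", email: "sam@example.com" }],
};

// A Draft 7 schema for that object (saved as-is, it would be a .json file).
const dogSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    name: { type: "string" },
    dob: { type: "string", format: "date" }, // JSON has no date type
    owners: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          email: { type: "string", format: "email" },
        },
        required: ["name", "email"], // assumption: both owner fields are mandatory
      },
    },
  },
  required: ["name", "dob", "owners"], // assumption: all dog fields are mandatory
};

// The schema is plain data, so it round-trips through JSON unchanged.
console.log(JSON.stringify(dogSchema, null, 2));
```

Bear in mind that Draft 7 treats format validation as optional for implementations, so whether "date" and "email" are actually enforced depends on the validator; with current versions of ajv, for instance, the formats come from the separate ajv-formats package.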
More complex use cases
For the majority of cases you can get away with the basics, but as the system scales up there will be cases where you need to allow more than one data type for a property (such as a number or a string), create re-usable object definitions, or even extend an existing schema to add or change properties.
Handling different data types for the same property
Using our dog example, let’s say we want to make the name of the dog optional by allowing null to be accepted as well as a string.
We can achieve this using the oneOf keyword, which takes an array of subschemas and accepts a value only if it matches exactly one of them.
There are a number of keywords that perform similar functionality:
anyOf: the value must match at least one of the listed subschemas
oneOf: the value must match exactly one of the listed subschemas
allOf: the value must match all of the listed subschemas
not: the value must not match the given subschema
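The difference between these keywords is easiest to see side by side. A toy sketch, assuming subschemas that only contain a type (the helper names matchesType and combine are hypothetical). It uses integer and number because they overlap, which is exactly where anyOf and oneOf disagree.

```javascript
// Toy subschema check for this sketch: it only understands { type: "..." }.
function matchesType(value, subschema) {
  switch (subschema.type) {
    case "null":
      return value === null;
    case "integer":
      return Number.isInteger(value);
    case "array":
      return Array.isArray(value);
    case "object":
      return value !== null && typeof value === "object" && !Array.isArray(value);
    default:
      return typeof value === subschema.type; // "string", "number", "boolean"
  }
}

// How each keyword combines the results of its subschemas.
const combine = {
  anyOf: (value, subs) => subs.some((s) => matchesType(value, s)),
  oneOf: (value, subs) => subs.filter((s) => matchesType(value, s)).length === 1,
  allOf: (value, subs) => subs.every((s) => matchesType(value, s)),
  not: (value, sub) => !matchesType(value, sub), // takes a single subschema
};

// "integer" and "number" overlap, which is where anyOf and oneOf differ.
const integerOrNumber = [{ type: "integer" }, { type: "number" }];

console.log(combine.anyOf(42, integerOrNumber));  // prints: true
console.log(combine.oneOf(42, integerOrNumber));  // prints: false (both match)
console.log(combine.oneOf(4.2, integerOrNumber)); // prints: true (only "number" matches)
console.log(combine.allOf(42, integerOrNumber));  // prints: true
console.log(combine.not(42, { type: "null" }));   // prints: true
```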
More examples of these keywords are available on the JSON Schema website; for our dog object, oneOf can be used to allow both string and null values for the name property.
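A sketch of that name property, alongside the equivalent shorthand of giving type an array of type names, which Draft 7 also allows. The tiny isValidName evaluator is hypothetical and only understands these two shapes; it is there to make the behaviour concrete, not to replace a real validator.

```javascript
// The name property of the dog schema, now accepting a string or null.
const nameSchema = {
  oneOf: [{ type: "string" }, { type: "null" }],
};

// Draft 7 also lets "type" take an array, which expresses the same rule.
const nameSchemaShorthand = { type: ["string", "null"] };

// Hypothetical evaluator for these two shapes only (not a general validator).
function isValidName(value, schema) {
  const typeOf = (v) => (v === null ? "null" : typeof v);
  if (Array.isArray(schema.type)) return schema.type.includes(typeOf(value));
  const matches = schema.oneOf.filter((s) => s.type === typeOf(value));
  return matches.length === 1; // oneOf: exactly one subschema must match
}

for (const candidate of ["Rex", null, 7]) {
  console.log(candidate, isValidName(candidate, nameSchema), isValidName(candidate, nameSchemaShorthand));
}
// prints: Rex true true, then null true true, then 7 false false
```

Because a value can never be both a string and null, anyOf would behave identically here; oneOf only becomes stricter than anyOf when the subschemas overlap.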
Conditional properties
The anyOf, oneOf, allOf and not keywords are great for validating at a structural level, but there may be times when you want to validate at a logical level, applying particular formatting rules depending on the value of another property. Draft 7 added the if, then and else keywords for this.
The JSON Schema website has a really good example of this using postal codes in different countries and the regular expressions used to verify them.
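A sketch modelled on that example, using the if, then and else keywords: if country is the United States of America, postal_code must look like a ZIP code, otherwise it must look like a Canadian postal code. The ^ and $ anchors are my addition (JSON Schema patterns are not anchored by default), and postalCodeMatchesCountry is a hypothetical hand-rolled check of just this rule.

```javascript
// Address schema modelled on the JSON Schema website's postal-code example.
const addressSchema = {
  type: "object",
  properties: {
    country: { enum: ["United States of America", "Canada"] },
    postal_code: { type: "string" },
  },
  if: { properties: { country: { const: "United States of America" } } },
  then: { properties: { postal_code: { pattern: "^[0-9]{5}(-[0-9]{4})?$" } } },
  else: { properties: { postal_code: { pattern: "^[A-Z][0-9][A-Z] [0-9][A-Z][0-9]$" } } },
};

// Hand-rolled evaluation of just the if/then/else part of this schema.
function postalCodeMatchesCountry(address, schema) {
  const expected = schema.if.properties.country.const;
  // "properties" only constrains keys that are present, so a missing
  // country makes the "if" subschema pass and the "then" branch apply.
  const ifPasses = !("country" in address) || address.country === expected;
  const branch = ifPasses ? schema.then : schema.else;
  const pattern = new RegExp(branch.properties.postal_code.pattern);
  const value = address.postal_code;
  return typeof value !== "string" || pattern.test(value); // pattern ignores non-strings
}

console.log(postalCodeMatchesCountry({ country: "Canada", postal_code: "K1M 1M4" }, addressSchema));
```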
Re-usable object definitions
As the data grows, it’s likely that some objects will be needed in more than one place; for instance, if we expanded the system the dog object is used in to include a customer list, we’d want to re-use the owner object.
To keep a single definition of the owner object, we’d create a definitions object in the document root to hold it, which can then be referenced elsewhere using the $ref keyword.
Applied to our example, the owner object is extracted into definitions and referenced from both the dog object and the customers object.
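A sketch of that layout, assuming customers is an array of owner objects (the exact shape is an illustration). The resolveLocalRef helper is hypothetical and shows roughly how a validator follows a local $ref: the part after # is treated as a JSON Pointer into the root schema.

```javascript
const petStoreSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  definitions: {
    owner: {
      type: "object",
      properties: {
        name: { type: "string" },
        email: { type: "string", format: "email" },
      },
    },
  },
  type: "object",
  properties: {
    dog: {
      type: "object",
      properties: {
        name: { type: "string" },
        owners: { type: "array", items: { $ref: "#/definitions/owner" } },
      },
    },
    customers: { type: "array", items: { $ref: "#/definitions/owner" } },
  },
};

// Resolve a local "#/..." reference as a JSON Pointer against the root schema.
function resolveLocalRef(root, ref) {
  if (ref === "#") return root;
  if (!ref.startsWith("#/")) {
    throw new Error(`Only local references are supported here: ${ref}`);
  }
  const tokens = ref
    .slice(2)
    .split("/")
    // Undo URI encoding, then JSON Pointer escaping (~1 is "/", ~0 is "~").
    .map((t) => decodeURIComponent(t).replace(/~1/g, "/").replace(/~0/g, "~"));
  let node = root;
  for (const token of tokens) {
    if (node === null || typeof node !== "object" || !(token in node)) {
      throw new Error(`Cannot resolve ${ref}`);
    }
    node = node[token];
  }
  return node;
}

// Both references lead to the single shared owner definition.
const ownerFromDog = resolveLocalRef(petStoreSchema, petStoreSchema.properties.dog.properties.owners.items.$ref);
console.log(ownerFromDog === petStoreSchema.definitions.owner); // prints: true
```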
Object definitions can also reference themselves, should you need to build schemas to handle tree-like data structures.
You can also reference object definitions from external files, making it easier to re-use objects across a collection of JSON Schemas.
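As an example of a self-referencing definition, here is a sketch of a hypothetical tree of labelled nodes, where each node’s children are themselves nodes. The validate function is a deliberately minimal, hypothetical evaluator for only the keywords this schema uses, included to show that following the $ref recursively is what lets the schema describe arbitrarily deep trees.

```javascript
// Hypothetical schema: each node has a label and optional child nodes.
const treeSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  definitions: {
    node: {
      type: "object",
      properties: {
        label: { type: "string" },
        children: { type: "array", items: { $ref: "#/definitions/node" } },
      },
      required: ["label"],
    },
  },
  $ref: "#/definitions/node",
};

// Minimal validator for the keywords above (type, properties, required,
// items, local "#/..." $ref). Not a general JSON Schema implementation.
function validate(value, schema, root = schema) {
  if (schema.$ref) {
    const target = schema.$ref
      .slice(2)
      .split("/")
      .reduce((node, key) => node[key], root);
    return validate(value, target, root); // in Draft 7, $ref siblings are ignored
  }
  if (schema.type === "string") return typeof value === "string";
  if (schema.type === "array") {
    return Array.isArray(value) && (!schema.items || value.every((item) => validate(item, schema.items, root)));
  }
  if (schema.type === "object") {
    if (value === null || typeof value !== "object" || Array.isArray(value)) return false;
    if (!(schema.required || []).every((key) => key in value)) return false;
    return Object.entries(schema.properties || {}).every(
      ([key, sub]) => !(key in value) || validate(value[key], sub, root)
    );
  }
  return true; // keywords this sketch doesn't understand are treated as passing
}

const tree = { label: "root", children: [{ label: "a", children: [{ label: "a1" }] }, { label: "b" }] };
console.log(validate(tree, treeSchema)); // prints: true
```

An external reference looks the same but names another document, for example a hypothetical "$ref": "owner.schema.json#/definitions/owner"; how that file is located is up to the validator (ajv, for instance, typically needs the referenced schema registered with addSchema before compiling).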
Extending existing Schema
To reduce duplication across multiple schemas, you may want to create a base object that can then be extended with the relevant properties in subsequent schemas.
This can be achieved by combining a $ref to the base object with the allOf keyword, so that a value must satisfy both the existing schema and the new one.
For example, a base animal object could be extended to create dog and cat objects, each with their own specialised properties.
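A sketch of that base-and-extension pattern; the animal, dog and cat properties shown (name, dob, breed, indoor) are assumptions for illustration. The hypothetical requiredKeys helper follows allOf and local $refs to show how the base schema’s rules and the extension’s rules add up.

```javascript
const animalSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  definitions: {
    animal: {
      type: "object",
      properties: {
        name: { type: "string" },
        dob: { type: "string", format: "date" },
      },
      required: ["name"],
    },
    dog: {
      allOf: [
        { $ref: "#/definitions/animal" }, // everything an animal must satisfy
        { properties: { breed: { type: "string" } }, required: ["breed"] },
      ],
    },
    cat: {
      allOf: [
        { $ref: "#/definitions/animal" },
        { properties: { indoor: { type: "boolean" } }, required: ["indoor"] },
      ],
    },
  },
};

// Collect every "required" key a definition imposes, following allOf and
// local "#/definitions/..." refs, to show how base and extension combine.
function requiredKeys(subschema, root) {
  if (subschema.$ref) {
    const name = subschema.$ref.replace("#/definitions/", "");
    return requiredKeys(root.definitions[name], root);
  }
  const own = subschema.required || [];
  const inherited = (subschema.allOf || []).flatMap((part) => requiredKeys(part, root));
  return [...new Set([...inherited, ...own])];
}

console.log(requiredKeys(animalSchema.definitions.dog, animalSchema)); // → ["name", "breed"]
console.log(requiredKeys(animalSchema.definitions.cat, animalSchema)); // → ["name", "indoor"]
```

One gotcha worth knowing: if the base object sets additionalProperties to false, extensions built this way break, because each allOf entry is checked on its own and the base would reject the extension’s extra properties.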
Tools
The JSON Schema website hosts a really extensive list of tools and frameworks for working with JSON Schemas, but as my project used NodeJS, I’ll give a few examples using ajv, a library and CLI tool that I found made working with JSON Schema a lot easier.
I also found https://jsonschema.net/home useful when moving my JSON Schema into a YAML format for use in a Swagger document.
https://www.liquid-technologies.com/online-json-schema-validator is also a great online schema validator.
Installation
Run npm install ajv if you plan on using the library in code, or npm install ajv-cli if you want a CLI wrapper (adding the -g flag, as in npm install -g ajv-cli, makes the ajv command available globally).
Validating a JSON document against the Schema with AJV
To validate a JSON document, save the object your system produces (or expects as input) into a .json file, and save your JSON Schema into a .json file too.
You can then run ajv validate -s [PATH TO SCHEMA].json -d [PATH TO DOC].json, which will produce a list of errors should the JSON document not conform to the Schema.
You can also run ajv compile -s [PATH TO SCHEMA].json if you just want to check that the schema itself is valid.
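If the data already lives in JavaScript, a few lines of Node can write it out for the CLI. A minimal sketch; the schema, document and file names are made up, and the files go into a temporary directory so the sketch doesn’t litter your project.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write a value as pretty-printed JSON so it can be passed to the ajv CLI.
function writeJsonFile(filePath, value) {
  fs.writeFileSync(filePath, JSON.stringify(value, null, 2) + "\n", "utf8");
  return filePath;
}

// Hypothetical schema, document and file names; use your own in practice.
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), "schema-demo-"));
const schemaPath = writeJsonFile(path.join(outDir, "dog.schema.json"), {
  type: "object",
  properties: { name: { type: "string" } },
  required: ["name"],
});
const docPath = writeJsonFile(path.join(outDir, "dog.json"), { name: "Rex" });

// The command to run next, with the real paths filled in.
console.log(`ajv validate -s ${schemaPath} -d ${docPath}`);
```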
Migrating a Schema to a newer draft with AJV
As JSON Schema is constantly evolving, it can be hard to keep on top of the changes needed to make a schema compliant with a later draft.
Fortunately, AJV has a tool for doing just that: you can migrate your JSON Schema using ajv migrate -s [PATH TO SCHEMA].json -o [PATH TO NEW SCHEMA DOC].json.
This command first verifies that the JSON Schema complies with the draft it claims to adhere to, and then produces a new file containing that Schema made compliant with the latest draft (Draft 7 at the time of writing).
I found this very useful when I started working with FRESH more seriously, as the Schema on GitHub wasn’t compliant with draft-4 (which it claimed to be); once I made a few changes to make it compliant, I was able to migrate it up to draft-7.
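To give a feel for the kind of rewriting a migration involves, here’s a hand-rolled sketch of two draft-04 to draft-06 changes that carry over to draft-07: id was renamed to $id, and exclusiveMinimum/exclusiveMaximum changed from booleans that modify minimum/maximum into numbers holding the limit. This is an illustration only, not how AJV’s migrate command is implemented, and it ignores other changes and corner cases (for example, an object inside enum or default that happens to have a string id).

```javascript
const DRAFT_07 = "http://json-schema.org/draft-07/schema#";

// Rewrite one node of a draft-04 schema, recursing into every nested value.
function migrateNode(node) {
  if (Array.isArray(node)) return node.map(migrateNode);
  if (node === null || typeof node !== "object") return node;

  const out = {};
  for (const [key, value] of Object.entries(node)) {
    out[key] = migrateNode(value);
  }

  // draft-04 "id" became "$id". Only string values are renamed, so a
  // property that happens to be called "id" (an object) is left alone.
  if (typeof out.id === "string") {
    out.$id = out.id;
    delete out.id;
  }

  // draft-04 used booleans to make minimum/maximum exclusive; from draft-06
  // exclusiveMinimum/exclusiveMaximum hold the limit themselves.
  for (const [limit, flag] of [["minimum", "exclusiveMinimum"], ["maximum", "exclusiveMaximum"]]) {
    if (out[flag] === true && typeof out[limit] === "number") {
      out[flag] = out[limit];
      delete out[limit];
    } else if (out[flag] === false) {
      delete out[flag];
    }
  }
  return out;
}

function migrateToDraft7(schema) {
  return { ...migrateNode(schema), $schema: DRAFT_07 };
}

// Hypothetical draft-04 input.
const draft4 = {
  $schema: "http://json-schema.org/draft-04/schema#",
  id: "http://example.com/dog.json",
  type: "object",
  properties: {
    id: { type: "string" },
    age: { type: "number", minimum: 0, maximum: 30, exclusiveMaximum: true },
  },
};

console.log(JSON.stringify(migrateToDraft7(draft4), null, 2));
```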
More info
Hopefully this post has served as an introduction to how JSON Schemas work and the benefits of using one.
When looking for more information on JSON Schema, I’ve found the https://json-schema.org/understanding-json-schema/index.html website really useful, as it offers examples when explaining the different concepts.