Colin Wren
An introduction to JSON Schema

JavaScript, Documentation, Software Development (7 min read)

Recently I’ve been building an API around the FRESH resume schema, a JSON schema to describe a person’s experiences that can be used to generate a resume.

When I initially started using the schema I wasn’t really aware of what a JSON schema was, viewing it as just another way of bringing types into JavaScript (I’m not a big fan of statically typed languages so I disregarded it as another fad, similar to TypeScript).

Now I’ve built the API I’ve come to better understand the benefits a JSON schema offers and some of the tooling available that makes working with JSON schema so much easier.

Understanding JSON Schema and its applications is beneficial not only to developers but also to testers and even to less technical members of the team: it allows static testing of the data the system uses and returns, which gives quick wins when working out how the system functions.

Why JSON schema exists

When I first started out as a web developer in 2010, JavaScript use was still relatively small, used mostly for adding interactivity to elements or building more complex interfaces.

If you were dealing with external data sources like APIs you’d mostly be communicating with them as part of the back-end code, most likely via XML (and if you were unfortunate, SOAP) and then using XSLT to transform that XML into something to render to the user.

As the web evolved into the platform it is now, driven by the need for browsers to offer similar functionality to the apps people were starting to use, there was a push towards asynchronous operations for updating the page.

At the same time as there was a push to make things more interactive there was another development which I feel is the main reason JSON is as prevalent as it is now — NodeJS.

NodeJS allowed developers to write both their front-end and back-end code in one language, JavaScript, and JavaScript has a means of representing its objects as a string: JavaScript Object Notation (JSON).

Being able to request data from a server in JavaScript on the front-end, access a record and serialise it as JSON on the back-end, and then parse that data into an object ready to work with on the front-end is a big win for a developer. The fact that JSON tends to be smaller in file size than XML meant it was adopted rapidly.
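
As a rough sketch of that round trip (the record and its fields are made up for illustration), the back-end serialises an object with JSON.stringify and the front-end turns the string back into an object with JSON.parse:

```javascript
// Hypothetical record, standing in for something loaded from a database.
const record = { name: 'Spot', dob: '1999-01-01', owners: ['Colin'] };

// Back-end: serialise the object to a JSON string to send over the wire.
const body = JSON.stringify(record);

// Front-end: parse the received string back into a plain object.
const dog = JSON.parse(body);

console.log(body);          // {"name":"Spot","dob":"1999-01-01","owners":["Colin"]}
console.log(dog.owners[0]); // Colin
```

No mapping layer is needed on either side, which is a large part of the appeal.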

JSON however isn’t perfect

While JSON is easier to read, easier to work with in JavaScript and smaller to send, XML has some features for which JSON still has no standardised equivalent.

XML’s Document Type (DTD) & Schema (XSD) Definitions

XML documents can reference a DTD (declared in the DOCTYPE, ahead of the root element) or an XSD (referenced via attributes on an element, typically the root) that defines the structure and types of data the document holds.

Referencing these definitions in the document makes the validation rules available to the receiver, so they can check that the document complies with the format before reading it.

Comments

This is something that can lead to some heated debates if raised in a particularly opinionated group of people, but regardless of your stance it’s still something that XML offers and JSON does not.

The reasoning behind excluding comments from JSON is that it’s only for data serialisation and as such there’s no need for comments, which I agree with. The issue comes about when people use JSON as a means of configuring tooling, which benefits from being able to annotate certain values.

JSON5, a NodeJS package, adds comments and a bunch of other functionality to make JSON more ‘human friendly’, but I would argue that YAML is a better fit for this need.
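
To see the limitation in practice, here’s a small sketch using the built-in JSON.parse (the config content is a made-up example): the same document parses fine until a comment is added.

```javascript
// Returns true if the text is valid JSON, false if JSON.parse rejects it.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false; // JSON.parse throws a SyntaxError on invalid input
  }
}

// Hypothetical config file contents.
const plain = '{ "port": 8080 }';
const commented = '{ "port": 8080 // the dev server port\n}';

console.log(isValidJson(plain));     // true
console.log(isValidJson(commented)); // false
```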

JSON Schema to the rescue!

In order to make it easier to understand the JSON being transmitted between different systems, a project called JSON Schema was started to provide a standardised way of describing the data.

Similar to the way that XSD allows the structure and data types to be defined, JSON Schema adds this important information to JSON while also allowing for re-usable object definitions and schema extensions.

Unlike XSD though, the schema lives outside of the JSON document and there’s no standardised way of adding metadata to the JSON, meaning that without prior knowledge the system receiving the JSON data cannot easily validate the data’s adherence to the schema.

Basics of JSON schema

I’ll preface this section by saying that JSON Schema is still very much in flux; this information was correct at the time of writing, against JSON Schema Draft 7.

JSON Schema supports the following data types:

  • string: for character values
  • number: for any numeric value, integer or decimal
  • integer: for whole numbers only
  • object: for associative arrays (key/value pairs)
  • array: for ordered collections of the other data types
  • boolean: for true or false values
  • null: for the explicit absence of a value

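These mirror the value types of JSON itself (with integer as a special case of number), and they can be specialised further using the format property, such as adding format: email to a string to specify that it should adhere to the standardised email address format. Validators are allowed to treat format as informational only, so check how your tooling handles it.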
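
JavaScript’s own typeof doesn’t line up with these names exactly (null and arrays both report as 'object'), so here’s a rough sketch of how a parsed value maps onto a JSON Schema type name. The jsonType helper is my own illustration, not part of any library.

```javascript
// Maps a value produced by JSON.parse to the name of its JSON Schema type.
// Every integer is also a valid "number"; this returns the most specific name.
function jsonType(value) {
  if (value === null) return 'null';        // typeof null is 'object' in JavaScript
  if (Array.isArray(value)) return 'array'; // typeof [] is also 'object'
  if (typeof value === 'number') {
    return Number.isInteger(value) ? 'integer' : 'number';
  }
  return typeof value; // 'string', 'boolean' or 'object'
}

console.log(jsonType('Spot'));       // string
console.log(jsonType(3));            // integer
console.log(jsonType(3.5));          // number
console.log(jsonType({ a: 1 }));     // object
console.log(jsonType(['a', 'b']));   // array
console.log(jsonType(true));         // boolean
console.log(jsonType(null));         // null
```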

If we take this example object representing a dog:

{
  "name": "Spot",
  "dob": "1999-01-01",
  "owners": [
    {
      "name": "Colin",
      "email": "doggos4lyfe@gmail.com"
    }
  ]
}
A dog object we use in our system

We’re expecting the name to be a string, dob to be a date and owners to be an array of objects, each of which contains a name (string) and an email (a string, but one that holds an email address).

This can be represented in JSON Schema as:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://colinwren.is/doggo.json",
  "type": "object",
  "title": "Dog",
  "description": "Representation of the dog object used by the system",
  "additionalProperties": true,
  "properties": {
    "name": {
      "$id": "#/properties/name",
      "type": "string",
      "title": "Dog's name",
      "description": "What do you call your dog?"
    },
    "dob": {
      "$id": "#/properties/dob",
      "type": "string",
      "title": "Dog's date of birth",
      "description": "When was the dog born?",
      "format": "date"
    },
    "owners": {
      "$id": "#/properties/owners",
      "type": "array",
      "title": "Dog's Owners",
      "description": "Who owns the dog?",
      "items": {
        "$id": "#/properties/owners/items",
        "type": "object",
        "properties": {
          "name": {
            "$id": "#/properties/owners/items/properties/name",
            "type": "string",
            "title": "Dog Owner's name",
            "description": "What do we call the owner?"
          },
          "email": {
            "$id": "#/properties/owners/items/properties/email",
            "type": "string",
            "title": "Dog Owner's email",
            "description": "What email can we get hold of the owner via?",
            "format": "email"
          }
        }
      }
    }
  }
}
The schema is pretty straightforward and adds context to the properties and object structure
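
To give a feel for how a validator walks a schema like this, here’s a toy sketch that understands only the type, properties and items keywords, run against a trimmed-down dog schema (titles, descriptions and format left out). The validate and matchesType functions are my own illustration; a real project should use a proper library such as ajv.

```javascript
// Checks a value against one of the JSON Schema type names.
function matchesType(type, value) {
  switch (type) {
    case 'null': return value === null;
    case 'array': return Array.isArray(value);
    case 'object': return typeof value === 'object' && value !== null && !Array.isArray(value);
    case 'integer': return Number.isInteger(value);
    default: return typeof value === type; // string, number, boolean
  }
}

// Toy validator: returns a list of error strings, empty when the value passes.
function validate(schema, value, path = '$') {
  if (schema.type && !matchesType(schema.type, value)) {
    return [`${path} should be of type ${schema.type}`];
  }
  const errors = [];
  if (schema.properties && matchesType('object', value)) {
    for (const [key, subschema] of Object.entries(schema.properties)) {
      // Properties are optional unless "required" says otherwise, so skip missing ones.
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        errors.push(...validate(subschema, value[key], `${path}.${key}`));
      }
    }
  }
  if (schema.items && Array.isArray(value)) {
    // Assumes "items" is a single schema applied to every element.
    value.forEach((item, i) => errors.push(...validate(schema.items, item, `${path}[${i}]`)));
  }
  return errors;
}

const dogSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    dob: { type: 'string' },
    owners: {
      type: 'array',
      items: {
        type: 'object',
        properties: { name: { type: 'string' }, email: { type: 'string' } },
      },
    },
  },
};

const goodDog = { name: 'Spot', dob: '1999-01-01', owners: [{ name: 'Colin', email: 'doggos4lyfe@gmail.com' }] };
const badDog = { name: 42, owners: [{ name: 'Colin', email: null }] };

console.log(validate(dogSchema, goodDog)); // []
console.log(validate(dogSchema, badDog));
// [ '$.name should be of type string', '$.owners[0].email should be of type string' ]
```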

More complex use cases

For the majority of cases you can get away with the basics but as the system scales up there will be cases where you might need to add flexibility for different data types (such as a number or string), make re-usable object definitions or even extend an existing schema to add or change properties.

Handling different data types for the same property

Using our dog example, let’s say we want to make the name of the dog nullable by allowing null to be accepted as well as a string.

We can achieve this using the oneOf keyword, which takes an array of subschemas and accepts a value only if it is valid against exactly one of them.

There are four keywords of this kind, each combining subschemas in a different way:

  • anyOf: the value must be valid against at least one of the subschemas
  • oneOf: the value must be valid against exactly one of the subschemas
  • allOf: the value must be valid against every subschema
  • not: the value must not be valid against the given subschema

More examples of the different keywords are available on the JSON Schema website but here’s an example of oneOf used to allow both string and null values for the dog object’s name property.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://colinwren.is/doggo.json",
  "type": "object",
  "title": "Dog",
  "description": "Representation of the dog object used by the system",
  "additionalProperties": true,
  "properties": {
    "name": {
      "$id": "#/properties/name",
      "oneOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "title": "Dog's name",
      "description": "What do you call your dog?"
    },
    "dob": {
      "$id": "#/properties/dob",
      "type": "string",
      "title": "Dog's date of birth",
      "description": "When was the dog born?",
      "format": "date"
    },
    "owners": {
      "$id": "#/properties/owners",
      "type": "array",
      "title": "Dog's Owners",
      "description": "Who owns the dog?",
      "items": {
        "$id": "#/properties/owners/items",
        "type": "object",
        "properties": {
          "name": {
            "$id": "#/properties/owners/items/properties/name",
            "type": "string",
            "title": "Dog Owner's name",
            "description": "What do we call the owner?"
          },
          "email": {
            "$id": "#/properties/owners/items/properties/email",
            "type": "string",
            "title": "Dog Owner's email",
            "description": "What email can we get hold of the owner via?",
            "format": "email"
          }
        }
      }
    }
  }
}
oneOf allows us to accept null or a string as the dog’s name
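
The difference between the four keywords is easiest to see side by side. In this sketch each subschema is boiled down to a plain predicate function (isShort is a made-up rule standing in for something like a length limit), and each keyword just counts how many of them a value passes.

```javascript
// Each "subschema" is reduced to a predicate to keep the sketch short.
const isString = (v) => typeof v === 'string';
const isNull = (v) => v === null;
const isShort = (v) => typeof v === 'string' && v.length <= 5; // hypothetical rule

// Count how many subschemas a value is valid against.
const matches = (subschemas, value) => subschemas.filter((check) => check(value)).length;

const anyOf = (subschemas, value) => matches(subschemas, value) >= 1;
const oneOf = (subschemas, value) => matches(subschemas, value) === 1;
const allOf = (subschemas, value) => matches(subschemas, value) === subschemas.length;
const not = (subschema, value) => !subschema(value);

// The nullable name from the schema above: exactly one branch matches.
console.log(oneOf([isString, isNull], 'Spot')); // true
console.log(oneOf([isString, isNull], null));   // true
console.log(oneOf([isString, isNull], 42));     // false

// 'Spot' is both a string and short, so it matches two subschemas.
console.log(anyOf([isString, isShort], 'Spot')); // true
console.log(oneOf([isString, isShort], 'Spot')); // false
console.log(allOf([isString, isShort], 'Spot')); // true
console.log(not(isString, 42));                  // true
```

The second group is the one to watch out for: oneOf fails as soon as the subschemas overlap, which is why anyOf is often the safer choice when the options aren’t mutually exclusive.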

Conditional properties

The anyOf, oneOf, allOf and not keywords are great for validating at a structural level, but there may be times when you want to validate at a logical level, applying particular formatting rules dependent on the value of another property. Draft 7 added the if, then and else keywords for this.

The JSON Schema website has a really good example of this using postal codes in different countries and the regular expressions used to verify them.
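
Here’s a rough sketch of the idea, loosely modelled on that postal code example. The property names, country value and anchored patterns are my own assumptions, and checkProperties is a toy that understands only const and pattern, so treat it as an illustration of the if/then/else logic rather than a validator.

```javascript
// The conditional part of a schema, written as a JavaScript object.
const addressSchema = {
  if: { properties: { country: { const: 'United States of America' } } },
  then: { properties: { postal_code: { pattern: '^[0-9]{5}(-[0-9]{4})?$' } } },
  else: { properties: { postal_code: { pattern: '^[A-Z][0-9][A-Z] [0-9][A-Z][0-9]$' } } },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Toy check: only "const" and "pattern" inside "properties" are understood.
function checkProperties(subschema, data) {
  return Object.entries(subschema.properties).every(([key, rule]) => {
    if (!has(data, key)) return true; // absent properties are not constrained
    const value = data[key];
    if (has(rule, 'const') && value !== rule.const) return false;
    if (has(rule, 'pattern') && typeof value === 'string' && !new RegExp(rule.pattern).test(value)) return false;
    return true;
  });
}

// When "if" passes the data must also pass "then", otherwise it must pass "else".
function checkConditional(schema, data) {
  const branch = checkProperties(schema.if, data) ? schema.then : schema.else;
  return checkProperties(branch, data);
}

console.log(checkConditional(addressSchema, { country: 'United States of America', postal_code: '20500' })); // true
console.log(checkConditional(addressSchema, { country: 'Canada', postal_code: 'K1M 1M4' }));                 // true
console.log(checkConditional(addressSchema, { country: 'Canada', postal_code: '20500' }));                   // false
```

Note that a document with no country at all passes the if branch (missing properties aren’t constrained), so the first pattern is applied to it.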

Re-usable object definitions

As the data model grows it’s likely that some objects will be shared across multiple parts of the schema. For instance, if we expanded the system the dog object is used in to have a customer list then we’d want to re-use the owner object.

In order to only have one definition of the owner object we’d create a definitions object in the document root that holds the definitions, which can then be referenced in the objects later on using the $ref keyword.

In this example, we extract the owner object and use a reference in the dog object and in the customers object.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://colinwren.is/doggo.json",
  "type": "object",
  "title": "Dog",
  "description": "Representation of the dog object used by the system",
  "additionalProperties": true,
  "definitions": {
    "dog": {
      "type": "object",
      "properties": {
        "name": {
          "oneOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "title": "Dog's name",
          "description": "What do you call your dog?"
        },
        "dob": {
          "type": "string",
          "title": "Dog's date of birth",
          "description": "When was the dog born?",
          "format": "date"
        },
        "owners": {
          "type": "array",
          "title": "Dog's Owners",
          "description": "Who owns the dog?",
          "items": {
            "$ref": "#/definitions/owner"
          }
        }
      }
    },
    "owner": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "title": "Dog Owner's name",
          "description": "What do we call the owner?"
        },
        "email": {
          "type": "string",
          "title": "Dog Owner's email",
          "description": "What email can we get hold of the owner via?",
          "format": "email"
        }
      }
    }
  },
  "properties": {
    "dogs": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/dog"
      }
    },
    "customers": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/owner"
      }
    }
  }
}
By moving the owner object out we can re-use it in later blocks

Object definitions can also reference themselves, should you need to build schemas to handle tree-like data structures.

You can also reference object definitions from external files, making it easier to re-use objects across a collection of JSON Schemas.
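
Under the hood a local $ref such as #/definitions/owner is just a JSON Pointer into the schema document. Here’s a minimal sketch of resolving one by hand; resolveRef is my own helper and only handles references within the same document, so external files are out of scope.

```javascript
// Resolve a same-document reference such as "#/definitions/owner".
function resolveRef(root, ref) {
  if (!ref.startsWith('#')) {
    throw new Error(`Only local references are supported, got: ${ref}`);
  }
  const pointer = decodeURIComponent(ref.slice(1));
  if (pointer === '') return root; // "#" on its own refers to the whole document
  if (!pointer.startsWith('/')) throw new Error(`Not a JSON Pointer: ${ref}`);

  let node = root;
  for (const rawToken of pointer.slice(1).split('/')) {
    // JSON Pointer escapes: "~1" means "/" and "~0" means "~".
    const token = rawToken.replace(/~1/g, '/').replace(/~0/g, '~');
    if (typeof node !== 'object' || node === null || !(token in node)) {
      throw new Error(`Cannot resolve ${ref}`);
    }
    node = node[token];
  }
  return node;
}

// Cut-down schema with one shared definition.
const schema = {
  definitions: {
    owner: { type: 'object', properties: { name: { type: 'string' } } },
  },
  properties: {
    customers: { type: 'array', items: { $ref: '#/definitions/owner' } },
  },
};

const target = resolveRef(schema, schema.properties.customers.items.$ref);
console.log(target === schema.definitions.owner); // true
```

Because the reference resolves to the same object each time, a definition that refers back to itself is no problem; the validator simply follows the pointer again at each level of the data.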

Extending existing Schema

In order to reduce the amount of duplication across multiple schemas you may find the need to create a base object which can then be extended with the relevant properties in subsequent schemas.

This can be achieved using a $ref to the base object and using the allOf keyword to enforce adherence to all aspects of the existing schema and the new schema.

In the following example we create a base animal object and then extend it to create dog and cat objects, each with their own specialised properties.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://colinwren.is/doggo.json",
  "type": "object",
  "title": "Dog",
  "description": "Representation of the dog object used by the system",
  "additionalProperties": true,
  "definitions": {
    "animal": {
      "type": "object",
      "properties": {
        "name": {
          "oneOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "title": "Animal's name",
          "description": "What do you call the animal?"
        },
        "dob": {
          "type": "string",
          "title": "Animal's date of birth",
          "description": "When was the animal born?",
          "format": "date"
        },
        "owners": {
          "type": "array",
          "title": "Animal's Owners",
          "description": "Who owns the animal?",
          "items": {
            "$ref": "#/definitions/owner"
          }
        }
      }
    },
    "dog": {
      "type": "object",
      "required": ["tailWaggability"],
      "allOf": [
        {
          "$ref": "#/definitions/animal"
        },
        {
          "type": "object",
          "properties": {
            "tailWaggability": {
              "type": "integer",
              "minimum": 1,
              "maximum": 10
            }
          }
        }
      ]
    },
    "cat": {
      "type": "object",
      "required": ["toeBeansFactor"],
      "allOf": [
        {
          "$ref": "#/definitions/animal"
        },
        {
          "type": "object",
          "properties": {
            "toeBeansFactor": {
              "type": "integer",
              "minimum": 1,
              "maximum": 10
            }
          }
        }
      ]
    },
    "owner": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "title": "Dog Owner's name",
          "description": "What do we call the owner?"
        },
        "email": {
          "type": "string",
          "title": "Dog Owner's email",
          "description": "What email can we get hold of the owner via?",
          "format": "email"
        }
      }
    }
  },
  "properties": {
    "dogs": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/dog"
      }
    },
    "cats": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/cat"
      }
    },
    "customers": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/owner"
      }
    }
  }
}
By extending the animal definition we can then re-use its properties and add more specialised ones

Tools

The JSON Schema website hosts a really extensive list of tools and frameworks that can be used to work with JSON Schemas. As I used NodeJS in the project I used JSON Schema in, I’ll give a few examples using ajv, a library (with a companion CLI tool, ajv-cli) that I found made working with JSON Schema a lot easier.

I also found https://jsonschema.net/home useful when moving my JSON Schema into a YAML format for use in a Swagger document.

https://www.liquid-technologies.com/online-json-schema-validator is also a great online schema validator.

Installation

Run npm install ajv if you plan on using the library in code, or npm install ajv-cli if you want a CLI wrapper.

Validating a JSON document against the Schema with AJV

In order to validate a JSON document you’ll need to save the object your system produces or expects as input into a .json file and you’ll need your JSON Schema saved into a .json file also.

You can then run ajv validate -s [PATH TO SCHEMA].json -d [PATH TO DOC].json which will produce a list of errors should the JSON document not conform to the Schema.

You can also run ajv compile -s [PATH TO SCHEMA].json if you want to just validate the schema.
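
The same check can be done from code with the ajv library. This is a minimal sketch, assuming the ajv package is installed; the try/catch around require is only there so the snippet degrades gracefully when it isn’t. The schema is a cut-down dog schema with no format keywords, since as far as I know format handling differs between ajv versions, and validateDog is just a name I’ve picked for the wrapper.

```javascript
// ajv is a third-party package: npm install ajv
let Ajv = null;
try {
  Ajv = require('ajv');
} catch (err) {
  console.log('ajv is not installed, skipping validation');
}

// Cut-down dog schema for illustration.
const schema = {
  type: 'object',
  required: ['name'],
  properties: {
    name: { type: 'string' },
    owners: { type: 'array', items: { type: 'object' } },
  },
};

// Compile the schema once and re-use the resulting function for every document.
const validate = Ajv ? new Ajv().compile(schema) : null;

// Returns null if ajv is unavailable, otherwise { valid, errors }.
function validateDog(dog) {
  if (!validate) return null;
  const valid = validate(dog);
  // validate.errors is overwritten on each call, so copy it out straight away.
  return { valid, errors: validate.errors || [] };
}

console.log(validateDog({ name: 'Spot', owners: [] }));
console.log(validateDog({ owners: 'Colin' }));
```

Each entry in errors describes which part of the document failed and which keyword it failed against, which is the same information the CLI prints.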

Migrating a Schema to a newer draft with AJV

As JSON Schema is constantly evolving it can be hard to keep on top of the different changes that need to be made to make it compliant with a later spec.

Fortunately AJV has a tool for doing just that: you can migrate your JSON Schema using ajv migrate -s [PATH TO SCHEMA].json -o [PATH TO NEW SCHEMA DOC].json.

This script will first verify that the JSON Schema complies with the draft that it says it adheres to, and then it will produce a new file of that Schema that’s compliant with the latest draft.

I found this very useful when I started working with FRESH in a more serious manner, as the Schema on GitHub wasn’t compliant with draft-4 (which it claimed to be). Once I made a few changes to make it compliant I was able to migrate up to draft-7.

More info

Hopefully this post has served as an introduction to how JSON Schemas work and the benefits of using one.

When looking for more information on JSON Schema I’ve found the https://json-schema.org/understanding-json-schema/index.html website really useful as it offers examples when explaining the different concepts.