Gherkin: A retrospective
Recently the team I used to work in has had a big shake-up.
As the project has matured, another team in the territory we were building the product for has started to take on testing responsibilities. That's great to see, but it made our team take a closer look at the tools we were using and question whether they were being used properly, especially the use of Gherkin feature files.
The main catalyst for this was one of the team members finding themselves rewriting more than half the feature files so that the other team could clearly understand the intended behaviour of the system, something that is meant to be Gherkin's strength.
Being the kind of person who likes to introspect on things, I hosted a session for the team to get a better understanding of what had led to the situation we found ourselves in.
To do this I decided to draw an Ishikawa (also known as a Fishbone) diagram on the whiteboard and drew up the following headings:
- Materials — What were the raw materials our team was working with in order to carry out testing and the writing of the feature files?
- Measurements — How was the team measuring the quality of the product and reporting on this?
- Methods — What methods were we using in the team to write the feature files and write the code executed to test those features?
- Environment — Were there any environmental factors that impacted our ability to work with Gherkin? As a team of contractors, I decided to also include aspects of the culture and processes of the company we were working for
- Personnel — What roles in the team may have had an impact on our ability to utilise Gherkin to its full potential?
- Machines — What tools that we used could have impacted our ability to work with Gherkin?
The raw materials the team worked with were:
- Feature files — Written in Gherkin and automated via CucumberJS (originally automated using WDIO and its Cucumber Plugin)
- Specifications — Written in Jira as test tickets which were backed by Zephyr for test execution
Looking at the use of the feature files, one of the first things raised was that the new team was treating them as test plans for executing manual testing.
The root cause of this seemed to be the management team's assumption that the feature files would be written in a way that allowed manual testing to be carried out from them as a set of instructions. This shaped the Gherkin into more of a description of 'how' the system worked instead of 'what' it did.
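To illustrate the difference, here is a hypothetical pair of scenarios (not taken from our actual suite) describing the same behaviour, first as a 'how'-style script and then as a 'what'-style description:

```gherkin
# 'How' style: an imperative UI script, really a manual test plan
Scenario: Place a single bet (imperative)
  Given I navigate to the event page
  When I click the home win odds button
  And I type "10" into the stake field
  And I click the "Place Bet" button
  Then I should see the text "Bet placed"

# 'What' style: describes the behaviour, not the UI mechanics
Scenario: Place a single bet (declarative)
  Given an event with odds available
  When the customer places a 10 unit single on the home win
  Then the bet is accepted
  And the stake is deducted from their balance
```

The second form survives UI changes and reads the same to testers, BAs and the new team, which is what we actually wanted from the files.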
Additionally, as the product we were building re-used a lot of existing components from the business' main product, we started the project by cloning and adapting many of the existing Gherkin files that described that main product.
The team felt this led them down the wrong path from the start: the domain language of the new market clashed with the domain language of the existing market, and it would have been better to build the files from the ground up using a single domain language.
The root cause of the majority of the issues the team faced appeared to be poorly written specifications and the team's lack of involvement in writing the acceptance criteria for those specifications.
Because of the re-use of the existing feature files, it was assumed that the test team could use them to understand the requirements being built into the system and forgo the three amigos process.
This led to a number of outcomes:
- The specifications had vague acceptance criteria
- The test team had a big backlog of feature files to automate from the get-go, which meant that development and test were not working in parallel
- The feature files had inconsistent language in them due to the two different domain languages in use
- Due to the pressure to get feature files automated, the feature files were the only test artefacts being produced and thus the only resources available to hand over to the new team
When approaching the subject of measurements, we started to realise that aside from points delivered against the backlog of tests to write, we never really measured ourselves internally.
External to the team we were measured on the coverage of the test suite and the defects logged, but on further inspection we felt these weren't measured correctly.
The team had no clear stakeholder they were reporting to who would scrutinise the testing effort and ensure it was on track. The most we saw was an email from the programme lead to the directors of the programme with a pie chart showing 56 tests were green (this type of reporting drives me insane).
To the team it felt like the real measurement we were held against was whether or not the automation would work: a binary evaluation of the health of the product that ultimately meant the rest of the product team didn't buy into the tests and started to ignore them.
There was one instance where the team felt they were working with clear insight into the progress being made, with the coverage of the tests and the reporting transparent enough to communicate any issues: when we had a set of regulatory requirements provided to us by the BAs.
This set of regulatory tests allowed the team to get feedback, ask questions about the expected and actual behaviours, and raise defects with a clearer idea of the severity of the issues found. During our session we realised this was because:
- The requirements were critical to the success of the product
- The BA team made sure they were available to discuss any concerns we had around the scenarios we had to automate
- We had regular reporting in place to show the level of coverage of the test suite, and those reports were going to the corporate stakeholders
- We were able to add regression tests as bugs were raised, ensuring they stayed caught, because the developers were working in the same area of the product
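As an illustration of that last point, a defect found in the regulatory area would get a pinned, tagged scenario along these lines (the tags and the wording here are hypothetical, not from our suite):

```gherkin
@regulatory @regression
Scenario: Stake limit is enforced for system bets
  # Pinned after a defect where the limit was not applied to system bets
  Given a customer who is subject to the regulatory stake limit
  When they attempt to place a system bet above that limit
  Then the bet is rejected with a stake limit message
```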
Outside of the feature-file-based automation, we worked on a set of simple API-level user journeys to prove whether the different environments were ready or not (a kind of continual smoke test).
These tests worked really well to get the product team re-engaged with testing: they were reliable and tied into the value the product would deliver to the customer, something that's easy to develop when you've got a clear goal and a lightweight interface to deal with.
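A minimal sketch of that kind of API-level journey check, where the endpoint paths, step names and the injected fetch function are all illustrative rather than our real code:

```javascript
// Runs a short API-level user journey and reports whether the
// environment is ready. The fetch function is injected so the same
// journey can be pointed at any environment (or stubbed in tests).
async function checkEnvironment(baseUrl, fetchFn) {
  // Each step is a small, customer-value-focused call; the journey
  // stops at the first failure so the report names the broken step.
  const steps = [
    { name: 'events listed', path: '/api/events' },
    { name: 'odds available', path: '/api/events/1/odds' },
    { name: 'bet slip opens', path: '/api/betslip' },
  ];
  for (const step of steps) {
    const res = await fetchFn(`${baseUrl}${step.path}`);
    if (!res.ok) {
      return { ready: false, failedStep: step.name, status: res.status };
    }
  }
  return { ready: true };
}

module.exports = { checkEnvironment };
```

Because the whole journey is a handful of HTTP calls, it runs in seconds and gives a single, readable answer about environment health, which is exactly what made it easy for the wider product team to trust.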
The matter of methods was probably the easiest for us to discuss as it was something that we’d covered in many a retrospective already as part of our continual improvement process.
We decided to inspect these aspects of the development of the code base and then look into the processes the entire team did that may have impacted our ability to work with Gherkin.
Points raised around the development process
In order to allow for re-use of our page object models we decided to create a separate library to store them. For me the main benefit was that it drew a clear line in the sand between the concerns of the POM and the step definition implementations.
It was raised that needing to release a new version of the library to make POM changes available to the automation suite was a bit of a time sink, especially if the changes made in the release turned out not to be correct.
I personally feel this time sink is justified by the benefits of the cleaner codebase (and there are ways to point NPM at a local package as well), but it's a valid point for those on the consuming side of the library.
Luckily for me, the architectural decisions I helped put in place seemed to hold up to critique. The design was seen as a really helpful pattern that allowed tests to be written before the code, as the testers could write the functions in the step definitions before adding them to the library.
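The shape of that pattern can be sketched like this; the class, selectors and driver API are illustrative, not our real library code:

```javascript
// A page object method is drafted in the step-definition layer first,
// then promoted to the shared library once the behaviour settles.
class LoginPage {
  constructor(driver) { this.driver = driver; }

  // Intention-level method: the step definition says 'what' (log in),
  // while the page object owns 'how' (which fields, which clicks).
  async logInAs(username, password) {
    await this.driver.type('#username', username);
    await this.driver.type('#password', password);
    await this.driver.click('#login-submit');
  }
}

// In the step definitions this then becomes a one-liner, e.g.:
//   Given('I am logged in as {string}', async function (name) {
//     await new LoginPage(this.driver).logInAs(name, 'secret');
//   });

module.exports = { LoginPage };
```

Keeping the selectors and driver calls behind intention-level methods is what let the testers write steps against methods that didn't exist yet, then fill them in.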
One of the points of contention within the team was the git flow process. The company has its own git flow process that it has built developer tooling around, but unfortunately the lack of clear documentation meant that most of the time we did things manually.
This led to a couple of instances of team members merging incorrectly or stepping on each other's toes, things that could have been easily avoided if the git flow tooling had been easier to install and run.
Points raised about the methods used within the product team
There was really only one point that came out about the methods used within the product team, and that was the lack of a proper three amigos process.
If we'd been able to get a solid three amigos process going, we'd have been able to iterate over the specifications a number of times. When we did have a three amigos session, the test team found there wasn't enough time to think of and address all the different aspects of a user story in one meeting.
Looking into why the three amigos meetings didn’t work well there were two main reasons:
- The BAs spent a lot of time in another country that was 8 hours behind the UK as they were gathering requirements for the new market
- As testers there was this feeling that we were expected to just ‘get it’ and not ask questions as there was a need to churn the work out ready for launch
Environmental factors were probably the longest part of the session; there were a lot of things that prevented us from working as optimally as we'd have liked.
I’ve kept the list as short as I can for brevity’s sake.
The market we’re targeting has a completely different domain language to the UK; they use a different means of showing the odds for a bet, they have their own language for different betting types & restrictions and they present their events in a different way too.
As the team was made up of contractors, all this new domain language meant the test team was on a constant quest to understand the requirements and test the system while also trying to get to grips with the UK betting domain language.
If you then take into consideration the fact that we were also trying to port the existing UK-based Gherkin into our test framework, you've got a recipe for some very confused testers.
The team felt that if we'd started the Gherkin from scratch we could have kept to one domain language and prevented the confusion.
As we’re building a product for a market that is on the other side of the world having a distributed team was going to happen but the impact of this was massive in regards to our ability to understand and verify requirements.
The company already had an office in the same country so a lot of the Subject Matter Experts (SME) came from that office, however due to competing priorities there wasn’t a lot of time to have access to the SMEs which when paired with the fact that office is 8 hours behind ours meant that if we weren’t working 12:00–20:00 then there was only one chance, a meeting at 16:00 to raise anything.
One of the biggest barriers to the automated testing of the product was the infrastructure: some of it was provided by the team in the other country and some by a dedicated 'cloud' team within the company.
There was never a stable test environment; something in the stack was always unavailable, misconfigured or having badly tested releases pushed to it.
There was no means to run the stack locally either, which made writing the automated tests even more frustrating, as you never knew when your development progress would be blocked for a couple of days by the environment's firewall rules being reset and blocking access.
Similar to the SMEs being stretched between projects there was a severe lack of system engineer support within the product team due to commitments elsewhere. I think we had 0.75 of an engineer most of the time and a lot of that was spinning up new environments.
At one point the team had to just give up on an environment as the team in the other country just stopped supporting it.
When looking at the roles people played in our ability to use Gherkin for testing, we focused on each role's interaction with the team as well as how they seemed to perceive the team.
We felt that the BAs didn't buy into Gherkin and instead saw it as the language the test framework was built in; there was no use of it to build 'Living Documentation' or shared understanding.
There were also a lot of instances where the requirements of a feature would change and the developers would be told by the BAs directly.
This meant our tests started to fail and we would have to investigate what had happened, only to be told that our tests were wrong and needed updating. This then contributed to the notion that the tests were not valuable.
The developers were a lot easier to get hold of than most of the other roles on this list as they were in the same office and were happy to explain what they were working on.
However, when the test team tried to put quality checks in place, such as CI or quality gates for releases, there was a lot of push back from the development team along the lines of 'I don't want my ability to deliver blocked by your failing tests'.
I feel that clearly shows the lack of buy-in to the automated tests, and the divisions that had developed because there was no clear demonstration of value from the automated tests early on.
Having been in that same dev team a few months beforehand, it really disheartened me to be on the receiving end of that comment, as I had tried, as a developer, to encourage a TDD and BDD culture.
The company has its own tribe of System Engineers (although it calls them Cloud Engineers) from whom we struggled to get as much resource as we needed.
One of the things that would have allowed us to demonstrate the value of the automated tests would have been if we were able to set up a CD pipeline so the developers could see feedback on their releases as they were put into the test environment.
After raising this in a number of retrospectives I was shown that the deployment pipeline used for our test environment had the ability to do this but it hadn’t been configured by the System Engineer who set up our infrastructure.
That CD pipeline functionality still remains turned off as there was never any resource assigned to the project to set it up and testing in that way wasn’t a priority.
As we looked into our own role in the team, we started to realise that while we were writing a lot of automated tests, when it came down to actually ensuring the system was working we resorted to manual testing, as this was quicker and more reliable.
This fallback to manual testing to ensure the product worked as intended meant that even the test team didn't believe the automated test framework was usable.
It also meant that the product team started to view the test team more as checkers than testers, and it became harder to fight our corner, which led to a lot of actions raised in retrospectives being ignored.
During the development of the automated tests we had changed test framework due to a lack of integration with our IDE (WebStorm), so that was a good starting point for discussion.
The move was driven by the need for a more flexible test runner: WebDriver.io was using a Cucumber plugin which, in our IDE, could only be configured to run certain tags, whereas CucumberJS allowed individual scenarios to be run without the need to change the Gherkin file.
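For anyone curious, the kind of runner flexibility we were after can be sketched with a CucumberJS profiles file in the CLI-argument string form (the tag names here are illustrative); CucumberJS can also run a single scenario by file and line, e.g. `cucumber-js features/bets.feature:12`:

```javascript
// cucumber.js — profile definitions. Each profile maps a name to
// extra CLI arguments, so running the whole suite, a tagged subset,
// or a different formatter is one flag away:
//   cucumber-js -p smoke
const profiles = {
  default: '--format progress',
  smoke: '--format progress --tags @smoke',
  regression: '--format progress --tags "@regression and not @wip"',
};

module.exports = profiles;
```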
Continuous Integration / Continuous Delivery
The lack of CI and CD was probably the main issue that led to us failing to really use Gherkin.
At the start of the project there wasn't any time to set up CI, as there was a backlog of existing feature files to get automated and the person who kicked off the project was a Scala test engineer with no prior experience of building an automation test framework.
By the time the team had scaled up and I had joined, there was already a massive backlog of tests, and while we had automated test runs, these were a separate thing used only by the test team.
As mentioned before, the development team viewed the automated tests as something that would get in the way rather than something that would add value, and I believe this is because they'd managed to get six months into development without CI, so adding it after that would cause them issues.
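We never got there, but the shape of what was missing is small. A minimal sketch of a pipeline that runs the Cucumber suite on every push, assuming GitHub Actions; the job name, Node version and npm script are all assumptions, not our actual setup:

```yaml
# Hypothetical pipeline: run the CucumberJS suite on every push so
# developers see feedback before a release reaches the test environment.
name: acceptance-tests
on: [push]
jobs:
  cucumber:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # 'test:acceptance' is an assumed npm script wrapping cucumber-js
      - run: npm run test:acceptance -- --profile smoke
```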
What we learned
Aside from making us all feel a little better for having got a lot of issues off our chests, we started to build a clearer picture of the processes and outcomes that led us to where we are now:
- By not having a proper ‘Sprint 0’ we were unable to prove the value of automated testing to the product team early on enough to get buy in
- As the Business Analysts didn't see any value in writing Gherkin, the feature files and their results were never used to measure the quality of the product
- By aiming to re-use existing feature files to increase coverage we spent a lot of time getting confused between two domain languages and ended up writing feature files that were more ‘how’ and less ‘what’
From a personal perspective, I think this issue isn't unique to Gherkin but applies to automated testing in general: if you can't get the buy-in early then you'll struggle later.
Hopefully one day I'll be able to work with Gherkin in an environment where that value has been demonstrated, and I'll get to see it working as it should as 'Living Documentation'. For now, though, I feel that spec-based frameworks make BDD easier to implement as there's less overhead.