How can we test that our AI is ethical & can assure quality?


The topic of bias within machine learning and AI isn’t a new one. Many high-profile organisations have run afoul of bias creeping into their test data, or into a system as it grows. The media have been quick to highlight the growing concern around systems that support business decision-making but don’t show their workings. However, businesses that stringently test their algorithms for bias have the potential to drive greater efficiency and streamline decision-making. Testing an AI follows the same process as general testing – we know what the system should do, and we need to make sure it does that correctly, in a way that is fair and ethical. As a guiding principle, a lot of ethical AI testing is going to be black box – especially as the use of pre-trained or vendor-supplied AI systems increases.

However, the fairness of such a system still needs to be proven by those who seek to use it as part of their day-to-day operations. Decisions should be challenged, and the evidence that they were made impartially should be collected as part of any testing effort. But how do Quality Assurance (QA) testers best go about assessing an algorithm for bias?

Understanding the role of the QA tester

The typical role of a QA tester is to put a product or piece of software through its paces and check that it can withstand tough interrogation before being built at scale. But when testing an AI, the motivation is slightly different. No AI or algorithm should be sent to QA simply to validate that it functions as intended – it should first be tested by the data scientists who are feeding it the test data, to make sure it is reaching the conclusions they predicted.

All data scientists know that you can’t just develop an AI algorithm, feed it relevant data and then move it into production. You first have to verify that the information being used to train the AI is accurate and deep enough for it to classify data clearly and produce meaningful predictive results. If you don’t get the predictive results you were expecting in the validation phase, the data scientists should go back and rebuild the model with better training data. While this is an element of QA, it should happen during the training phase of the AI project rather than being left for QA testers to catch once the AI has gone into operation.

But that doesn’t mean a QA tester isn’t needed in the product life cycle. Even if an algorithm has been rigorously tested at the validation phase, it might still output unexpected results when put into the field. How can that be if the validation phase has shown it to work as built? This issue typically arises when the model is fed real-world data that doesn’t match the training data it previously received. There might be slight differences or nuances in the real-world data that cause results to deviate from the strict rules the model was trained with. This is where the QA tester would be expected to locate and flag the issue so that the AI can be retrained to better sort through and work with the data it is given.
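One way a tester might look for this kind of mismatch is to compare how a single numeric input is distributed in the training data and in the field. The sketch below uses the Population Stability Index, which is a technique chosen here for illustration and not one named in this article. The function name, the ten-bin default and the small floor for empty bins are all assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("training sample has no spread to bin")
    width = (hi - lo) / bins

    def bin_fractions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range field values into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Assumed floor of 1e-6 so that empty bins do not cause log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A commonly quoted rule of thumb treats a value above roughly 0.25 as a significant shift worth flagging for retraining, but that cut-off is a convention, and the right threshold for a given system is a judgment call for the team.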

Testing is critical to being able to demonstrate that the decisions made by such a system are fair and are not the result of bias. Even if an AI is learning as it is used in a production situation, the principles of fairness and fair decision making should be transparent and reportable. Systems that are not explainable or carry an unfair bias should fail testing.
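As a rough illustration of a black-box check that could produce reportable evidence, the sketch below compares how often each group receives a favourable decision and fails the test if the lowest rate falls too far below the highest. The function names and the (group, approved) record shape are hypothetical. The 0.8 threshold borrows the widely cited "four-fifths" rule of thumb and is an assumption, not a recommendation from this article or a legal standard for any particular jurisdiction.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> dict[str, float]:
    """Fraction of favourable outcomes per group from (group, approved) pairs."""
    totals: dict[str, int] = defaultdict(int)
    approved: dict[str, int] = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


def disparate_impact_ratio(rates: dict[str, float]) -> float:
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved, so no group was favoured
    return min(rates.values()) / highest


def passes_fairness_check(
    decisions: Iterable[Tuple[str, bool]],
    threshold: float = 0.8,  # assumed cut-off; choose one that fits your context
) -> bool:
    """True when the ratio of lowest to highest selection rate meets the threshold."""
    return disparate_impact_ratio(selection_rates(decisions)) >= threshold
```

Because it only needs the system's recorded decisions, a check like this can be run against a vendor-supplied model without access to its internals. It measures one narrow notion of fairness, so a pass here is evidence to collect, not proof that the system is unbiased.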

Creating an ethical framework

AI is becoming an integral part of most businesses. Areas that have traditionally been reserved for human intervention due to high cognitive involvement, such as accessibility engineering, are now open to the technology.

But while the AI world might be the traditional domain of a data scientist, devising such an algorithm still requires a cross-functional team who must be fully aware of the requirements on fairness, bias and the laws on privacy, data, and equality – and that means including QA and a testing framework right from the moment of ideation.

Building an ethical framework that enables testers to assess for bias or adverse impacts from an AI’s inception onwards is going to be key to teaching the AI to grow and evolve in the best way possible. Some excellent material from both the UK Government and The Alan Turing Institute mentions testing and validation as crucial activities.

When setting up your AI, it is important to understand how the decisions are going to be made. Is the system accountable? If not, who is? What are the consequences of the decisions? With a robust testing framework to ask and respond to these questions, the company can provide the monitoring and oversight of the AI’s decision-making process. The framework should also include “normal testing”, such as performance and security testing, as both can greatly impact the AI’s overall decisions.

Finally, with the AI constantly learning, new versions of itself are regularly being created. As with testing non-AI systems, it is important to re-test every new release. As part of your framework, you must determine how often you will assess the AI – something that can be led by the type of learning that has been applied to the AI model.
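Re-testing each release can be as simple as replaying a fixed set of benchmark cases through the old and new versions and reviewing every decision that changed. The sketch below assumes each model version can be called as a function that takes one case and returns an approve or decline decision. The names `changed_decisions`, `version_1` and `version_2`, and the income-based rules, are hypothetical stand-ins.

```python
from typing import Callable, Mapping, Sequence

# Assumed interface: a model takes one case and returns True for "approve".
Model = Callable[[Mapping[str, object]], bool]


def changed_decisions(
    previous: Model,
    candidate: Model,
    benchmark: Sequence[Mapping[str, object]],
) -> list[Mapping[str, object]]:
    """Return the benchmark cases whose decision differs between two versions."""
    return [case for case in benchmark if previous(case) != candidate(case)]


# Hypothetical stand-ins for two releases of the same model.
def version_1(case: Mapping[str, object]) -> bool:
    return case["income"] >= 30_000


def version_2(case: Mapping[str, object]) -> bool:
    return case["income"] >= 35_000


benchmark_cases = [{"income": 20_000}, {"income": 32_000}, {"income": 50_000}]
for case in changed_decisions(version_1, version_2, benchmark_cases):
    print("decision changed for", case)
```

A changed decision is not automatically a defect, since the model may have improved, but each one gives the tester a concrete case to question and record.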


Organisations that are looking to implement AI need to build a clear framework that helps them avoid decision-making that could break laws or introduce bias into outcomes. This is why QA testers are a much-needed element in the AI production process, ensuring that any biases within the software are caught at the earliest stage and removed. Organisations implementing this type of technological decision-making should never skip this step, and QA testers need to be trained internally to spot signs of bias within a data set so that it doesn’t slip through the cracks.