Software Testing is a process of verifying and validating whether a software application is Bug-Free and built according to the requirements. Machine Learning, on the other hand, is an area where the algorithms receive the data and use statistical analysis to predict an output while updating it when the new data is available.
Machine Learning in Software Applications
As early as the software industry started using Machine Learning actively and incorporated the models into their applications, few steps related to the SDLC process got exciting and, one such area was Quality Assurance or Software Testing.
Well, there are 2 ways of using Machine Learning in Software Applications: Either testing an application having Machine Learning Models embedded into it or using Machine learning in Software Testing. In this blog, we shall discuss the first case, i.e., testing the models embedded in software applications.
Generally, in software testing, we keep a record of expected and actual results. But, in a machine learning system, the true performance of a model is known only when we test the model against real-world data.
Ever wondered what does a typical Data Scientist do?
More generally, a data scientist knows how to extract meaning from and interpret data. From dividing data into – Train and Test, to training and testing the model on training and testing data respectively, a data scientist tackles everything.
Few other tasks that a data scientist handles include:
Tuning the models to get the best results.
Making sure that the results on the test data do not deviate much.
Also, to train a model the Quality Assurance Engineer should know how to train a Machine Learning Model.
Without the knowledge of Machine Learning, one can use a testing technique called Metamorphic Testing.
Metamorphic testing is a software testing technique to counter the test oracle problem (not knowing the expected output for a test case). In this technique, even if we do not know the expected output, we can relate the outputs of the multiple inputs provided as there is a relation between the inputs.
Example of Metamorphic testing:
Suppose we have built a model to predict whether a person is suffering from a cold or not using different predictors like age, weather, location, population density, pollution levels in that area, etc. Now the input record displays age as less than 5 years, less population density, less pollution, and somewhat cold conditions. In this case, the output will be negative meaning that the child does not have a cold.
Based on the detailed analysis, if the temperature drops by 2 degrees and population density increase by 2 times, the risk of getting cold will go up by 8%. We can, therefore, perform metamorphic testing by considering temperature and population density as the factors.
In metamorphic testing, the successful test cases result in a new test case.
In the case of a 5-year-old child, a little bit of cold weather determines the likely hood of a child getting the cold.
Upon decreasing the age by a year and temperature by 2 degrees, the likelihood of getting cold increases by 10%.
Such test cases can be tested until we get a failure.
Sample steps to update the ML models in the application on a regular basis:
While integrating the Machine Learning algorithms into an application, a robust architecture needs to be in place to do the following:
Train, tune and then test the models.
Place the desired model into the application against the real data.
Over a period of time, once the real data becomes the past data, add this data to the training data.
Repeat the above steps.
Integration and System Testing needs to be taken care of:
Apart from the above-mentioned points, when we include a new component into the software application, integration testing also needs to be done. The reason being that one needs to pipe the data to the Machine learning model and collect the output and use it in a different module.
For example, for a bank to give loan to a user, they first ask the user to fill in their details on the webpage. In the framework, one can apply multiple machine learning models to evaluate whether the user is eligible for a loan or not. If he is eligible, then what is the tentative amount he is eligible for. Similarly, we can make numerous cases like these.
In the above scenario, we are taking data from the web server and feeding it to the machine model. Once the model is applied, we are taking the result back to the web server.
The key point is, one needs to understand:
The requirements for the production results and limitations of the algorithms.
Assess the results using the statistical methods for the machine learning models.
Be careful with the application architecture changes and take care of integration and system testing while introducing a new model.