Task 05: Cyber Attacks and Confusion Matrix

Durga Sankar Sahoo
4 min read · Jun 6, 2021

In this article, we’ll look at cyber crime cases that involve the confusion matrix and its two types of error.

Cyber crime is defined as an illegal activity that involves the use of a computer or another digital device and a network. It is mostly an attack on information that is personal and of high importance to an individual, organization, or government, and its exposure can cause serious threats, damage to infrastructure, financial loss, and even loss of life.

Detecting cyber attacks in a network is therefore essential, and this is where Machine Learning comes into play in building an effective Intrusion Detection System (IDS). A binary classification model can identify what is happening in the network, i.e., whether or not an attack is taking place.

Understanding the raw security data is the first step in building an intelligent security model that makes predictions about future incidents. The two categories are normal and anomaly. After selecting the relevant security features and performing all pre-processing steps, we train a model that detects whether a test case is normal or an anomaly. One of the metrics used to evaluate such a model is the confusion matrix.

Confusion Matrix

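It is a performance measurement for machine learning classification problems where the output can be two or more classes. For binary classification it is a 2×2 table with 4 different combinations of predicted and actual values. In the IDS setting, an attack (anomaly) is treated as the positive class and normal traffic as the negative class.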

True Positive:

Interpretation: You predicted positive and it’s true.

True Negative:

Interpretation: You predicted negative and it’s true.

False Positive: (Type I Error)

Interpretation: You predicted positive, but the actual value is negative.

False Negative: (Type II Error)

Interpretation: You predicted negative, but the actual value is positive.

This gives an idea of what the four boxes of the confusion matrix represent.
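A minimal Python sketch of how the four counts can be tallied from actual and predicted labels. The function name `confusion_counts` and the label convention (1 = attack/anomaly, 0 = normal) are assumptions made for illustration; in practice, libraries such as scikit-learn offer `sklearn.metrics.confusion_matrix` for the same job.

```python
from collections import Counter

# Assumed label convention for this sketch: 1 = attack/anomaly (positive), 0 = normal (negative)
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for two equal-length lists of binary labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tally = Counter(zip(actual, predicted))  # counts each (actual, predicted) pair
    tp = tally[(1, 1)]  # predicted attack, attack happened
    tn = tally[(0, 0)]  # predicted normal, traffic was normal
    fp = tally[(0, 1)]  # predicted attack, traffic was normal (false alarm)
    fn = tally[(1, 0)]  # predicted normal, attack happened (missed attack)
    return tp, tn, fp, fn

# Hypothetical labels for eight network connections
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```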

What makes the confusion matrix so useful is that it separates the two kinds of mistake, type I and type II errors, instead of lumping them together.

Type I error:

This is a False Positive. The model predicts an attack, but in reality the system is safe. The security team gets notified and checks for any malicious activity, which costs time but doesn’t cause direct harm. These cases are often called False Alarms.

Type II error:

This is a False Negative, and it can be very dangerous. The model predicts no attack, but in reality an attack takes place. In that case no notification reaches the security team and nothing can be done to prevent it. One of the aims of the model is therefore to minimize this value.
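To put numbers on the two error types, the sketch below uses the standard definitions (Type I = False Positive, Type II = False Negative, as in the list above) and turns the counts into rates. The helper name `error_rates` and the counts in the example are hypothetical; returning 0.0 when a denominator is zero is just a convention chosen for this sketch.

```python
def error_rates(tp, tn, fp, fn):
    """Return (type_1_rate, type_2_rate) from confusion matrix counts.

    Type I rate  (false positive rate) = FP / (FP + TN): share of normal traffic flagged as an attack.
    Type II rate (false negative rate) = FN / (FN + TP): share of real attacks that were missed.
    """
    type_1 = fp / (fp + tn) if (fp + tn) else 0.0  # convention: 0.0 if there were no normal cases
    type_2 = fn / (fn + tp) if (fn + tp) else 0.0  # convention: 0.0 if there were no attacks
    return type_1, type_2

# Hypothetical counts: 990 normal connections and 10 real attacks
type_1, type_2 = error_rates(tp=7, tn=970, fp=20, fn=3)
print(f"Type I rate:  {type_1:.3f}")  # 0.020
print(f"Type II rate: {type_2:.3f}")  # 0.300
```

Here only 2% of normal traffic raises a false alarm, yet 30% of the actual attacks slip through, which is exactly the kind of gap the two error types are meant to expose.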

High accuracy is always the goal, be it in machine learning or any other field. But does high accuracy always mean better results? In most cases the answer is yes, but let me give you an example where we have to look beyond the common notion that we can blindly chase a higher accuracy.

For example, let’s say an antivirus company releases an AI-based antivirus that detects all suspicious files, and this model gives 97 percent accuracy. The model is running on your PC while you are working on the next big thing. You have just created an executable script that is crucial to you, but the antivirus, being an AI model, gives a “FALSE POSITIVE” and flags your file as a virus.

On the other hand, let’s say you downloaded a few music videos that contained some malicious package, but your model was unable to detect it and gave a “FALSE NEGATIVE”.

So now you have a choice: which type of model would you prefer? The mere existence of this choice shows that accuracy alone is not enough in some cases, because two models can have exactly the same accuracy while making very different kinds of errors.
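A small worked example of that point, with hypothetical counts chosen so that both antivirus models land on the 97 percent accuracy mentioned above. The model names and numbers are made up for illustration; only the accuracy formula, (TP + TN) / (TP + TN + FP + FN), is standard.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical results on 1,000 files, 50 of which are actually malicious
models = {
    "A (cautious)":   dict(tp=48, tn=922, fp=28, fn=2),   # many false alarms, few missed viruses
    "B (permissive)": dict(tp=22, tn=948, fp=2, fn=28),   # few false alarms, many missed viruses
}

for name, c in models.items():
    print(f"{name}: accuracy={accuracy(**c):.2f}, "
          f"false positives={c['fp']}, false negatives={c['fn']}")
# A (cautious): accuracy=0.97, false positives=28, false negatives=2
# B (permissive): accuracy=0.97, false positives=2, false negatives=28
```

Accuracy cannot tell these two models apart; only the confusion matrix shows that model A mostly annoys you with false alarms while model B mostly lets malware through.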

You should now have a gist of the importance of the two types of error in the confusion matrix and what they mean.

The trade-off between type I and type II errors is very critical in cyber security.

Let’s take another example.

Consider a face recognition system installed in front of a data warehouse that holds critical data. The manager arrives, and the recognition system is unable to recognize him. He tries to log in again and is allowed in.

This seems a pretty normal scenario. But let’s consider another situation: a new person arrives and tries to log in. The recognition system makes an error and lets him in. This is very dangerous, because an unauthorized person has gained entry, which could be very damaging to the whole company.

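In both cases the security system made an error. Treating an unauthorized person as the positive class (just as an attack was earlier), rejecting the manager is a False Positive, while admitting the stranger is a False Negative. The tolerance for False Negatives here is zero, although we can still bear some False Positives.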

This shows that how critical each type of error is varies from use case to use case, and the trade-off between the two has to be chosen accordingly.
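One common way to move along this trade-off is to change the decision threshold applied to the model’s score. The sketch below assumes a model that outputs a score for “this is an attack / unauthorized person”; the `samples` list and the `errors_at` helper are hypothetical, not taken from any real system.

```python
# Hypothetical (score, actual) pairs: score = model's estimated probability that the
# event is an attack or an unauthorized person; actual = 1 for attack, 0 for legitimate.
samples = [
    (0.95, 1), (0.80, 1), (0.65, 1), (0.40, 1),
    (0.70, 0), (0.45, 0), (0.30, 0), (0.10, 0),
]

def errors_at(threshold, samples):
    """Count (FP, FN) when every score >= threshold is flagged as an attack."""
    fp = sum(1 for score, actual in samples if score >= threshold and actual == 0)
    fn = sum(1 for score, actual in samples if score < threshold and actual == 1)
    return fp, fn

for t in (0.2, 0.5, 0.9):
    fp, fn = errors_at(t, samples)
    print(f"threshold={t}: FP={fp}, FN={fn}")
# threshold=0.2: FP=3, FN=0
# threshold=0.5: FP=1, FN=1
# threshold=0.9: FP=0, FN=3
```

Lowering the threshold pushes False Negatives toward zero at the cost of more False Positives, which is the direction the face recognition example calls for; raising it does the opposite, as an antivirus that must not block your own files might prefer.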
