Detection of Malware using Machine Learning based on Operation Code Frequency
Date Issued
27-07-2021
Author(s)
Mohandas, Pavitra
Kumar, Sudesh Kumar Santhosh
Kulyadi, Sandeep Pai
Shankar Raman, M. J.
Vasan, V. S.
Venkataswami, Balaji
Abstract
One of the many methods for identifying malware is to disassemble the malware files and extract the opcodes from them. Since malware has predominantly been found to contain specific opcode sequences, the presence of the same sequences in an incoming file or in network content can serve as the basis of a malware identification scheme. Malware detection systems also help us understand how malware attacks a system and how such attacks can be prevented. The proposed method analyses executable files using opcode information: each incoming executable is converted to assembly language, and the opcode count is extracted from the result. The opcode counts are then converted into opcode frequencies, which are stored in a CSV file. The CSV file is passed to several machine learning algorithms, such as Decision Tree Classifier, Random Forest Classifier and Naive Bayes Classifier. The Random Forest Classifier produced the highest accuracy, so that model was used to predict whether an incoming file is potential malware.
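The feature-extraction step described in the abstract (opcode count, then opcode frequency, then a CSV row) can be sketched as follows. This is only an illustrative sketch and not the authors' implementation. It assumes a simplified assembly listing with one instruction per line, `;` comments and `label:` lines, and it assumes that frequency means an opcode's count divided by the total number of instructions in the file; the abstract does not state the normalization. The names `opcode_counts`, `opcode_frequencies`, `write_feature_csv`, `VOCABULARY` and `SAMPLE_ASM` are hypothetical.

```python
import csv
import sys
from collections import Counter

# Hypothetical fixed opcode vocabulary; a real one would be much larger.
VOCABULARY = ["mov", "push", "pop", "ret", "xor"]

# Hypothetical disassembly output in a simplified format (assumption).
SAMPLE_ASM = """
start:
    push ebp        ; function prologue
    mov ebp, esp
    mov eax, 1
    pop ebp
    ret
"""


def opcode_counts(asm_text):
    """Count how often each opcode appears in a simplified assembly listing."""
    counts = Counter()
    for line in asm_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.endswith(":"):    # skip blank lines and labels
            continue
        counts[line.split()[0].lower()] += 1  # first token is the opcode
    return counts


def opcode_frequencies(counts, vocabulary):
    """Turn counts into frequencies relative to all instructions seen.

    Assumption: frequency = count / total instruction count, where the
    total also includes opcodes that are not in the vocabulary.
    """
    total = sum(counts.values())
    if total == 0:
        return [0.0 for _ in vocabulary]
    return [counts.get(op, 0) / total for op in vocabulary]


def write_feature_csv(samples, vocabulary, out):
    """Write one CSV row per (name, asm_text, label) sample."""
    writer = csv.writer(out)
    writer.writerow(["name", *vocabulary, "label"])
    for name, asm_text, label in samples:
        freqs = opcode_frequencies(opcode_counts(asm_text), vocabulary)
        writer.writerow([name, *[f"{f:.4f}" for f in freqs], label])


if __name__ == "__main__":
    # The label 1 stands for "malware" here; this encoding is an assumption.
    write_feature_csv([("sample.exe", SAMPLE_ASM, 1)], VOCABULARY, sys.stdout)
```

A CSV produced this way could then be loaded as a feature matrix and passed to classifiers such as the ones named in the abstract, for example scikit-learn's `RandomForestClassifier`, although the abstract does not say which library was used.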