Abstract
Hi Peerlysters! This article is a short guide offering some ideas and suggestions for building an automated Linux malware analysis lab using built-in Linux commands, Python and open source projects.
In this article we are going to discover:
- Malware analysis techniques
- System Architecture
- Linux Automated Malware Analysis Sandbox steps
You can find the used code snippets in the following file: Peer-codeSnippets-master
Malware Analysis:
Malware analysis is the art of determining the functionality, origin and potential impact of a given malware sample, such as a virus, worm, trojan horse, rootkit, or backdoor. As malware analysts, our main role is to collect all the available information about the malicious software and to understand what happened to the infected machines. Like any process, malware analysis typically follows a methodology and a number of steps. We can go through three phases:
- Static Malware Analysis
- Dynamic Malware Analysis
- Memory Malware Analysis
For a detailed look at these techniques, I highly recommend reading the first sections of my article:
“How to bypass Machine Learning Malware Detectors with Generative adversarial Networks” from this link: https://www.peerlyst.com/posts/how-to-bypass-machine-learning-malware-detectors-with-generative-adversarial-networks-chiheb-chebbi?trk=user_notification
There I discuss many aspects, including:
- Malware fundamentals
- Malware Distribution
- Classical AV evasion techniques
- Malware analysis techniques
- Machine learning malware detection
- Machine Learning Threat Model
- Bypassing Machine Learning malware detectors using Generative Adversarial Networks (GANs)
Linux Malware Analysis Sandbox (LMAS) is built to extract the behavior of malware samples. It is an isolated environment that gives malware analysts the features they need to extract all the necessary information about a given sample in an automated way. After analysing the sample with the three techniques described previously (static, dynamic and memory analysis), using open source and in-house tools, it produces a final report. The goal is a time-efficient malware analysis experience, plus a dataset that we will use later to build machine learning models. In that sense we are not building just another sandbox, but a smart detection system intended to detect malware and zero-day attacks.
System Architecture
- Host Machine: As a host machine we used a Kali Linux 2.0 distribution. Kali Linux is a Debian-based distribution dedicated mainly to penetration testing. It contains many security tools and packages that help a security expert carry out many tasks easily.
- Analysis Virtual Machine (the sandbox): As an analysis virtual machine we used Ubuntu 14.04 hosted in VMware Workstation, which is a well-known hypervisor. Ubuntu is a good choice for several reasons. First, we need a simple operating system that a large number of people use. It can also work as a live environment.
Sandbox steps:
As we know, malware is a complex piece of software. Its behavior ranges from basic actions, such as simple modifications of the system, to advanced behavior patterns. To collect the required information about a given sample, Linux Malware Analysis Sandbox proceeds through predefined steps. First, we feed the sandbox a malware binary sample. The system performs a static analysis to learn as much as possible about the malware before executing it. The sample is then transferred to an isolated malware analysis virtual machine pre-armed with monitoring tools, which executes the malware and intercepts its behavior for a defined time on different levels. Next, the system acquires a memory image of the analysis machine to perform a memory analysis. Finally, all the artifacts are stored in a well-defined file format; they can be represented as an XML file, viewed directly in a console window, or shown in a web dashboard for better visualization. Put simply, the sandbox is set up so that all system interaction of the malware is intercepted. A schematic overview of our sandbox is described in the following figure.
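As a code-level companion to the steps described in this section, here is a minimal Python 3 sketch of how the stages could be chained and their artifacts written out as XML. It is not the sandbox's actual code: the function names (`static_analysis`, `dynamic_analysis`, `memory_analysis`, `run_pipeline`, `to_xml`) and the report layout are hypothetical, and each stage is a placeholder that only returns a small dictionary.

```python
import xml.etree.ElementTree as ET


def static_analysis(sample):
    # Placeholder: a real stage would run file, hashing and readelf checks.
    return {"stage": "static", "sample": sample}


def dynamic_analysis(sample, timeout):
    # Placeholder: a real stage would execute the sample inside the analysis VM.
    return {"stage": "dynamic", "timeout": str(timeout)}


def memory_analysis(sample):
    # Placeholder: a real stage would analyse a memory image of the VM.
    return {"stage": "memory", "sample": sample}


def run_pipeline(sample, timeout=60):
    """Run the three stages in order and collect their artifacts."""
    return [
        static_analysis(sample),
        dynamic_analysis(sample, timeout),
        memory_analysis(sample),
    ]


def to_xml(sample, artifacts):
    """Serialise the collected artifacts as a small XML report."""
    root = ET.Element("report", sample=sample)
    for artifact in artifacts:
        stage = ET.SubElement(root, "stage", name=artifact["stage"])
        for key, value in artifact.items():
            if key != "stage":
                ET.SubElement(stage, key).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml("elf", run_pipeline("elf")))
```

Keeping each stage behind its own function means the same list of artifacts can feed the XML file, the console output or the web dashboard without changing the analysis code.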
Development Environment
To build the sandbox we used two major scripting languages:
Python: we chose Python over other programming languages for several reasons. It is clear, simple and easy to write, which makes the code more maintainable and understandable. Python also comes with a large number of built-in modules. One of them is the os module, a great library for interacting with the operating system and for automating many system tasks.
Bash: a Bash script is a plain text file containing a sequence of Linux commands that tell the Bash shell what to do. Running a shell script does not run those commands in the current shell process; it starts a new process to execute them.
Static Analysis Modules
1. AV scanning module
To obtain additional information about a malware sample, we use online virus scanners and search for its signature in search engines. VirusTotal is a free online scanning service for files and URLs. It returns results from more than 40 antivirus solutions. To automate this, we created a script that uses the VirusTotal API to scan the sample with these antivirus solutions.
The following code snippet can be used to scan malware samples:
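#!/usr/bin/env python
import requests

# Replace the placeholder with your own VirusTotal API key; never publish a real key
params = {'apikey': '<your-VirusTotal-API-key>'}

# Upload the sample to the VirusTotal v2 scan endpoint
with open('myfile.exe', 'rb') as sample:
    files = {'file': ('myfile.exe', sample)}
    response = requests.post('https://www.virustotal.com/vtapi/v2/file/scan', files=files, params=params)

json_response = response.json()
print(json_response["permalink"])
# The scan response carries a scan_id; the per-engine "scans" results
# are returned by the file/report endpoint, not by file/scan
print(json_response["scan_id"])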
2. File identification module
The main aim of this module is to collect as much information as possible about the sample. It determines the file type, which gives us an idea of the target operating system, along with the file size, its MD5 hash and its Unicode strings. Strings can give insight into details such as URLs and registry keys.
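The checks this module performs can be sketched in a few lines of Python 3. This is an illustration, not the sandbox's own code: the `identify` function and the small `MAGIC` table are hypothetical, the type detection only looks at a handful of well-known magic numbers (a real tool would defer to the `file` utility), and the string extraction covers printable ASCII runs only, not full Unicode.

```python
import hashlib
import os
import re

# A few well-known magic numbers; a real tool would defer to file(1).
MAGIC = {
    b"\x7fELF": "ELF (Linux/Unix executable or object)",
    b"MZ": "PE/DOS (Windows executable)",
    b"#!": "script with shebang",
}


def identify(path, min_len=4):
    """Return basic static facts about a file without executing it."""
    with open(path, "rb") as handle:
        data = handle.read()
    file_type = next(
        (name for magic, name in MAGIC.items() if data.startswith(magic)),
        "unknown",
    )
    # Runs of printable ASCII characters, similar in spirit to strings(1).
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    strings = [s.decode("ascii") for s in re.findall(pattern, data)]
    return {
        "type": file_type,
        "size": os.path.getsize(path),
        "md5": hashlib.md5(data).hexdigest(),
        "strings": strings,
    }
```

Calling `identify("elf")` returns a dictionary that can be stored directly in the final report; URLs embedded in the binary show up in the `strings` list.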
3. ELF header module
The Executable and Linking Format (ELF) was developed as a standard file format. We can think of it as the counterpart of PE (Portable Executable) on Windows. On Linux it is the format adopted by the Linux Standard Base (LSB). There are three main types of ELF objects: executable files, which are what the Linux kernel can actually run; relocatable files, which hold the code and data that the linker combines with other object files to produce an executable or a shared object; and shared object files, which are linked with other objects, at build time or at load time, to form the final program. In other words, source code is compiled into relocatable objects, and these are linked to produce an executable or a shared object. The ELF header sits at the beginning of the ELF file and describes how the data is organized. Its role is to make sure that the data is correctly interpreted during linking or execution. We need the ELF header information in malware analysis to better understand how the ELF file works.
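To make the header layout concrete, here is a small Python 3 sketch that decodes the first fields of an ELF header with the standard `struct` module. The function name `parse_elf_header` and the lookup tables are illustrative and far from exhaustive; `readelf -h` remains the reference tool for the full header.

```python
import struct

# Value-to-name tables for a few common header fields (not exhaustive).
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf_header(data):
    """Decode the leading fields of an ELF header from raw bytes."""
    if len(data) < 32 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 is the class (1 = 32-bit, 2 = 64-bit), byte 5 the byte order.
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"
    # e_type, e_machine and e_version follow the 16-byte identification block.
    e_type, e_machine, _version = struct.unpack_from(endian + "HHI", data, 16)
    # The entry point is 8 bytes wide in ELF64 and 4 bytes wide in ELF32.
    entry_format = "Q" if ei_class == 2 else "I"
    (e_entry,) = struct.unpack_from(endian + entry_format, data, 24)
    return {
        "class": "ELF64" if ei_class == 2 else "ELF32",
        "byte_order": "little-endian" if ei_data == 1 else "big-endian",
        "type": ELF_TYPES.get(e_type, "unknown"),
        "machine": ELF_MACHINES.get(e_machine, "unknown"),
        "entry": hex(e_entry),
    }
```

A typical use is `parse_elf_header(open("elf", "rb").read(64))`, which tells you at a glance whether the sample is an executable, a relocatable object or a shared object, and for which architecture it was built.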
4. Malware dependencies module
This module is responsible for revealing the dependencies of the malware binary, because linked libraries and other dependencies are important indicators of the malware's functionality.
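One way to list the linked libraries without running the sample is to read the dynamic section with `readelf -d` and pick out the NEEDED entries. The Python 3 sketch below assumes binutils' `readelf` is installed; `parse_needed` and `list_dependencies` are hypothetical helper names. It deliberately avoids `ldd`, whose manual page warns against using it on untrusted executables.

```python
import re
import subprocess

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[(.+?)\]")


def parse_needed(readelf_output):
    """Extract library names from the text printed by `readelf -d`."""
    return NEEDED_RE.findall(readelf_output)


def list_dependencies(path):
    """Return the shared libraries an ELF file declares as dependencies."""
    # readelf only parses the file; the sample is never executed.
    result = subprocess.run(
        ["readelf", "-d", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_needed(result.stdout)
```

For example, `list_dependencies("elf")` on a dynamically linked sample returns its NEEDED libraries as a list, and an empty list for a statically linked one.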
You can use the following code snippet to gather static information about the binary (the imported libraries will also be used in the following code snippets):
#!/usr/bin/env python
import sys, os
import hashlib
import time
import requests
import json
import yara
from termcolor import colored
from optparse import OptionParser

print(colored("[OK]", "green"), "Static Malware Analysis")
print(colored("[OK]", "green"), "Starting Information Gathering about the File ...")

# File type, as reported by the file utility
print(colored("[OK] File Information", "yellow"))
os.system("file elf")

# MD5 Hash
print(colored("[OK] MD5 Hash", "yellow"))
with open('elf', 'rb') as sample:
    md5_hash = hashlib.md5(sample.read()).hexdigest()
print(md5_hash)

#print(colored("Strings", "yellow"))
#os.system("strings elf")

# ELF Headers Information
print(colored("[OK] ELF Headers:", "yellow"))
os.system("readelf -h elf")

# Online Scanning
print(colored("[OK] Malware Scanning:", "yellow"))
# json_response1 was not defined in the original snippet. This assumes it is the
# VirusTotal v2 file/report response for the sample's hash.
# Replace the placeholder with your own API key.
params = {'apikey': '<your-VirusTotal-API-key>', 'resource': md5_hash}
json_response1 = requests.get('https://www.virustotal.com/vtapi/v2/file/report', params=params).json()

# Scanning Result
print(colored("Number of Antivirus scanners: " + str(json_response1["total"]), "cyan"))
print(colored("Scan Date: " + str(json_response1["scan_date"]), "cyan"))
print(colored("Scan ID: " + str(json_response1["scan_id"]), "cyan"))
print(colored("Live Scanners: " + str(json_response1["scans"]), "cyan"))
We now have a small static analysis script that can identify:
- File format
- Size
- Operating system
- MD5 Hash
- ELF headers
- Online scanning results from 63 different vendors (Kaspersky, Avast, etc.)
Dynamic Analysis Module
The dynamic analysis module is responsible for tracing the behavior of the malware sample while it executes in the isolated analysis virtual machine. This module intercepts the machine's processes, syscalls (calls between user space and kernel space), signals and many other activities, and takes screenshots to include later in the final report. The module also traces all of the malware's network activity, such as a DNS summary and TCP conversations, and dumps all the packet captures.
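To illustrate the syscall side of this module, the Python 3 sketch below summarises a trace that is assumed to have been recorded inside the analysis VM, for example with `strace -f -o trace.txt ./sample`. The helper name `count_syscalls` and the trace file name are hypothetical, and the regular expression only handles the common line shapes of strace output; signal and exit lines are skipped.

```python
import re
from collections import Counter

# A syscall line looks like `1234 openat(AT_FDCWD, "/etc/passwd", ...) = 3`.
# The leading PID is present when strace follows child processes with -f.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(trace_text):
    """Count how often each syscall name appears in strace output."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A summary such as `count_syscalls(open("trace.txt").read()).most_common(10)` gives a quick behavioral fingerprint of the sample that fits well in the final report.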
You can extract this information using built-in Linux commands. The following code snippet shows how to use INetSim:
#!/usr/bin/env python
import os, sys
import time
import socket
from termcolor import colored

bannerDynamic = """ | Dynamic Analysis | """
print(bannerDynamic)
print(colored("[OK]", "green"), "Dynamic Malware Analysis")
print(colored("[OK]", "green"), "Collecting network information and system behaviors ...")
print("Dynamic Linux Malware Analysis")

# Check Internet connection
REMOTE_SERVER = "www.peerlyst.com"

def is_connected():
    try:
        # see if we can resolve the host name -- tells us if there is
        # a DNS listening
        host = socket.gethostbyname(REMOTE_SERVER)
        # connect to the host -- tells us if the host is actually
        # reachable
        s = socket.create_connection((host, 80), 2)
        s.close()
        # Starting Virtual Machine - Sandbox
        print("sandbox is working ...")
        # transfer file to vm
        print("Done!")
        return "Internet"
    except OSError:
        pass
    # No Internet access: fall back to simulated services with INetSim
    print("[ OK ]cleaning Inetsim logs ")
    #os.remove("/home/ghost/Desktop/MalwareSandbox/elf")
    print("[ OK ]starting inetsim")
    # Loading: waiting for services to start
    #for i in range(100):
    #    time.sleep(0.1)
    #    sys.stdout.write("\r%d%%" % i)
    #    sys.stdout.flush()
    #os.system("sudo inetsim")
    return "False There is no internet "

print(is_connected())

# starting capturing Network packets
#os.system("sudo tcpdump -qn")
Memory Analysis Module
The main goal of the memory analysis module is to analyze the memory dump image captured after the dynamic analysis phase. It can analyze both Windows and Linux images, but in our case we use a Linux profile. This module analyzes the memory image in detail and displays:
- Process list and the associated threads
- Networking information and interfaces (TCP/UDP)
- Kernel modules including the hidden modules
- Opened files in the kernel
- Bash and commands history
- System Calls
- Kernel hooks
For memory forensics I suggest using the Volatility framework. The following code snippet calls Volatility commands:
#!/usr/bin/env python
import os, sys
from termcolor import colored

bannerMemory = """ | Memory Analysis | """
print(bannerMemory)
print(colored("[OK]", "green"), "Memory Malware Analysis")
print(colored("[OK]", "green"), "Collecting Memory Dumps ...")
print("Memory Linux Malware Analysis")

# configure the PATH and the Profile
#VolaPATH = "volatility --info "
#os.system(VolaPATH + "| grep linux")
#os.system(VolaPATH + "| grep windows")

# VolaPATH was left undefined in the original snippet. The value below is an
# assumed Volatility 2 command prefix: memory.dump and LinuxProfile are
# placeholders for your own memory image and profile, and the trailing
# "linux_" completes the Linux plugin names (linux_pslist, linux_pstree, ...).
VolaPATH = "volatility -f memory.dump --profile=LinuxProfile linux_"

# (label shown in the console, plugin suffix appended to VolaPATH)
plugins = [
    ("pslist", "pslist"),
    ("pstree", "pstree"),
    ("pidhashtable", "pidhashtable"),
    ("psaux", "psaux"),
    ("psenv", "psenv"),
    ("Threads", "threads"),
    ("netstat", "netstat"),
    ("Ifconfig", "ifconfig"),
    ("List_raw", "list_raw"),
    ("Library List", "library_list"),
    ("Kernel Opened Files", "kernel_opened_files"),
]

for label, plugin in plugins:
    print("[ OK ] " + label)
    os.system(VolaPATH + plugin)
Additional Code Snippets used in the sandbox:
To create a main interface you can use the following code snippet:
#!/usr/bin/env python
import os, sys
from optparse import OptionParser

# Args number verification
if len(sys.argv) <= 1:
    print("Please give some arguments or use --help!")
    sys.exit()

parser = OptionParser('Usage: %prog [Options][args]')
# The timeout is parsed as an integer number of seconds (None when not given)
parser.add_option("-t", "--timeout", dest="timeout", type="int", help="timeout in seconds", default=None)
parser.add_option("-s", "--static", action="store_true", dest="static", help="Static Malware Analysis", default=False)
parser.add_option("-d", "--dynamic", action="store_true", dest="dynamic", help="Dynamic Malware Analysis", default=False)
(options, args) = parser.parse_args()

timeout = options.timeout
static = options.static
dynamic = options.dynamic

if static:
    print("This is a static Malware Analysis")
    # Add the static script here
elif dynamic:
    print("This is a dynamic Malware Analysis")
    # Add the dynamic script here
Internet verification
#!/usr/bin/env python
import socket

REMOTE_SERVER = "www.peerlyst.com"

def is_connected():
    try:
        # see if we can resolve the host name -- tells us if there is
        # a DNS listening
        host = socket.gethostbyname(REMOTE_SERVER)
        # connect to the host -- tells us if the host is actually
        # reachable
        s = socket.create_connection((host, 80), 2)
        return "Internet"
    except:
        pass
    return False

print(is_connected())
Loading Bars
#!/usr/bin/env python
import os
import sys
import time

# Loading bars and progress status
# Using percentages:
#for i in range(100):
#    time.sleep(0.5)
#    sys.stdout.write("\r%d%%" % i)
#    sys.stdout.flush()

print("Task in progress Here ...")

def spinning_cursor():
    while True:
        for cursor in '|/-\\':
            yield cursor

spinner = spinning_cursor()
for _ in range(50):
    sys.stdout.write(next(spinner))
    sys.stdout.flush()
    time.sleep(0.1)
    sys.stdout.write('\b')
To Take Screenshots:
#!/usr/bin/env python
import os
import sys
import pyscreenshot as ImageGrab

# Install the dependency first: pip install pyscreenshot
print("taking Screenshots")
# Take a screenshot of the sandbox and save it
im = ImageGrab.grab()
im.save('im1.png')
Visualization
The report produced by the analysis comes with several visualization choices. First, we can generate the report directly in the sandbox console window, which is a good option for system administrators and security analysts who prefer the console. Another option is a web-based dashboard for those who feel more comfortable with graphical user interfaces. As a front-end solution we used a Bootstrap dashboard, because on the one hand it makes building web pages flexible and easy, and on the other it gives us the capability to create attractive graphical pages, especially dashboards.
Implemented features:
Using the previous code snippets you can build a sandbox that is able to:
- Identify the file type
- Identify the size of the malware sample
- Identify its MD5 Hash
- Scan it online using 63 different AVs
- Get networking information using built-in Linux commands
- Take screenshots and save them
- Analyze the acquired memory dump
Post updates:
04/4/2018: VirusTotal Online scanner updated by Author
04/4/2018: Static Analysis code updated by Author
04/4/2018: Memory Analysis code updated by Author
04/4/2018: Screenshots code updated by Author
References:
[1] Limon - Sandbox for Analyzing Linux Malware https://github.com/monnappa22/Limon
Summary
In this article we took a quick look at our personalized sandbox, which has the majority of the features found in many other well-known malware analysis sandboxes.