[Resource] How to build a Linux Automated Malware Analysis Lab

https://www.peerlyst.com/posts/how-to-build-a-linux-automated-malware-analysis-lab-chiheb-chebbi?fbclid=IwAR0A4xVM9-jBgla2jyu-qoTM8RBbCh32ibKRoipyjuTkJC7ukO42flVb7jM


Abstract

Hi Peerlysters! This article is a short guide offering ideas and suggestions for building a Linux automated malware analysis lab using built-in Linux commands, Python and open source projects.

In this article we are going to discover:

You can find the code snippets used in this article in the following file: Peer-codeSnippets-master

Malware Analysis:

Malware analysis is the art of determining the functionality, origin and potential impact of a given malware sample, such as a virus, worm, trojan horse, rootkit, or backdoor. As malware analysts, our main role is to collect all the information about the malicious software and to understand what happened to the infected machines. Like any process, malware analysis typically follows a certain methodology and a number of steps. It can be performed in three phases:

  • Static Malware Analysis
  • Dynamic Malware Analysis
  • Memory Malware Analysis

For a detailed understanding of these techniques, I highly recommend reading the first sections of my article:

“How to bypass Machine Learning Malware Detectors with Generative adversarial Networks” from this link: https://www.peerlyst.com/posts/how-to-bypass-machine-learning-malware-detectors-with-generative-adversarial-networks-chiheb-chebbi?trk=user_notification

where I discussed many aspects including:

The Linux Malware Analysis Sandbox (LMAS) is built to extract the behavior of malware samples. LMAS is an isolated environment that gives malware analysts a large set of features for extracting all the necessary information about a given malware sample in an automated way. After analysing the sample with the three techniques described previously (static, dynamic and memory analysis), using open source tools and in-house developed tools, it delivers the results as a final report. The goal is to provide a time-efficient malware analysis experience and also a dataset that we will use later to build machine learning models. In this way we are not building just another sandbox but a smart detection system that can detect malware and zero-day attacks.

System Architecture

  • Host Machine: As a host machine we used a Kali Linux 2.0 distribution. Kali Linux is a Debian-based distribution dedicated mainly to pentesting. It contains many security tools and packages that help security experts accomplish many tasks easily.
  • Analysis Virtual Machine (the sandbox): As an analysis virtual machine we used Ubuntu 14.04 hosted in VMware Workstation, a well-known hypervisor. Ubuntu is a good choice for several reasons: we need a simple operating system that a huge number of users run, and it can also work as a live environment.

Sandbox steps:

As we know, malware is a complex piece of software. Its behavior ranges from basic actions, like simple modifications of computer systems, to advanced behavior patterns. To extract the required information about a given malware sample, the Linux Malware Analysis Sandbox follows predefined steps. First, we feed the sandbox a malware binary sample. The system performs a static analysis to learn as much as possible about the malware before executing it. Then the sample is transferred to an isolated malware analysis virtual machine pre-armed with monitoring tools, where the malware is executed and its behavior is intercepted for a defined time on different levels. Next, the system acquires a memory image of the analysis machine to perform a memory analysis. Finally, all the artifacts are stored in a well-defined file format; they can be represented as an XML file, viewed directly in a console window, or shown in a web dashboard for better visualization. In short, the sandbox is set up so that all system interaction of the malware is intercepted. A schematic overview of our sandbox is shown in the following figure.

Development Environment

To build the sandbox we used two major scripting languages:

Python: compared to other programming languages, we chose Python for several reasons. It is clear, simple and easy to write, which makes the code more maintainable and understandable. Python ships with a large number of built-in modules. One of them is the os module, a library for interacting with the operating system and automating many system-related tasks.

Bash: a Bash script is a plain text file containing a sequence of Linux commands that tell the Bash shell what to do. Running a shell script therefore does not run the listed commands in the current process; it starts a new shell process that executes them.

Static Analysis Modules

1.AV scanning module

To obtain additional information about a malware sample, we used online virus scanners and searched for its signature in search engines. VirusTotal is a free online scanning service for viruses, malware and URLs. It delivers results from more than 40 antivirus solutions. To use it, we created a script that calls the VirusTotal API to scan the malware with these antivirus solutions.

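The following code snippet can be used to scan the malware samples:

#!/usr/bin/env python
import time
import requests
import json
# Replace the key below with your own VirusTotal API key
params = {'apikey': '3a6eb41041b884c803b1a06ab24e6bb652a30e8634e1db0156693961f539d1cc'}
# Upload the sample for scanning
files = {'file': ('myfile.exe', open('myfile.exe', 'rb'))}
response = requests.post('https://www.virustotal.com/vtapi/v2/file/scan', files=files, params=params)
json_response = response.json()
print(json_response["permalink"])
# Per-engine results are not part of the scan response; they come from the
# file/report endpoint and may only be available once the scan has finished
report_params = {'apikey': params['apikey'], 'resource': json_response["scan_id"]}
report = requests.get('https://www.virustotal.com/vtapi/v2/file/report', params=report_params)
print(report.json().get("scans"))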
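Once a report is available, the per-engine results can be condensed into a detection count. The following is a minimal sketch, assuming the `scans` field has the VirusTotal v2 shape (engine name mapped to a dict with `detected` and `result`); the helper name `summarize_scans` and the sample data are made up for illustration.

```python
def summarize_scans(scans):
    """Return (detections, total, labels) from a VirusTotal v2 `scans` dict."""
    # Keep only the engines that flagged the sample, with their label
    labels = {
        engine: result.get("result")
        for engine, result in scans.items()
        if result.get("detected")
    }
    return len(labels), len(scans), labels


if __name__ == "__main__":
    # Hand-written sample data, not real scan output
    sample_scans = {
        "EngineA": {"detected": True, "result": "Linux.Trojan.Example"},
        "EngineB": {"detected": False, "result": None},
    }
    hits, total, labels = summarize_scans(sample_scans)
    print("%d/%d engines flagged the sample" % (hits, total))
    for engine, label in sorted(labels.items()):
        print("  %s: %s" % (engine, label))
```

A ratio like this is easier to put in a final report than the raw per-engine dictionary.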

2.File identification Module

The main aim of this module is to collect as much information as possible about the malware. It determines the file type, which gives us an idea of the target operating system, in addition to the file size, its MD5 hash and the Unicode strings, because strings can give insight into details such as URLs and registries, as discussed previously.
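The same facts can also be gathered without shelling out. Here is a rough sketch in pure Python, assuming that a check of the ELF magic bytes and a scan for printable ASCII runs are good enough stand-ins for the `file` and `strings` commands; the function name `identify_file` is illustrative and not part of the sandbox.

```python
import hashlib
import os
import re
import sys


def identify_file(path, min_len=4):
    """Collect size, MD5, an ELF magic check and printable strings."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Runs of printable ASCII bytes, similar in spirit to `strings`
    pattern = re.compile(
        b"[\\x20-\\x7e]{" + str(min_len).encode("ascii") + b",}"
    )
    return {
        "size": os.path.getsize(path),
        "md5": hashlib.md5(data).hexdigest(),
        "is_elf": data[:4] == b"\x7fELF",
        "strings": [m.decode("ascii") for m in pattern.findall(data)],
    }


if __name__ == "__main__":
    # Usage: python identify.py <sample>
    if len(sys.argv) > 1:
        info = identify_file(sys.argv[1])
        print("size:", info["size"])
        print("md5:", info["md5"])
        print("ELF:", info["is_elf"])
        print("strings found:", len(info["strings"]))
```

Reading the sample as bytes never executes it, so this step is safe to run on the host before the sample is moved to the analysis machine.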

3.ELF header Module

The Executable and Linking Format (ELF) was developed as a standard file format. We can see it as the counterpart of PE (Portable Executable) on Windows. ELF is specified by the Linux Standard Base (LSB). There are three main types of ELF objects: the executable file, which is what the Linux kernel can actually run; the relocatable file, which holds compiled code and data ready to be linked with other object files; and the shared object file, which can be linked with other objects or loaded at run time. Source code is first compiled into relocatable files, and these are then linked to produce a shared object or, finally, an executable file. The ELF header sits at the beginning of an ELF file. It describes how the data is organized, and its role is to make sure that the data is correctly interpreted during linking or execution. We need information about ELF headers in malware analysis to better understand how the ELF file works.
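To make the header layout concrete, the sketch below decodes the first fields of an ELF header with Python's struct module. It is an illustration of the format, not the sandbox's own code: the function name `parse_elf_header` and the returned keys are invented here, and only the fields up to the entry point are read.

```python
import struct

# e_type values defined by the ELF format
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def parse_elf_header(data):
    """Decode the leading fields of an ELF header from raw bytes."""
    if len(data) < 24 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(data[4])      # EI_CLASS
    endian = {1: "<", 2: ">"}.get(data[5])  # EI_DATA
    if bits is None or endian is None:
        raise ValueError("unsupported ELF class or data encoding")
    # e_type and e_machine (16-bit) and e_version (32-bit) follow e_ident
    e_type, e_machine, _version = struct.unpack_from(endian + "HHI", data, 16)
    # e_entry is 4 bytes wide in 32-bit files and 8 bytes in 64-bit files
    entry_fmt = endian + ("I" if bits == 32 else "Q")
    if len(data) < 24 + struct.calcsize(entry_fmt):
        raise ValueError("truncated ELF header")
    (entry,) = struct.unpack_from(entry_fmt, data, 24)
    return {
        "bits": bits,
        "endianness": "little" if endian == "<" else "big",
        "type": ELF_TYPES.get(e_type, "unknown"),
        "machine": e_machine,
        "entry": entry,
    }


if __name__ == "__main__":
    # A hand-built 64-bit little-endian header: executable, machine 62 (x86-64)
    header = b"\x7fELF" + bytes([2, 1, 1, 0]) + b"\x00" * 8
    header += struct.pack("<HHIQ", 2, 62, 1, 0x400080)
    print(parse_elf_header(header))
```

In practice `readelf -h` prints far more detail; the point of the sketch is only to show where the class, byte order, object type and entry point sit in the header.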

4.Malware Dependencies Module

This module is responsible for revealing the malware binary's dependencies, because linked libraries and dependencies are very important indicators of the malware's functionality.
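One way to list the linked libraries is to read the dynamic section with `readelf -d` and keep the NEEDED entries. This is a sketch under that assumption; the article does not say which tool the module uses, and the helper names `parse_needed` and `needed_libraries` are illustrative.

```python
import re
import subprocess

# Matches lines such as:
#  0x0000000000000001 (NEEDED)  Shared library: [libc.so.6]
NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[(.+?)\]")


def parse_needed(readelf_output):
    """Extract shared library names from `readelf -d` text output."""
    return NEEDED.findall(readelf_output)


def needed_libraries(path):
    """Run `readelf -d` on a binary and return its NEEDED libraries."""
    output = subprocess.check_output(
        ["readelf", "-d", path], universal_newlines=True
    )
    return parse_needed(output)


if __name__ == "__main__":
    # Shortened sample in the style of readelf output, written by hand
    sample = (
        " 0x0000000000000001 (NEEDED)   Shared library: [libpthread.so.0]\n"
        " 0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]\n"
        " 0x000000000000000c (INIT)     0x400590\n"
    )
    print(parse_needed(sample))
```

Unlike `ldd`, which may execute code from the binary on some systems, `readelf` only parses the file, which makes it the safer choice for untrusted samples.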

You can use the following code snippet to get static information about the binary (the imported libraries will also be used in the later code snippets):

#!/usr/bin/env python
import sys,os
import hashlib
import time
import requests
import json
import yara
from termcolor import colored
from optparse import OptionParser
print(colored("[OK]","green"),"Static Malware Analysis")
print(colored("[OK]","green"),"Starting Information Gathering about the File ...")
print(colored("[OK] File Information","yellow"))
os.system("file elf")
#MD5 Hash
print(colored("[OK] MD5 Hash","yellow"))
md5_hash = hashlib.md5(open('elf','rb').read()).hexdigest()
print(md5_hash)
#print(colored("Strings","yellow"))
#os.system("strings elf")
#ELF Headers Information
print(colored("[OK] ELF Headers:","yellow"))
os.system("readelf -h  elf")
#Online Scanning
print(colored("[OK] Malware Scanning:","yellow"))
# Assumption: the report is looked up on VirusTotal by MD5; set your own API key
params = {'apikey': 'YOUR_API_KEY', 'resource': md5_hash}
json_response1 = requests.get('https://www.virustotal.com/vtapi/v2/file/report', params=params).json()
# Scanning Result
print(colored("Number of Antivirus scanners: "+ str(json_response1["total"]),"cyan"))
print(colored("Scan Date: "+ str(json_response1["scan_date"]),"cyan"))
print(colored("Scan ID: "+ str(json_response1["scan_id"]),"cyan"))
print(colored("Live Scanners: "+ str(json_response1["scans"]),"cyan"))

At this point we have a small static analysis script that can identify:

  • File format
  • Size
  • Operating system
  • MD5 Hash
  • ELF headers
  • Online scanning from 63 different vendors (Kaspersky, Avast etc.)

Dynamic Analysis Module

The dynamic analysis module is responsible for tracing every behavior of the malware sample while it executes in the isolated analysis virtual machine. This module intercepts all machine processes, syscalls (calls between user space and kernel space), signals and many other activities, and takes screenshots to include later in the final report. The module also traces all of the malware's networking activity, such as the DNS summary and TCP conversations, and dumps all the packet captures.
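As an illustration of the syscall side, the sketch below counts system calls in a trace log. It assumes the trace was recorded with strace, which the article does not name, and that lines look like `openat(...) = 3`, optionally prefixed by a PID; the helper name `count_syscalls` and the sample lines are made up.

```python
import re
from collections import Counter

# Optional "[pid 123] " or "123 " prefix, then the syscall name and "("
SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Count how often each syscall name appears in strace-style lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hand-written lines in strace style, not output from a real sample
    trace = [
        'execve("/tmp/elf", ["/tmp/elf"], 0x7ffc /* 20 vars */) = 0',
        'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
        '[pid  1234] connect(3, {sa_family=AF_INET}, 16) = 0',
        'openat(AT_FDCWD, "/etc/shadow", O_RDONLY) = -1 EACCES',
        '--- SIGCHLD {si_signo=SIGCHLD} ---',
        '+++ exited with 0 +++',
    ]
    for name, hits in count_syscalls(trace).most_common():
        print("%-10s %d" % (name, hits))
```

Signal and exit lines start with `---` or `+++`, so the pattern skips them and only real calls are counted.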

You can extract this information using built-in Linux commands. The following is a code snippet that uses INetSim:

#!/usr/bin/env python
import os,sys
import time
import socket
from termcolor import colored
bannerDynamic = """  |      Dynamic Analysis    | """
print(bannerDynamic)
print(colored("[OK]","green"),"Dynamic Malware Analysis")
print(colored("[OK]","green"),"Collecting network information and system behaviors ...")
print("Dynamic Linux Malware Analysis")
#Check Internet connection
REMOTE_SERVER = "www.peerlyst.com"
def is_connected():
    try:
        # see if we can resolve the host name -- tells us if there is
        # a DNS listening
        host = socket.gethostbyname(REMOTE_SERVER)
        # connect to the host -- tells us if the host is actually
        # reachable
        s = socket.create_connection((host, 80), 2)
        #Starting Virtual Machine - Sandbox
        print("sandbox is working ...")
        # transfer file to vm
        print("Done!")
        return "Internet"
    except socket.error:
        pass
    # No connection: fall back to simulated services with INetSim
    print("[ OK ]cleaning Inetsim logs ")
    #os.remove("/home/ghost/Desktop/MalwareSandbox/elf")
    print("[ OK ]starting inetsim")
    #Loading: waiting for services to start
    #for i in range(100):
    #    time.sleep(0.1)
    #    sys.stdout.write("\r%d%%" % i)
    #    sys.stdout.flush()
    #os.system("sudo inetsim")
    return "False  There is no internet "
print(is_connected())
#starting capturing Network packets
#os.system("sudo tcpdump -qn")


Memory Analysis Module

The main goal of the memory analysis module is to analyze the memory dump image captured after the dynamic analysis phase. It can analyze both Windows and Linux systems, but in our case we use a Linux profile. This module analyzes the memory image in detail and displays:

  • Process list and the associated threads
  • Networking information and interfaces (TCP/UDP)
  • Kernel modules including the hidden modules
  • Opened files in the kernel
  • Bash and commands history
  • System Calls
  • Kernel hooks

For memory forensics I suggest using the Volatility framework. The following is a code snippet that calls Volatility commands:

#!/usr/bin/env python
import os,sys
from termcolor import colored
bannerMemory = """ |                      Memory Analysis                | """
print(bannerMemory)
print(colored("[OK]","green"),"Memory Malware Analysis")
print(colored("[OK]","green"),"Collecting Memory Dumps ...")
print("Memory Linux Malware Analysis")
#configure  the PATH and the Profile
#To list the available profiles:
#os.system("volatility --info | grep linux")
#os.system("volatility --info | grep windows")
#Assumption: the image path and profile name below are placeholders, adjust them to your setup.
#Volatility 2 prefixes its Linux plugins with "linux_".
VolaPATH = "volatility -f memory.dump --profile=LinuxProfile linux_"
print ("[ OK ] pslist")
os.system(VolaPATH + "pslist")
print ("[ OK ] pstree")
os.system(VolaPATH + "pstree")
print ("[ OK ] pidhashtable")
os.system(VolaPATH + "pidhashtable")
print ("[ OK ] psaux")
os.system(VolaPATH + "psaux")

print ("[ OK ] psenv")
os.system(VolaPATH + "psenv")
print ("[ OK ] Threads")
os.system(VolaPATH + "threads")
print ("[ OK ] netstat")
os.system(VolaPATH + "netstat")
print ("[ OK ] Ifconfig")
os.system(VolaPATH + "ifconfig")
print ("[ OK ] list_raw")
os.system(VolaPATH + "list_raw")
print ("[ OK ] Library List")
os.system(VolaPATH + "library_list")
print ("[ OK ] Kernel Opened Files")
os.system(VolaPATH + "kernel_opened_files")
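The Volatility calls can also be assembled as argument lists and run through subprocess, which keeps each plugin's output for the report. This is only a sketch: it assumes a Volatility 2 executable named `volatility` on the PATH, and the image path `memory.lime`, the profile name `LinuxUbuntu1404x64` and both helper names are placeholders rather than values from the sandbox.

```python
import subprocess

# A few of the Linux plugins used above, without the "linux_" prefix
LINUX_PLUGINS = ["pslist", "pstree", "psaux", "netstat", "ifconfig"]


def volatility_command(image, profile, plugin):
    """Build the argument list for one Volatility 2 Linux plugin run."""
    return ["volatility", "-f", image, "--profile=" + profile, "linux_" + plugin]


def run_plugins(image, profile, plugins=LINUX_PLUGINS):
    """Run each plugin and return its text output keyed by plugin name."""
    results = {}
    for plugin in plugins:
        completed = subprocess.run(
            volatility_command(image, profile, plugin),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        results[plugin] = completed.stdout
    return results


if __name__ == "__main__":
    # Only print the commands here; nothing is executed
    for name in LINUX_PLUGINS:
        print(" ".join(volatility_command("memory.lime", "LinuxUbuntu1404x64", name)))
```

Passing a list instead of a concatenated string avoids shell quoting problems when the image path contains spaces.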



Additional Code Snippets used in the sandbox:

To Create a Main Interface you can use the following code snippet:

#!/usr/bin/env python 
import os,sys 
from optparse import OptionParser 
#Args number verification
if len(sys.argv) <=1:
   print("Please give some arguments or type help!")
   sys.exit()

parser = OptionParser('Usage: %prog [Options][args]') 
parser.add_option("-t", "--timeout", dest="timeout", help="timeout in seconds", default="False") 
parser.add_option("-s", "--static", action="store_true", dest="static", help = "Static Malware Analysis",  default=False) 
parser.add_option("-d", "--dynamic", action="store_true", dest="dynamic", help="Dynamic Malware Analysis",  default=False) 

(options, args) = parser.parse_args() 

timeout = options.timeout 
static = options.static 
dynamic = options.dynamic 
if static:     
   print("This is a static Malware Analysis") #Add the static script here
elif dynamic:     
   print("This is a dynamic Malware Analysis")  #Add the dynamic script here


Internet verification

#!/usr/bin/env python
import socket
REMOTE_SERVER = "www.peerlyst.com"
def is_connected():
  try:
    # see if we can resolve the host name -- tells us if there is
    # a DNS listening
    host = socket.gethostbyname(REMOTE_SERVER)
    # connect to the host -- tells us if the host is actually
    # reachable
    s = socket.create_connection((host, 80), 2)
    return "Internet"
  except socket.error:
    # name resolution or connection failed
    return False
print(is_connected())

Loading Bars

#!/usr/bin/env python 
import os
import sys
import time 
#Loading Bars and progress status
#using Percentages

"""for i in range(100):
    time.sleep(0.5)
    sys.stdout.write("\r%d%%" % i)
    sys.stdout.flush()
"""
print ("Task in progress Here ...")    
def spinning_cursor():
    while True:
        for cursor in '|/-\\':
            yield cursor

spinner = spinning_cursor()
for _ in range(50):
    sys.stdout.write(next(spinner))
    sys.stdout.flush()
    time.sleep(0.1)
    sys.stdout.write('\b')

To Take Screenshots:

#!/usr/bin/env python 
import os
import sys
import pyscreenshot as ImageGrab
print ("taking Screenshots")
# pip install pyscreenshot
#Take Screenshots for the sandbox and Save them
im = ImageGrab.grab()
im.save('im1.png')

Visualization

The report produced by the overall analysis comes with several visualization choices. First, we can generate the report directly in the sandbox console window, which is a good option for system administrators and security analysts who prefer the console environment. Another output option is a web-based dashboard for those who feel more comfortable with graphical user interfaces. As a front-end solution we used a Bootstrap dashboard, because it makes building web pages flexible and easy and gives us the capability to create rich graphical web pages, especially dashboards.
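For the web output, the collected artifacts can be rendered as an HTML table that a Bootstrap page then styles. This is a small sketch with assumed names: `render_table` and the sample artifact keys are illustrative, and `table` is Bootstrap's standard table class.

```python
from html import escape


def render_table(artifacts):
    """Render a dict of artifact name -> value as an HTML table."""
    # Escape every cell: strings taken from a sample are untrusted input
    rows = "".join(
        "<tr><td>%s</td><td>%s</td></tr>" % (escape(str(key)), escape(str(value)))
        for key, value in artifacts.items()
    )
    return '<table class="table"><tbody>%s</tbody></table>' % rows


if __name__ == "__main__":
    # Made-up values for illustration
    print(render_table({"file_type": "ELF 64-bit", "size": 8192}))
```

Escaping matters here because strings extracted from malware could otherwise inject markup into the analyst's dashboard.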

Implemented Features:

Using the previous code snippets you are able to build a sandbox that is able to:

  • Identify the file type
  • Identify the size of the malware sample
  • Identify its MD5 Hash
  • Scan it online using 63 Different AVs
  • Get networking information using built-in Linux commands
  • Take screenshots and save them
  • Analyze the acquired memory dump

Post updates:

04/4/2018: VirusTotal Online scanner updated by Author

04/4/2018: Static Analysis code updated by Author

04/4/2018: Memory Analysis code updated by Author

04/4/2018: Screenshots code updated by Author

References:

[1] Limon - Sandbox for Analyzing Linux Malware https://github.com/monnappa22/Limon

Summary

In this article we took a brief look at our customized sandbox, which has the majority of the features found in many other well-known malware analysis sandboxes.