Friday, August 18, 2017

Updates, New Stuff

Specificity
The folks at Talos recently posted an interesting article, "On Conveying Doubt".  While some have said that this article discusses "conveying doubt" (which it does), it's really about specificity of language.  Too often in our industry, while there is clarity of thought, there's a lack of specificity in the language we use to convey those thoughts; this includes all manner of communication methods; not only reports, but presentations and blog posts.  After all, it doesn't matter how good you or your analysis may be if you cannot communicate your methodology and findings.

Ransomware
Ransomware is a pretty significant issue in the IT community, particularly over the past 8 months or so.  My first real encounter with ransomware, from a DFIR perspective, started last year with a couple of Samas ransomware cases that came in; from a perspective external to my employer, this resulted in two corporate blog posts, one of which was written by a colleague/friend, and ended up being one of the most quoted and referenced blog posts published.  Interestingly enough, a lot of aspects about ransomware, in general, have continued to evolve since then, at an alarming rate.

Vitali Kremez recently posted an interesting analysis of the GlobeImposter ".726" ransomware.  From an IR perspective, where one has to work directly with clients, output from IDAPro and OllyDbg isn't entirely useful in most cases.

However, in this case, there are some interesting aspects of the ransomware that Vitali shared, specifically in section VI.(b).; that is, the command line that the ransomware executes.

Before I go any further, refer back to Kevin's blog post (linked above) regarding the evolution of the Samas ransomware.  At first, the ransomware included a copy of a secure deletion tool in one of it's resource sections, but later versions opted to reduce the overall size of the executable.  Apparently, the GlobeImposter ransomware does something similar, relying on tools native to the operating system to run commands that 'clean up' behind itself.  From the embedded batch file, we can see that it deletes VSCs, deletes user's Registry keys related to lateral movement via RDP, and then enumerates and clears *all* Windows Event Logs.  The batch file also includes the use of "taskill" to stop a number of processes, which is interesting, as several of them (i.e., Outlook, Excel, Word) would immediately alert the user to an issue.

FortiNet recently published an analysis of a similar variant.

Online research (including following #globeimposter on Twitter) indicates that the ransomware is JavaScript-based, and that the IIV is via spam email (i.e., "malspam").  If that's the case (and I'm not suggesting that it isn't), why does the embedded command line include deleting Registry keys and files associated with lateral movement via RDP?

Marco Ramilli took a look at some RaaS ransomware that was apparently distributed as an email attachment.  He refers to it as "TOPransomware", due to a misspelling in the ransom note instructions that tell the victim to download a TOP browser, as opposed to a TOR browser.  This is interesting, as some of Samas variants I've seen this year include a clear-text misspelling in the HTML ransom note page, setting the font color to "DrakRed".

I also ran across this analysis of the Shade ransomware, and aside from the analysis itself, I thought that a couple of comments in the post were very interesting.  First, "...has been a prominent strand from around 2014."  Okay, this one is new to me, but that doesn't really say much.  After all, as a consultant, my "keyhole" view is based largely on the clients who call us for assistance.  I don't claim to have seen everything and know everything...quite the opposite, in fact.  But this does illustrate the differences in perspective.

Second, "...spread like many other ransomware through email attachments..."; this is true, many other ransomware variants are spread as email attachments.  The TOPransomware mentioned previously was apparently discovered as a .vbs script attached to an email.  However, it is important to note that NOT ALL ransomware gets in via email.  This is an important distinction, as all of the protection mechanisms that you'd employ against email-borne ransomware attacks are completely useless against variants such as Samas and LeChiffre, which do not propagate via email.  My point is, if you purchase a solution that only protects you from email-borne attacks, you're still potentially at risk for other ransomware attacks.  Further, I've also seen where solutions meant to protect against email-borne ransomware attacks do not work when a user uses Chrome to access their AOL email, and the system/infrastructure gets infected that way.

On August 7, authorities in the Ukraine arrested a man for distributing the NotPetya malware (not ransomware, I know...) in order to assist tax evaders.  According to the article, he isn't thought to be the author of the destructive malware, but was instead found to be distributing a copy of the malware via his social media account so that companies could presumably infect themselves and not pay taxes.

Recently, this Mamba ransomware analysis was posted; more than anything else, this really highlights one of the visible gaps in this sort of analysis.  As the authors found the ransomware through online searches (i.e., OSINT), there's no information as to how this ransomware is distributed.  Is it as an email attachment?  What sort?  Or, is it deployed via more manual means, as has been observed with the Samas and LeChiffre ransomware?

The Grugq recently posted thoughts as to how ransomware has "changed the rules".  I started in the industry, specifically in the private sector, 20 yrs ago this month...and I have to say, many/most of the recommendations as to how to protect yourself against and recover from ransomware were recommendations from that time, as well.  What's old is new again, eh?

Cryptocurrency Miners
Cylance recently posted regarding cryptocurrency mining malware.  Figure 7b of the post provides some very useful information in the way of a high fidelity indicator..."stratum+tcp".  If you're monitoring process creation on endpoints and threat hunting, looking for this will provide an indicator on which you can pivot.  Clearly, you'll want to determine how the mining software got there, and performing that root cause analysis (RCA) will direct you to their access method.

Web Shells
The PAN guys had a fascinating write-up on the TwoFace web shell, which includes threat actor TTPs.  In the work I've done, I've most often found web shells on perimeter systems, and that initial access was exploited to move laterally to other systems within the network infrastructure.  I did, however, once see a web shell placed on an internal web server; that web shell was then used to move laterally to an MS SQL server.

Speaking of web shells, there's a fascinating little PHP web shell right here that you might want to check out.

Lifer
Retired cop Paul Tew recently released a tool he wrote called "lifer", which parses LNK files.  One of the things that makes tools like this really useful is use cases; Paul comes from an LE background, so he very likely has a completely different usage for such tools, but for me, I've used my own version of such tools to parse LNK files reportedly sent to victims of spam campaigns.

Want to generate your own malicious LNK files?  Give LNKup a try...

Supply Chain Compromises
Supply chain compromises are those in which a supplier or vendor is compromised in order to reach a target.  We've seen compromises such as this for some time, so it's nothing new.  Perhaps the most public supply chain compromise was Target.

More recently, we've recently seen the NotPetya malware, often incorrectly described as "ransomware".  Okay, my last reference to ransomware (for now...honest) comes from iTWire, in which Maersk was able to estimate the cost of a "ransomware" attack.  I didn't include this article in the Ransomware section above because if you read the article, you'll see that it refers to NotPetya.

Here is an ArsTechnica article that discusses another supply chain compromise, where a backdoor was added to a legitimate server-/network-management product.  What's interesting about the article is that it states that covert data collection could have occurred, which I'm sure is true.  The questions are, did it, and how would you detect something like this in your infrastructure?

Carving for EVTX
Speaking of NotPetya, here's an interesting article about carving for EVTX records that came out just days before NotPetya made it's rounds.  If you remember, NotPetya included a command line that cleared several Windows Event Log files

The folks who wrote the post also included a link to their GitHub site for the carver they developed, which includes not only the source code, but pre-compiled binaries (they're written in Go).  The site includes a carver (evtxdump) as well as a tool to monitor Windows Event Logs (evtxmon).

Wednesday, August 02, 2017

Document Metadata

Okay, yeah, so I've been blogging a lot over the past couple of months about extracting document metadata as part of gathering threat intelligence.


This handler diary provided analysis of "malspam pushing Emotet", and this follow up post illustrated how to conduct static analysis of the document itself.  I have used several of the tools mentioned, but had not yet heard of "vipermonkey", and open-source VBA emulator.  Used in conjunction with oledump.py, you can really get a lot of traction with respect to static analysis of the malicious document.

While the second handler diary post focuses on analysis of the malicious macro, what neither post does is illustrate the document metadata. Below is the output of wmd.pl, run against a sample downloaded from VT:

C:\Perl>wmd.pl d:\cases\maldoc\maldoc
--------------------
Statistics
--------------------
File    = d:\cases\maldoc\maldoc
Size    = 215040 bytes
Magic   = 0xa5ec (Word 8.0)
Version = 193
LangID  = Russian

Document has picture(s).

Document was created on Windows.

Magic Created : MS Word 97
Magic Revised : MS Word 97

--------------------
Summary Information
--------------------
Title        : sdf
Subject      : df
Authress     : admin
LastAuth     : admin
RevNum       : 2
AppName      : Microsoft Office Word
Created      : 26.07.2017, 11:51:00
Last Saved   : 26.07.2017, 11:51:00
Last Printed :

--------------------
Document Summary Information
--------------------
Organization : home

...and from oledmp.pl:

C:\Perl>oledmp.pl -f d:\cases\maldoc\maldoc -l
Root Entry  Date: 26.07.2017, 11:51:59  CLSID: 00020906-0000-0000-C000-000000000046
    1 F..   55949                      \Data
    2 F..    7359                      \1Table
    3 F..    4148                      \WordDocument
    4 F.T    4096                      \ SummaryInformation
    5 F.T    4096                      \ DocumentSummaryInformation
    6 D..       0 26.07.2017, 11:51:59 \Macros
    7 D..       0 26.07.2017, 11:51:59 \Macros\VBA
    8 FM.   88908                      \Macros\VBA\ThisDocument
    9 F..     532                      \Macros\VBA\__SRP_2
   10 F..     156                      \Macros\VBA\__SRP_3
   11 FM.    8137                      \Macros\VBA\zjUb2S
   12 FM.    8877                      \Macros\VBA\cvDTF
   13 FM.    4906                      \Macros\VBA\FX9UL
   14 F..   15451                      \Macros\VBA\_VBA_PROJECT
   15 F..     739                      \Macros\VBA\dir
   16 F..    1976                      \Macros\VBA\__SRP_0
   17 F..     198                      \Macros\VBA\__SRP_1
   18 F..      98                      \Macros\PROJECTwm
   19 F..     476                      \Macros\PROJECT
   20 F.T     114                      \ CompObj

We can see that the dates displayed by both tools line up, and we can use oledmp.pl to further list the contents (raw, or hex) of the various streams.  

So, how can any of this be of value, and why does any of this matter?  Well, at BlackHat last week, Allison Wikoff spent a great deal of her time being interviewed about some really fantastic research that she'd conducted on the "Mia Ash" persona (here is the original SecureWorks posting of the results of her research).

From the Wired article:
Eventually, Ash sent the staffer an email with a Microsoft Excel attachment for a photography survey. She asked him to open it on his office network, telling him that it would work best there. After a month of trust-building conversation, he did as he was told. The attachment promptly launched a malicious macro...

So, this really illustrates the dedication of these threat actors...they establish a persona, including social media "pocket litter", and spend time developing a relationship with their target.  As a very small part of her research, Allison took a look at the metadata embedded within the Excel spreadsheet, and found that the user information referred to "Mia Ash".  This further illustrated the depths to which the threat actors would go in order to make the persona appear authentic; not only did they populate multiple social media sites and create a "history" for the persona, but they also ensured that the metadata in the documents sent to intended victims included the 'right' contents to support the persona.  That's right, it's exactly the way it sounds...the metadata embedded in the spreadsheet specifically referred to "Mia Ash" as the authorized user of the MS Office products.

I know what you're going to say..."yeah, but that stuff can be changed/modified...".  Yes, it can...but the point is, how often is that actually done?  Look at the above listed output from wmd.pl...does it look as if any effort was put into modifying the metadata that populated the Word97 file?

Something I've said about Windows systems and DFIR work is that as the versions of Windows have been developed, the amount of information that is automatically recorded as malware or an adversary interacts with the endpoint environment has increased significantly.  In many cases, this seems to be overlooked when it comes to developing threat intelligence for some reason; in spam and phishing campaigns, a lot of the different artifacts are examined...the contents of the email (headers, body, etc.), attachment macros, second-stage downloads, etc.  But what is often missed is document metadata embedded in the attachment; Word docs, Excel spreadsheets, and even LNK shortcut files can all be rich in valuable information.  One such example is looking at time stamps...when an email was sent, when a document was created, when a binary was compiled, etc., and lining all of those up to illustrate just how organized and planned out an attack appears to be.

Sunday, July 09, 2017

Tools

Today's post is a mish-mash of tools and techniques that I've seen or used recently...

Hindsight is a great free, open source tool for parsing a user's Chrome browser data.  I've used it a number of times to great effect; in one instance, I was able to show that a system became infected with ransomware when the user used Chrome to access their AOL email, where they downloaded and launched the malicious attachment.  The tool is very easy to use, and all you need to do is either point it at the user's "Default" folder (within the Chrome path), or extract the sqlite3 files and run it locally against the data.

Joe Gray over at AlienVault published an interesting article on data carving; this has always been an interesting DFIR topic, ranging from file carving to carving for individual records.  In the wake of the recent NotPetya attacks, Willi's EVTXtract might come in handy for some.  Another tool that I've run against decompressed hibernation files, pagefiles, and unallocated space is bulk_extractor, specifically when looking for indications of network communications.  My point is that if you're going to go carving, sometimes it's a good idea to first think about what it is you're carving for, and then seek an suitable approach to performing the carving.

Not "new" by any stretch, but Yogesh's research into Windows8/8.1 search history is still very relevant for a number of reasons.  For one, it illustrates the continued use of the LNK file format (which is actually pretty pervasive throughout the Windows platform...), telling us that not all of the stuff we learned from previous versions of Windows needs to be thrown out the door.  Second, Yogesh's finding that the retention mechanism for search terms changed between Windows 8 and 8.1 illustrates how quickly things can change on Windows systems.  I mean, look at what the Volatility folks have had to deal with!  ;-)

I ran across the Network Usage View tool from NIRSoft recently...that's a pretty interesting capability.  The write-up for the tool indicates that it gets it's data by reading the SRUDB.DAT database on Win8 and Win10 systems.  This is potentially a pretty valuable data source for DFIR work and analysis. In case you haven't seen it, Yogesh has a pretty fascinating presentation available on SRUM Forensics that is worth checking out.

I saw on Twitter recently that there's a Python-based tool available available now for diff'ing Registry hive files.  I completely agree with those who've commented that this is some great functionality to have available, and has a great deal of potential...but this functionality has been around from quite some time already, via other sources.  For example, James McFarlane's Parse::Win32Registry Perl module distribution includes a script that implements this functionality.  Another tool that allows you to diff two Registry hive files is RegShot.  I agree that this is great functionality to have available, particularly if you want to see what differences exist between a hive file extracted from a VSC or found in the RegBack folder, with one in the config folder.

Speaking of the Registry, I saw this paper from DFRWS 2008 that discusses recovering deleted data from Registry hive files.  My first real encounter with this sort of information was via Jolanta Thomassen's dissertation paper on the topic, and the regslack tool she provided to go along with it.  Since then, other tools (RegRipper plugins del.pl and del_tln.pl) have implemented similar functionality, largely due to the demonstrated value of this functionality.

Jason Hale posted a while back (2 yrs) on the DeviceContainers key on Windows systems, and I ran across his post again recently.  What he found is pretty interesting...I'll have to dig into it a bit more and see what else is available out there.  Jason's research seems to provide a pretty good idea of what can be derived from the key data, so this may be well worth developing a RegRipper plugin, even if just to research what's available in various hives.

I was working on some analysis recently, and was facing an issue where a good number of NTUSER.DAT files had been recovered from an image, all of which had been extracted from the image and placed in folder paths.  While there were a lot of these files, I was only interested in one Registry key (pertinent to the case), a key for which a RegRipper plugin did not exist.  So, I modified an existing plugin to give me information about the key in question, if it did exist, and then wrote a DOS batch file to iterate through all of the folders, running the new plugin (via rip.exe) against the hive file.  A few minutes of development and testing, and I had a repeatable, documented process in place and functioning, providing a capability that had not been in my hands just a few moments before.  My point in sharing this is to illustrate what can be achieved through simple problem definition, and the use of open sources to develop a solution.  I have the batch file I used, so it's pretty much self-documenting, and I pasted the command line from the batch file into my case notes.

pundup - Python script from herrcore to extract contents of McAfee *.bup files.  Even in 2017, there are a great deal of systems (and infrastructures) without any real endpoint monitoring capability employed, and sometimes you need to dig around a bit to get some really useful information about an incident.  One place you can look is AV detections (via the logs), and as such, any available quarantined files may provide even greater insight into the incident.  Further, if the system is running an older version of Windows, and you don't have an Amcache.hve file collecting process execution artifacts (like SHA-1 hashes), having the actual EXE itself to document, hash, and analyze would be very beneficial.

AppCompatProcessor - I ran across this little open source gem recently (note: according to the readme, this does not currently run on Windows); this tool runs through either AppCompatCache or AmCache data and allows you to...well, do a LOT with the data. It's well worth a look; just reading through the main page, I can easily see that a lot of what I include in my own workflow is used as pivot points, and then to expand the data.  For example, I tend to look for things like "$Recycle",

SysMon View - this is a really interesting approach to filtering and visualizing data collected by Sysmon on a Windows system.  Unfortunately, the only time I see Sysmon in use is on my own test systems; it does not seem to have been widely adopted by members of the corporate community who call for IR assistance.  I do think that this is a great approach to making better use of the data, though.

LimaCharlie - from refractionPOINT, described as an open source, cross-platform endpoint sensor.  There isn't a great deal of information available via the web page, but there are a few tweets available.

SideNote:
Speaking of endpoint agents, SANS recently conducted a product review of CrowdStrike's Falcon platform...you can get the PDF report here.
/SideNote

Invoke-Phant0m - The description states that the script "walks thread stacks of Event Log Service process (spesific[sic] svchost.exe) and identify Event Log Threads to kill Event Log Service Threads. So the system will not be able to collect logs and at the same time the Event Log Service will appear to be running."  Given the recent release of tools that claim to be able to remove individual Windows Event Log records, this is an interesting approach.  However, the biggest issue with the released tools is the inability to validate findings; while some on Twitter (I've been pointed to tweets) have claimed success, the actual EXE and process used haven't been shared to the point of allowing others to validate the findings.  To say, "...just start with this DLL..." does not provide a means of validation.

Of course, if you're not able to remove individual records, Inslainity provided another approach, albeit one that can be validated.  I've tested another approach to removing specific ranges of event records from the Security Event Log, using a method that can be scaled to all logs, but is much more insidious if you don't.

The folks at Javelin Networks have come up with an in-memory PowerShell script that can peek into consoles and provide detailed information about what's being done.  The description states that the script "extracted the content of the following command-line shells: PowerShell, CMD, Python, Wscript, MySQL Client, and some custom shells such as Mimikatz console. In some cases, the tools might be helpful to extract encrypted shells like the one used in PowerShell Empire Agent."

News
Adam (Hexacorn) has published yet another article demonstrating means of persistence and EDR bypass.  If nothing else, this is an excellent example of why endpoints of all versions (not just Windows) need to be instrumented to monitor and record process creation events, including full command lines.

Pete James over at Precision Discovery have a fascinating blog post in which he discusses records left behind in an often-overlooked Windows Event Log file.  I can't say that I've ever had a case where I've needed to know which Office files had been accessed by a user, but if you're tracking such artifact categories, then this is a good one to include.

Speaking of Office products, Will Knowles at MWRLabs published a blog post on using Office add-ins for persistence.

Here are a couple of links I've had sitting around for a while that I really haven't dug into...
Javelin Networks - CLI Powershell
NViso - Hunting Malware with Metadata
FireEye - Shim Databases used for Persistence

Thursday, June 15, 2017

Analyzing Documents

I've noticed over time that a lot of the write-ups that get posted online regarding malware or downloaders delivered via email attachments (i.e., spear phishing campaign) focus on what happens after the malicious payload is activated...the URL reached to, the malware downloaded, etc.  However, few seem to dig into the document itself, and there's a great deal can be gleaned from those documents, that can add to the threat intel picture.  If you're not looking at everything involved in the incident, if you're not (as Jesse Kornblum said) using all the parts of the buffalo, then you're very likely missing critical elements of the threat intel picture.

Here's an example from MS...much of the information in the post focuses on the embedded macro and the subsequent decrypted Cerber executable file, but there's nothing available regarding the document itself.

Keep in mind that different file formats (LNK, OLE, etc.) will contain different information.  And what I'm referring to here isn't about running through analysis steps that take a great deal of time; rather, what I'm going to show you are a few simple steps you can use to derive even more information from the attachment/wrapper documents themselves.

I took a look at a couple of documents (Doc 1, Doc 2) recently, and wanted to share my process and see if others might find it useful.  Both of these OLE-format documents have hashes available (or you can download and compute the hashes yourself), and they were also found on VirusTotal:

VirusTotal analysis for Doc 1
VirusTotal analysis for Doc 2

The VT analysis for both files includes a comment that the file was used to deliver PupyRAT.

Tools
Tools I'll be using for this analysis include my own oledmp.pl and wmd.pl.

Doc 1 Analysis
Running oledmp.pl against the file, we see:

Fig. 1: Doc 1 oledmp.pl output

















That's a lot of streams in this OLE file.  So, one of the first things we see is the dates for the Root Entry and the 'directories' (MS has referred to the OLE file format as "a file system within a file", and they're right), which is 1 Jan 2017.  According to VT, the first time this file was submitted was 1 Jan 2017, at approx. 20:29:43 UTC...so what that tells us is that it's likely that one of the first folks to receive the document submitted it less than 14 hrs after the file was modified.

Continuing with oledmp.pl, we can view the contents of the various streams in a hex dump format, but we see that stream number 20 contains a macro.  Using oledmp.pl with the argument "-d 20", we can view the contents of the stream in hex dump format.  In the output we see what appear to be 2 base64-encoded Powershell commands, one that downloads PupyRAT to the system, and another that appears to be shell code.  Copying and decoding both of the streams gives us the command that downloads PupyRAT, as well as a second command that appears to be some form of shell code.  Some of the variable names ($Qsc, $zw5) appear to be unique, so searching for those via Google leads us to this Hybrid-Analysis write-up, which provides some insight into what the shell code may do.

Interestingly enough, the same search reveals that, per this Reverse.IT link, both encoded Powershell commands were used in another document, as well.

Moving on, here's an excerpt of the output from wmd.pl, when run against this document:

--------------------
Summary Information
--------------------
Title        :
Subject      :
Authress     : Windows User
LastAuth     : Windows User
RevNum       : 2
AppName      : Microsoft Office Word
Created      : 01.01.2017, 06:51:00
Last Saved   : 01.01.2017, 06:51:00
Last Printed :

Notice the dates...they line up with the previously-identified dates (see fig.1)


Doc 2 Analysis
Following the same process we did with doc 1, we can see very similar output from oledmp.pl with doc 2:

Fig. 2: Doc 2 oledmp.pl output















One of the first things we can see is that this document was created within about 24 hrs of doc 1.

In the case of doc 2, stream 16 contains the data we're looking for...extracting and decoding the base64-encoded Powershell commands, we see that the commands themselves (PupyRAT download, shell code) are different.  Conducting a Google search for the variables used in the shell code command, we find this Hybrid-Analysis write-up, as well as this one from Reverse.IT.

Here's an excerpt of the output from wmd.pl, when run against this document:

--------------------
Summary Information
--------------------
Title        : HealthSecure User Registration Form
Subject      : HealthSecure User Registration Form
Authress     : ArcherR
LastAuth     : Windows User
RevNum       : 2
AppName      : Microsoft Office Word
Created      : 02.01.2017, 06:49:00
Last Saved   : 02.01.2017, 06:49:00
Last Printed : 20.06.2013, 06:27:00

--------------------
Document Summary Information
--------------------
Organization : ACC

Remember, this is a sample pulled down from VirusTotal, so there's no telling what happened with the document between the time it was created and submitted to VT.  I made the 'authress' information bold, in order to highlight it.

Summary
While this analysis may not appear to be of significant value, it does form the basis for developing a better intelligence picture, as it goes beyond the more obvious aspects of what constitutes most analysis (i.e., the command to download PupyRAT, as well as the analysis of the PupyRAT malware itself) in phishing cases.  Valuable information can be derived from the document format used to deliver the malware itself, regardless of whether it's an MSOffice document, or a Windows shortcut/LNK file.  When developing an intel picture, we need to be sure to use all the parts of the buffalo.

Wednesday, May 31, 2017

Use of LNK Files...Again

I've discussed LNK file before...here, and more recently, here.  Windows shortcut files are an artifact on Windows systems that have been available for so long that there are likely a great many analysts who understand the category this artifact falls into, but may not know a great deal about the nature of the artifact itself.  That is to say that it's likely that a good many training and certification courses tend to gloss over some of the more esoteric aspects of the Windows shortcut file as an artifact.

Most analysts likely understand the Windows shortcut, or "LNK" file to be an artifact of "user knowledge of files", as the most commonly understood activity that leads to the automatic creation of these files is a user double-clicking on a file via the Windows Explorer shell.

We also know that understanding the format of these files can be very important, as well.  The LNK file format is used as the basis for individual streams within JumpLists, and there are several Registry keys that contain value data that follows the LNK file format.  In fact, LNK files are compose, at least in part, of shell items...so now we're getting into the format of the individual building blocks of many artifacts on Windows systems.

Let's take a step back for a moment, and consider the most common means of "producing" Windows shortcut files.  From a standpoint of developing and interpreting evidence, let's say that a user plugs a USB device into a laptop, opens the contents of the device in Windows Explorer, and then double-clicks an MS Word document.  At that point, a shortcut file is created in the user's Recent folder that contains not only a series of shell items that comprise the path to the file, but also a string that refers to the file, as well.  However, the LNK file also contains information about that system on which it was created, and it's this aspect of the file format that is missed or forgotten, due the fact that in most cases, this embedded information isn't very interesting.  However, if an LNK file is not created as an artifact of, say, a malware infection, but is instead used by an attacker to infect other systems, then the LNK file will contain information about the system on which it was developed, which would NOT be the victim system.  This is when the contents of the LNK file become very interesting.

TrendMicro's TrendLabs recently published a blog article that gives a very good example of when these files can be extremely interesting.  The article describes the reported increase in use of LNK files as weaponized attachments by the group identifed as "APT 10".  I find this absolutely fascinating, not because something that has been part of Windows from the very beginning has been turned against users and employed as an attack vector, but because of the information and metadata that seems to be left on the floor because these files simply are not parsed or dug into...at least not to any extent that is discussed publicly.  It seems that most organizations may miss or discount the value of Windows shortcut file contents, and not incorporate them into their overall intelligence picture.

This JPCERT blog post describes "Evidence of Attackers' Development Environment Left in Shortcut Files", and this NViso blog post discusses, "Tracking Threat Actors Through .LNK Files".  These are great at describing what can be done, providing illustration and example.

Sunday, April 09, 2017

Getting Started

Not long ago, I gave some presentations at a local high school on cybersecurity, and one of the questions that was asked was, "how do I get started in cybersecurity?"  Given that my alma mater will establish a minor in cybersecurity this coming fall, I thought that it might be interesting to put some thoughts down, in hopes of generating a discussion on the topic.

So, some are likely going to say that in today's day and age, you can simply Google the answer to the question, because this topic has been discussed many times previously.  That's true, but it's as blessing as much as it is a curse; there are many instances in which multiple opinions are shared, and at the end of the thread, there's no real answer to the question.  As such, I'm going to share my thoughts and experience here, in hopes that it will start a discussion that others can refer to.  I'm hoping to provide some insight to anyone looking to "get in" to cybersecurity, whether you're an upcoming high school or college graduate, or someone looking to make a career transition.

During my career, I've had the opportunity to be a "gate keeper", if you will.  As an incident responder, I was asked to vet resumes that had been submitted in hopes of filling a position on our team.  To some degree, it was my job to receive and filter the resumes, passing what I saw as the most qualified candidates on to the next phase.  I've also worked with a pretty good number of analysts and consultants over the years.

The world of cybersecurity is pretty big and there are a lot of roads you can follow; there's pen testing, malware reverse engineering, DFIR, policy, etc.  There are both proactive and reactive types of work.  The key is to pick a place to start.  This doesn't mean that you can't do more than one...it simply means that you need to decide where you want to start...and then start. Pick some place, and go from there.  You may find that you're absolutely fascinated by what you're learning, or you may decide that where you started simply is not for you.  Okay, no problem.  Pick a new place and start over.

When it comes to reviewing resumes, I tend to not focus on certifications, nor the actual degree that someone has.  Don't get me wrong, there are a lot of great certifications out there.  The issue I have with certifications is that when most folks return from the course(s) to obtain the certification, there's nothing that holds them accountable for using what they learned.  I've seen analysts go off to a 5 or 6 day training course in DFIR of Windows systems, which cost $5K - $6K (just for the course), and not know how to determine time stomping via the MFT (they compared the file system last modification time to the compile time in the PE header).

I am, however, interested to see that someone does have a degree.  This is due to the fact that having a degree pretty much guarantees a minimum level of education, and it also gives insight into your ability to complete tasks.  A four (or even two) year degree is not going to be a party everyday, and you're likely going to end up having to do things you don't enjoy.

And why is this important?  Well, the (apparently) hidden secret of cybersecurity is that at some point, you're going to have to write.  That's right. No matter what level of proficiency you develop at something, it's pretty useless if you can't communicate and share it with others.  I'm not just talking about sharing your findings with your team mates and co-workers (hint, "LOL" doesn't count as "communication"), I'm also talking about sharing your work with clients.

Now, I have a good bit of experience with writing throughout my career.  I wrote in the military (performance reviews, reports, course materials, etc.), as part of my graduate education (to include my thesis), and I've been writing almost continually since I started in infosec.  So...you have to be able to write.  A great way to get experience writing is to...well...write.  Start a blog.  Write something up, and share it with someone you trust to actually read it with a critical eye, not just hand it back to you with a "looks good".  Accept that what you write is not going to be perfect, every time, and use that as a learning experience.

Writing helps me organize my thoughts...if I were to just start talking after I completed my analysis, what came out of my mouth would not be nearly as structured, nor as useful, as what I could produce in writing.  And writing does not have to be sole source of communications; I very often find it extremely valuable to write something down first, and then use that as a reference for a conversation, or better yet, a conference presentation.

So, my recommendations for getting started in the cybersecurity field are pretty simple:
1. Pick some place to start.  If you have to, reach to someone for advice/help.
2. Start. If you have to, reach to someone for advice/help.
3. Write about what you're doing. If you have to, reach to someone for advice/help.

There are plenty of free resources available that provide access to what you need to get started; online blog posts, pod casts/videos, presentations, books (yes, books online and in the library), etc.  There are free images available for download, as part of DFIR challenges (if that's what you're interested in doing).  There are places you can go to find out about malware, download samples, or even run samples in virtual environments and emulators.  In fact, if you're viewing this blog post online, then you very likely have everything you need to get started.  If you're interested in DFIR analysis or malware RE, you do not need to have access to big expensive commercial tools to conduct analysis...that's just an excuse for paralysis.

There is a significant reticence to sharing in this "community", and it's not simply isolated to folks who are new to the field.  There are a lot of folks who have worked in this industry for quite a while who will not share experiences or findings.  And there is no requirement to share something entirely new, that no one's seen before.  In fact, there's a good bit of value in sharing something that may have been discussed previously; it shows that you understand it (or are trying to), and it can offer visibility and insight to others ("oh, that thing that was happening five years ago is coming back...like bell bottoms...").

The take-away from all of this is that when you're ready to put your resume out there and apply for a position in cybersecurity, you're going to have some experience in the work, have visible experience writing that your potential employer can validate, and you're going to know people in the field.

Understanding What The Data Is Telling You

Not long ago, I was doing some analysis of a Windows 2012 system and ran across an interesting entry in the AppCompatCache data:

SYSVOL\Users\Admin\AppData\Roaming\badfile.exe  Sat Jun  1 11:34:21 2013 Z

Now, we all know that the time stamp associated with entries in the AppCompatCache is the file system last modification time, derived from the $STANDARD_INFORMATION attribute.  So, at this point, all I know about this file is that it existed on the system at some point, and given that it's now 2017 it is more than just a bit odd, albeit not impossible, that that is the correct file system modification date.

Next stop, the MFT...I parsed it and found the following:

71516      FILE Seq: 55847 Links: 1
[FILE],[BASE RECORD]
.\Users\Admin\AppData\Roaming\badfile.exe
    M: Sat Jun  1 11:34:21 2013 Z
    A: Mon Jan 13 20:12:31 2014 Z
    C: Thu Mar 30 11:40:09 2017 Z
    B: Mon Jan 13 20:12:31 2014 Z
  FN: msiexec.exe  Parent Ref: 860/48177
  Namespace: 3
    M: Thu Mar 30 11:40:09 2017 Z
    A: Thu Mar 30 11:40:09 2017 Z
    C: Thu Mar 30 11:40:09 2017 Z
    B: Thu Mar 30 11:40:09 2017 Z
[$DATA Attribute]
File Size = 1337856 bytes

So, this is what "time stomping" of a file looks like, and this also helps validate that the AppCompatCache time stamp is the file system last modification time, extracted from one of the MFT record attributes.  At this point, there's nothing to specifically indicate when the file was executed but now, we have a much better idea of when the file appeared on the system. The bad guy most likely used the GetFileTime() and SetFileTime() API calls to perform the time stomping, which we can see by going to the timeline:

Mon Jan 13 20:12:31 2014 Z
  FILE                       - .A.B [152] C:\Users\Admin\AppData\Roaming\
  FILE                       - .A.B [56] C:\Windows\explorer.exe\$TXF_DATA
  FILE                       - .A.B [1337856] C:\Users\Admin\AppData\Roaming\badfile.exe
  FILE                       - .A.B [2391280] C:\Windows\explorer.exe\

Fortunately, the system I was examining was Windows 2012, and as such, had a well-populated AmCache.hve file, from which I extracted the following:

File Reference: da2700001175c
LastWrite     : Thu Mar 30 11:40:09 2017 Z
Path          : C:\Users\Admin\AppData\Roaming\badfile.exe
Company Name  : Microsoft Corporation
Product Name  : Windows Installer - Unicode
File Descr    : Windows® installer
Lang Code     : 1033
SHA-1         : 0000b4c5e18f57b87f93ba601e3309ec01e60ccebee5f
Last Mod Time : Sat Jun  1 11:34:21 2013 Z
Last Mod Time2: Sat Jun  1 11:34:21 2013 Z
Create Time   : Mon Jan 13 20:12:31 2014 Z
Compile Time  : Thu Mar 30 09:28:13 2017 Z

From my timeline, as well as from previous experience, the LastWrite time for the key in the AmCache.hve corresponds to the first time that badfile.exe was executed on the system.

What's interesting is that the Compile Time value from the AmCache data is, in fact, the compile time extracted from the header of the PE file.  Yes, this value is easily modified, as it is simply a bunch of bytes in the file that do not affect the execution of the file itself, but it is telling in this case.

So, on the surface, while it may first appear as if the badfile.exe had been on the system for four years, it turns out that by digging a bit deeper into the data, we can see that wasn't the case at all.

The take-aways from this are:
1.  Do not rely on a single data point (AppCompatCache) to support your findings.

2.  Do not rely on the misinterpretation of a single data point as the foundation of your findings.  Doing so is more akin to forcing the data to fit your theory of what happened.

3. The key to analysis is to know the platform you're analyzing, know your data...no only what is available, but it's context.

4.  During analysis, always look to artifact clusters.  There will be times when you do not have access to all of the artifacts in the cluster, so you'll want to validate the reliability and fidelity of the artifacts that you do have.

Saturday, April 08, 2017

Understanding File and Data Formats

When I started down my path of studying techniques and methods for computer forensic analysis, I'll admit that I didn't start out using a hex editor...that was a bit daunting and more than a little overwhelming at the time.  Sure, I'd heard and read about those folks who did, and could, conduct a modicum of analysis using a hex editor, but at that point, I wasn't seeing "blondes, brunettes, and redheads...".  Over time and with a LOT of practice, however, I found that I could pick out certain data types within hex data.  For example, within a hex dump of data, over the years my eyes have started picking out repeating patterns of data, as well as specific data types, such as FILETIME objects.

Something that's come out of that is the understanding that knowing the structure or format of specific data types can provide valuable clues and even significant artifacts.  For example, understanding the structure of Event Log records (binary format used for Windows NT, 2000, XP, and 2003 Event Logs) has led to the ability to parse for records on a binary level and completely bypass limitations imposed by using the API.  The first time I did this, I found several valid records in a *.evt file that the API "said" shouldn't have been there.  From there, I have been able to carve unstructured blobs of data for such records.

Back when I was part of the IBM ISS ERS Team, an understanding of the structure of Windows Registry hive files led us to being able to determine the difference between credit card numbers being stored "in" Registry keys and values, and being found in hive file slack space.  The distinction was (and still is) extremely important.

Developing an understanding of data structures and file formats has led to findings such as Willi Ballenthin's EVTXtract, as well as the ability to parse Registry hive files for deleted keys and values, both of which have proven to be extremely valuable during a wide variety of investigations.

Other excellent examples of this include parsing OLE file formats from Decalage, James Habben's parsing Prefetch files, and Mari's parsing of data deleted from SQLite databases.

Other examples of what understanding data structures has led to includes parsing Windows shortcuts/LNK files that were sent to victims of phishing campaigns.  This NViso blog post discusses tracking threat actors through the .lnk file they sent their victims, and this JPCert blog post from 2016 discusses finding indications of an adversary's development environment through the same resource.

Now, I'm not suggesting that every analyst needs to be intimately familiar with file formats, and be able to parse them by hand using just a hex editor.  However, I am suggesting that analysts should at least become aware of what is available in various formats (or ask someone), and understand that many of the formats can provide a great deal of data that will assist you in your investigation.