Blog

Applying the Precision Testing Methodology to the Master File Table

In the previous post, I demonstrated a method for determining unique records from non-resident INDX streams. The purpose of this method is to fill a gap in log2timeline’s functionality while we wait for it to be updated (not a criticism). I’m not sure the method is super practical, but at worst it has been an interesting exercise.

Previously, I mentioned the method could possibly be applied to the MFT since its records are also a fixed size (1,024 bytes). This week I tried the method on the MFTs in the Defcon Desktop image because it has multiple partitions. As part of the timelining process I export the MFT using image_export.py and use analyzeMFT to create a body file with fully resolved file paths. An issue I ran into is that with multiple NTFS partitions there are multiple MFTs, and image_export.py doesn’t seem to handle that, since the exported files all have the same name.
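Roughly, the export step looks something like this for a single-partition image; the image name and output directory are placeholders, and the flag spellings should be checked against image_export.py --help for your plaso version:

#Export only the $MFT from the image (file name filter and paths are assumptions).
image_export.py --names '$MFT' -w mft_export/ image.E01

With multiple NTFS partitions, each partition’s $MFT comes out with the same name, which is where the trouble starts.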

Image_export.py and multiple MFTs

I could export using FTK Imager or another tool, but in the spirit of automation, I’d like a command-line solution. Bulk_extractor-rec has a plugin for the MFT (ntfsmft). Super. I follow the method in the previous post with split -b 1024 instead of 4096. I get to the end, and there are no duplicate hashes between the output of bulk_extractor-rec and the exported MFT, meaning I can’t replace image_export.py with bulk_extractor-rec. Bummer. This method may not be so great for the MFT. Hmm. I decide to try the XP image from Investigating Windows Systems since it’s small. The method works. No uniques, meaning I could use bulk_extractor-rec as a replacement for image_export.py. Okay, I’ll continue to investigate the Defcon Desktop image.
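For anyone following along, the overlap check at the end of that method boils down to something like this, assuming the split-and-hash steps have already produced a sorted hash list for each copy (the file names here are placeholders):

#Assuming bulk.hashes and real.hashes each hold one MD5 per line, already sorted.
comm -12 bulk.hashes real.hashes | wc -l   #records common to both copies
comm -23 bulk.hashes real.hashes | wc -l   #records only in the bulk_extractor-rec output
comm -13 bulk.hashes real.hashes | wc -l   #records only in the exported MFT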

I decided to look into each MFT after parsing it with a tool. Perhaps I can find out a little more about what’s going on, since I don’t want to spend time looking at the hex data yet. Enter Eric Zimmerman’s MFTECmd tool. It parses the “real” MFT just as expected, but it runs into issues with the bulk_extractor-rec MFT: "There was an error loading the file! Error: An item with the same key has already been added." I email Eric to describe what’s going on, and he looks into it.
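A typical MFTECmd run against each copy looks something like this; the paths are placeholders and the flag spellings are from memory, so confirm them against MFTECmd’s built-in help:

#Parse each MFT to CSV for comparison (paths are placeholders).
MFTECmd.exe -f C:\cases\defcon\Real_MFT --csv C:\cases\defcon\out --csvf real_mft.csv
MFTECmd.exe -f C:\cases\defcon\Bulk_MFT --csv C:\cases\defcon\out --csvf bulk_mft.csv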

Screenshot of the email exchange with Eric Zimmerman.

Eric finds the issue and updates his tool to show warnings for me. Amazing. Thank you, Eric!

MFTECmd v0.3.3 now shows a warning for records with a duplicate key.

During the email exchange, I tried to remove duplicates on my own to prevent the error.

#Split the MFT into individual 1,024-byte records
split -b 1024 Bulk_MFT
#Hash each record. Print hash and name of file into a csv
rhash --md5 -p '%h,%p\n' -r ./ > ../Bulk_MFT.csv
#Unique the first column (hashes)
sort -u -t, -k1,1 Bulk_MFT.csv > Bulk_MFT.csv.unique
#Print the name of the file of each unique record
awk -F ',' '{print $2}' Bulk_MFT.csv.unique > Bulk_MFT.csv.unique.filename
#Concatenate each file in the unique filename list
for i in $(grep -v '^#' Bulk_MFT.csv.unique.filename); do cat "$i" >> Bulk_MFT_dedup; done

Still get errors. Sent the dedup’d file to Eric. He said I had a bunch of extra free records (I don’t actually know what that means). Shoot. By this time he’s uploaded the new version of MFTECmd, and I can process it normally. As I write this, I realize I still haven’t reviewed those CSVs. I’ll try to do that in Timeliner or Visual Studio Code with the Excel Viewer extension, since the files are likely to have more than Excel’s limit of 1,048,576 rows.

During this time I’m also processing each MFT with psteal.py to create a quick timeline.
psteal.py --source MFT -o l2tcsv -w MFTtimeline.csv
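Looped over both copies, that’s something like the following; the file names Real_MFT and Bulk_MFT are assumptions:

#Run the same psteal.py command against each copy of the MFT.
for mft in Real_MFT Bulk_MFT; do
  psteal.py --source "${mft}" -o l2tcsv -w "${mft}_timeline.csv"
done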

I count the lines using wc -l for each csv. 1,943,572 for bulk_extractor-rec and 1,967,187 for the real MFT. Now, I’m racking my brain. I got fewer records with bulk_extractor-rec.
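The count itself is just wc -l; the timeline names below follow the loop above, and each CSV includes one header row that the later arithmetic has to account for:

#Count lines in each timeline (each file includes one header row).
wc -l Real_MFT_timeline.csv Bulk_MFT_timeline.csv
#Or count data rows only, skipping the header.
tail -n +2 Real_MFT_timeline.csv | wc -l
tail -n +2 Bulk_MFT_timeline.csv | wc -l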

I would have expected something like this Venn Diagram.

I run psteal.py against a directory containing both MFTs to see if psort.py figures out any duplicates. 3,910,753 rows, six fewer than the sum of the two individual timelines. I expect one fewer since we now have only one header row, and I expect another two fewer since I’m only dealing with one file and one directory. I’m not curious enough at this point to chase down the remaining difference of three rows. I’ve got bigger problems to overcome.
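One way to double-check whether psort.py actually dropped anything would be to look for exact duplicate rows in the combined output; the file name here is an assumption:

#Count exact duplicate rows in the combined timeline, ignoring the header row.
tail -n +2 combined_timeline.csv | sort | uniq -d | wc -l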

I don’t have a conclusion yet. If we determine there is value in the blue portion of the Venn Diagram, and we want to resolve full paths with analyzeMFT, then we can run commands similar to those written earlier in this post, or we can run analyzeMFT against both files, which will be faster than concatenating thousands of MFT records. My question then is “Is psort faster with potential duplicate records, or is it faster to get unique records and concatenate them first?”
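A rough way to answer that would be to time both paths. Everything below is hypothetical: both_mfts/ is a directory holding the two MFT copies, and dedup_and_concat.sh stands in for the command set that follows:

#Path 1: hand psteal both copies and let psort deal with any duplicate records.
time psteal.py --source both_mfts/ -o l2tcsv -w with_dupes_timeline.csv
#Path 2: dedupe and concatenate first, then process the single merged MFT.
time ./dedup_and_concat.sh
time psteal.py --source MFT -o l2tcsv -w deduped_timeline.csv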

For kicks and giggles, here is the set of commands. I actually haven’t run these, but they should find records from the blue area and append them to the yellow. This only matters if:
1. Log2timeline doesn’t have the ability to resolve full paths
2. Image_export can’t handle more than one MFT file
3. Bulk_extractor-rec adds value to the timeline
4. This is faster than running analyzeMFT on two files
5. The yellow is not fully contained within the blue

#Split each MFT into individual 1,024-byte records
#(run each split in its own working directory so the chunk names don't collide)
split -b 1024 Bulk_MFT
split -b 1024 Real_MFT

#Hash each record. Print hash and name of file into a csv
rhash --md5 -p '%h,%p\n' -r ./ > ../Bulk_MFT.csv
rhash --md5 -p '%h,%p\n' -r ./ > ../Real_MFT.csv

#Unique the first column (hashes)
sort -u -t, -k1,1 Bulk_MFT.csv > Bulk_MFT.csv.unique
sort -u -t, -k1,1 Real_MFT.csv > Real_MFT.csv.unique
#Print the hash of each unique record
awk -F ',' '{print $1}' Bulk_MFT.csv.unique > Bulk_MFT.csv.unique.hashes
awk -F ',' '{print $1}' Real_MFT.csv.unique > Real_MFT.csv.unique.hashes
#Find hashes in Bulk_MFT.csv.unique.hashes not found in Real_MFT.csv.unique.hashes
comm -23 Bulk_MFT.csv.unique.hashes Real_MFT.csv.unique.hashes > Bulk_MFT.csv.unique.hashes.only
#Pull the matching hash,filename rows out of the unique list
grep -F -f Bulk_MFT.csv.unique.hashes.only Bulk_MFT.csv.unique > Bulk_additive.csv
#Print the name of the file of each unique record
awk -F ',' '{print $2}' Bulk_additive.csv > Bulk_additive.csv.unique.filename
#Concatenate each file in the unique filename list. Make a backup of the MFT before doing this step.
for i in $(grep -v '^#' Bulk_additive.csv.unique.filename); do cat "$i" >> MFT; done
#Now you're ready for analyzeMFT
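And, roughly, the analyzeMFT step that would follow; the output names are placeholders and the flag spellings are from memory, so confirm them with analyzeMFT.py -h before running:

#Build a body file (and a CSV) from the merged MFT. Flags are assumptions.
analyzeMFT.py -f MFT -b MFT.body -o MFT.csv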

My opinions are my own and may not represent those of my employer.