The University of Massachusetts Amherst

What is Data Forensics?

Short History of Data Forensics

The concept of data forensics dates to the 1970s; the first acknowledged data crime was recognized in Florida in 1978, when deleting files to hide evidence came to be considered illegal. The field gained traction through the rest of the 20th century: the FBI created its Computer Analysis and Response Team, quickly followed by the creation of the British Fraud Squad. The small initial size of these organizations created a unique situation in which civilians were brought in to assist with investigations. In fact, it is fair to say that computer hobbyists in the 1980s and 1990s gave the profession its momentum, as they assisted government agencies in developing software tools for investigating data-related crime. The first conference on digital evidence took place in 1993 at the FBI Academy in Virginia. It was a huge success, with over 25 countries attending, and it concluded with the agreement that digital evidence was legitimate and that laws regarding investigative procedure should be drafted. Until this point, no federal laws had been put in place regarding data forensics, which somewhat detracted from its legitimacy. The most recent chapter of this history is the 2000s, which mark the field’s explosion in size. The advances seen in home computing during this time allowed the internet to play a larger part in illegal behavior, and brought more powerful software both to aid and to counteract illegal activity. At this point, government agencies were still aided greatly by grassroots computer hobbyists, who continued to help design software for the field.

Why is it so Important?

The first personal computers, while incredible for their time, were not capable of many operations, especially when compared to today’s machines. These limitations were a mixed blessing, as they also limited the illegal behavior available. With hardware and software developing at a rapid pace, coupled with the arrival of the internet, it wasn’t long before crimes grew in severity to match. For example, prior to the internet, someone could be caught in possession of child pornography (a fairly common crime associated with data forensics) and that would be the end of it; they would be prosecuted and their data confiscated. Post-internet, someone in possession of the same materials could also be guilty of distributing them across the web, greatly increasing the severity of the crime as well as the number of other people who might be involved. The attacks of 9/11 drove home the need for further development in data investigation. Though no computer hacking or software manipulation aided in the physical act of terror, it was later discovered that there were traces of data around the globe that pieced together a plan for the attack. Had forensic investigations been more advanced at the time, the plan might have been discovered and the entire disaster avoided.

A more common use for data forensics is to uncover fraud in companies and contradictions in the files on their servers. Such investigations tend to take a year or longer to complete, given the sheer amount of data that has to be examined. Bernie Madoff, for example, used computer algorithms to change the origin of the money being deposited into his investors’ accounts so that his own accounts did not drop at all. In this case, more than 36 billion dollars were stolen from clients. That magnitude is not uncommon for fraud of such a degree. Additionally, when a company declares bankruptcy, it must often submit data for analysis to make sure no one is benefiting from the company’s collapse.

How Does Data Forensics Work?

The base procedure for collecting evidence is not complicated. Judd Robbins, a renowned computer forensics expert, describes the sequence of events as follows:

The computer is first collected, and all visible data (meaning data that does not require any algorithms or special software to recover) is copied exactly to another file system or computer. It’s important that the actual forensics process not take place on the accused’s computer, in order to ensure that the original data is not contaminated.

Hidden data is then searched for, including deleted files and files that have been purposefully hidden from plain view, which sometimes require extensive effort to recover.

Beyond simply deleting files or making them invisible to the system, data can also be hidden in places on the hard drive where it would not logically be. A file could be disguised as a registry file in the operating system to avoid suspicion. Sorting through these unorthodox parts of the hard drive can be incredibly time consuming.

While all of this is happening, a detailed report must be kept up to date that tracks not only the contents of the files but also whether any of them were encrypted or disguised. In the world of data forensics, merely hiding certain files can contribute to probable cause.

Tools

Knowing the workflow of investigations is useful for a basic understanding, but the tools created to assist investigators do the core work of discovering data, leaving the investigators to interpret the results. While details of these tools are often kept under wraps to prevent anti-forensics tools from being developed, their basic workings are public knowledge.

Data recovery tools are algorithms that examine what remains on the sectors of a disk, such as residual charges, to essentially guess what might have been there before. These reconstruction tools do not have a 100% success rate, as some data may simply be too spread out to recover. Deleted data can be compared to an unsolved puzzle with multiple solutions, or perhaps a half-burnt piece of paper. It’s also possible to recover only some of the data, in which case chance again determines whether what is recovered will be useful.
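To make the idea of partial recovery concrete, here is a minimal sketch of one software-level approach, commonly called file carving, which scans raw bytes for known file markers instead of relying on the file system. This is a swapped-in illustration and not a description of how any particular forensic product works; the `carve_jpegs` function and the fake `disk` contents are assumptions made up for the example.

```python
JPEG_START = b"\xff\xd8\xff"  # bytes that open a JPEG file
JPEG_END = b"\xff\xd9"        # bytes that close a complete JPEG file


def carve_jpegs(raw: bytes) -> list[tuple[int, bytes, bool]]:
    """Return (offset, data, complete) for each JPEG-like region in raw."""
    found = []
    pos = raw.find(JPEG_START)
    while pos != -1:
        end = raw.find(JPEG_END, pos + len(JPEG_START))
        if end == -1:
            # No end marker survives, so only a fragment can be recovered.
            found.append((pos, raw[pos:], False))
            break
        stop = end + len(JPEG_END)
        found.append((pos, raw[pos:stop], True))
        pos = raw.find(JPEG_START, stop)
    return found


# Hypothetical "disk": filler bytes around one whole and one truncated file.
disk = (
    b"\x00" * 16
    + JPEG_START + b"picture-one" + JPEG_END
    + b"\x00" * 8
    + JPEG_START + b"picture-tw"
)
results = carve_jpegs(disk)
for offset, data, complete in results:
    print(offset, len(data), "complete" if complete else "fragment")
```

The second region has a start marker but no end marker, so it comes back flagged as a fragment, which is the half-burnt piece of paper from the analogy above. Real image data can contain the end marker in the middle of a file, so a serious tool would need far more care than this.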

We’ve previously mentioned the process of copying the disk in order to protect the original. A software or hardware write tool is in charge of copying the disk while ensuring that none of the metadata is altered in the process. The point of such a tool is to be untraceable, so that the investigator does not leave a signature on the disk. You could think of accidentally updating the metadata as putting your digital fingerprints on the crime scene.
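As a rough illustration of the copying step, the sketch below copies a file byte for byte while only ever opening the source for reading, then checks that the copy is identical and that the source’s modification time has not moved. The file names, the `copy_exact` helper and the choice of SHA-256 as the checksum are all assumptions for the example; a real write tool enforces this at the hardware or driver level rather than trusting a script.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # read 1 MiB at a time so large images never sit in memory


def sha256_of(path: str) -> str:
    """Hash a file's contents in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_exact(src: str, dst: str) -> str:
    """Copy src to dst byte for byte; src is only ever opened for reading."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(CHUNK), b""):
            fout.write(chunk)
    return sha256_of(dst)


# Hypothetical stand-in for an evidence image.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "evidence.img")
working_copy = os.path.join(workdir, "working_copy.img")
with open(original, "wb") as f:
    f.write(os.urandom(4096))

mtime_before = os.stat(original).st_mtime_ns
copy_digest = copy_exact(original, working_copy)
copy_matches = copy_digest == sha256_of(original)
source_untouched = os.stat(original).st_mtime_ns == mtime_before
print(copy_matches, source_untouched)
```

Note that merely reading a file can still update its last-accessed time on some systems, which is exactly the kind of fingerprint a dedicated write tool exists to prevent.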

Hashing tools are used to compare one disk to another. If an investigator were to compare two servers holding thousands of gigabytes of data by hand, it would take years to go through them looking for something that may not even exist. A hashing algorithm reduces each file, or each piece of a disk, to a short fixed-length value, so that identical content on two disks produces identical values while even a small change produces a different one. A tool can therefore run through one disk piece by piece and quickly identify matching files on another. This makes hashing excellent for fraud investigations, as it allows the analyst to check for anomalies that would indicate tampering.
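The comparison described above can be sketched in a few lines. The example below indexes two folders by the SHA-256 digest of each file’s contents and reports which content appears on both and which appears on only one. The folder contents, the `index_by_digest` helper and the choice of SHA-256 are assumptions for illustration, not the workings of any specific forensic tool.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_digest(root: str) -> dict[str, list[str]]:
    """Map each content digest to the relative paths holding that content."""
    index: dict[str, list[str]] = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            rel = os.path.relpath(full, root)
            index.setdefault(file_digest(full), []).append(rel)
    return index


def write(root: str, name: str, data: bytes) -> None:
    """Create a small file for the demonstration."""
    with open(os.path.join(root, name), "wb") as f:
        f.write(data)


# Hypothetical stand-ins for two servers.
server_a = tempfile.mkdtemp()
server_b = tempfile.mkdtemp()
write(server_a, "ledger.csv", b"alice,100\nbob,250\n")
write(server_b, "ledger.csv", b"alice,100\nbob,950\n")  # one digit altered
write(server_a, "notes.txt", b"quarterly summary")
write(server_b, "renamed.txt", b"quarterly summary")    # same bytes, new name

index_a = index_by_digest(server_a)
index_b = index_by_digest(server_b)
shared = set(index_a) & set(index_b)
shared_pairs = [(index_a[d], index_b[d]) for d in shared]
only_in_a = sorted(p for d in set(index_a) - set(index_b) for p in index_a[d])
print(shared_pairs)
print(only_in_a)
```

Because the comparison is on content rather than file names, the renamed file is still recognized as a match, while the ledger with a single altered digit shows up as different. That second case is the kind of anomaly an analyst would want to look at more closely.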

Though many other tools exist, and many are developed as open source for operating systems such as Linux, these are the fundamental types of tools used. As computers continue to advance, more tools will inevitably be invented to keep up with them.

Difficulties During Investigations

The outline of the process makes the job seem somewhat simple, if a little tedious. What excites experts in the field is the challenge of defeating whatever countermeasures the culprit may have put in place. These countermeasures are referred to as ‘anti-forensics’ tools, and their complexity is limited only by the creator’s knowledge of software and computer operations. For example, every time a file is opened its ‘metadata’ is changed. Metadata is the information about a file rather than what’s inside it, such as when it was last opened, when it was created and its size, and it can be an investigator’s friend or foe. Forensic experts are incredibly careful not to contaminate metadata while searching through files, as doing so can compromise the integrity of the investigation; it could be crucial to know the last time a program was used or a file opened. Culprits with sufficient experience can edit metadata to throw off investigators. Additionally, files can be disguised as different kinds of files to further confuse investigators. For example, a text file containing a list of illegal transactions could be saved as a .jpeg file and its metadata edited, so that the investigator would either pass over it, thinking a picture irrelevant, or perhaps open it to find nothing more than a blank page or even an actual picture of something. They would only find the real contents of the file if they thought to open it with a word processor, as was originally intended.
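One simple check against the disguised-file trick is to ignore the extension and look at the first few bytes of the file, since many formats begin with a fixed signature. The sketch below is a minimal version of that idea; the signature table covers only three formats, and the file names and helper functions are hypothetical.

```python
import os
import tempfile
from typing import Optional

# Leading bytes ("signatures") that open files of a few common formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
}
# What each extension claims the file to be.
EXTENSIONS = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png", ".pdf": "pdf"}


def detect_type(path: str) -> Optional[str]:
    """Guess a file's format from its first bytes, or None if unrecognized."""
    with open(path, "rb") as f:
        head = f.read(16)
    for kind, magic in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None


def extension_mismatch(path: str) -> bool:
    """True when the extension claims a format the contents do not show."""
    claimed = EXTENSIONS.get(os.path.splitext(path)[1].lower())
    return claimed is not None and detect_type(path) != claimed


# Hypothetical files: a text ledger renamed to .jpeg, and a JPEG-like file.
workdir = tempfile.mkdtemp()
disguised = os.path.join(workdir, "holiday.jpeg")
genuine = os.path.join(workdir, "photo.jpeg")
with open(disguised, "wb") as f:
    f.write(b"2011-03-02 transfer 9000 to account 4471\n")
with open(genuine, "wb") as f:
    f.write(b"\xff\xd8\xff\xe0" + b"\x00" * 32)

disguised_flagged = extension_mismatch(disguised)
genuine_flagged = extension_mismatch(genuine)
print(disguised_flagged, genuine_flagged)
```

A check like this only catches a plain rename. A more careful culprit could put a valid picture header in front of the hidden text, which is why this is a first pass and not a substitute for actually examining the contents.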

Another reason data is carefully copied off the original host is to avoid any risk of triggering a programmed ‘tripwire’, so to speak. Trying to open a specific file could activate a program that scrambles the hard drive to prevent any other evidence from being found. While deleted data can often be recovered, data destroyed by ‘scrambling’ cannot. Scrambling writes random bits over the entire drive, and overwritten data cannot be restored, so the process can keep incriminating evidence out of reach. That being said, if such a process occurs, it offers compelling reason to continue the investigation, since someone has gone to great lengths to keep data out of the hands of the police.

Additionally, remote access via the internet can be used to alter data on a local computer. For this reason, it is common practice for those investigating to sever any external connections the computer may have.

Further, data forensics experts are forced to be meticulous, as small errors can result in corrupted data that can no longer be used as evidence. More than just fighting the defendant’s attempts to hide their data, analysts must contend with the law to keep their evidence relevant and legal. Accidentally violating someone’s rights to data security can result in evidence being thrown out. Just as with any legal search, a warrant is needed, and not having one will void any evidence found. Beyond national legal barriers, the nature of the internet allows users to send files between countries with ease. If information is stored in another country, international cooperation is required to continue the investigation. While many countries inside NATO and the UN are working on legislation that would make international data investigations easier, storing data around the globe remains a common way for hackers and other computer criminals to maintain anonymity.

Looking Forward

Data security is a serious concern in our world, and it will grow in importance given our everyday reliance on digital storage and communication. As computer technology continues to advance at its current pace, both forensics and anti-forensics tools will advance with it as more sophisticated software is developed. With AI research being done at universities across the world, it is quite possible that future forensics tools will be adaptive and learn to find patterns by themselves. We already have learning security tools, such as Norton or McAfee virus protection for home computers, which remember which programs you tell them are safe and make educated guesses in the future based on your preferences. This only scratches the surface of what such software is capable of, leaving much to be discovered. The advancement in software has a downside too, as it gives cyber criminals more powerful resources to carry out their operations undetected. Data forensics, and information security as a whole, can then be seen as a never-ending race to stay ahead of computer criminals. As a result, the industry continues to flourish, as new analysts are always needed to keep up with software advances taking place every day.