2. BACKGROUND AND RELATED WORK

The fields of malware analysis and digital forensics are increasingly being combined into what Malin et al. (2008) term malware forensics. Recognising it as a field of study in its own right, universities are now beginning to offer malware forensics either as a complete course or module (University of Portsmouth, 2019) or as part of related modules, such as Digital Forensics (University of London, 2020).

The original motivation for this work arose from the realisation that digital forensic practitioners were conducting malware forensic investigations in a largely anecdotal manner. This may be partly because little published material establishes a scientific basis for the procedures applied in a malware forensic investigation and, more specifically, for evaluating the tools used to conduct one. Liu et al. (2017) applied malware ontology techniques to assist investigators by providing a means to categorise malware behaviour in terms of one of five broad categories. However, the definitions applied lack a rationale and do not address malware that occupies more than one category. Furthermore, while helpful to a lay audience, the approach does not assist investigators in understanding the impact (if any) of any malware found on a computer under investigation.

The use of malware forensics is cited by Kim et al. (2014), who presented a model to investigate fraud using “malware forensic” techniques. Provataki and Katos (2013) offered a framework that extends the functionality of the Cuckoo sandbox (Cuckoo Foundation, 2016) to understand malware’s behaviour, but not to evaluate the tools used to study such behaviour. Shosha et al. (2013) highlighted the limitations of dynamic malware analysis techniques for digital investigations and proposed a methodology to analyse malicious code running in forensically acquired computer memory. However, the proposed methodology is only applicable to the analysis of code running in memory and is not based on any formal requirements analysis.

A malware analysis approach was proposed by Ianelli et al. (2007), who suggested that the presence of malware can be addressed by examining network traffic logs. However, this suggestion assumes that such logs exist, which is more likely in a corporate environment than a domestic one. A suspect accused of committing an offence via their home router will therefore typically have far fewer logs and less detail available to assist their defence than would be found in a commercial environment, where more sophisticated logging is likely to be in place.

Malin et al. (2008) presented one of the few books on malware forensics, more recently split into separate Windows (2012) and Linux (2013) editions. Carvey (2012) also provided some coverage of the topic across two chapters from an investigative perspective, as part of a more general digital forensics discussion. Each of these texts presents a collection of tools and techniques to address various aspects of analysis, but none attempt to develop and evaluate a general-purpose framework for malware analysis or a rigorous, scientific means to evaluate the tools used.

In the absence of a formalised approach, it is also not uncommon to find tools deployed that were not specifically designed for forensic use. Hughes and Varol (2020) argued that malware scanners, when employed to identify malware in a forensic investigation, will not meet all possible functional requirements. For example, such tools are not designed to detect malware that previously existed on a machine and is now located in areas such as slack space, unused partitions, and deleted files. Thus, the validity of tools (and hence any resulting conclusions) can be undermined by their application to scenarios for which they were not designed.
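
To make the gap concrete, the following is a minimal sketch (not a production tool, and with an illustrative image filename and signature list) of the kind of raw-image signature sweep that a conventional live scanner does not perform: it reads an entire dd-style image, including unallocated regions, and reports the offsets of executable headers.

```python
# Minimal sketch: byte-signature sweep over an entire raw disk image,
# including unallocated space that a live malware scanner never inspects.
# The signature list and image path are illustrative assumptions.

SIGNATURES = {
    b"MZ": "DOS/PE executable header",
    b"\x7fELF": "ELF executable header",
}

CHUNK = 1 << 20                              # 1 MiB read window
OVERLAP = max(len(s) for s in SIGNATURES)    # keep boundary-spanning hits

def sweep_image(path: str):
    """Yield (absolute_offset, label) for every signature hit in the image."""
    seen = set()       # dedupe hits re-found inside the overlap region
    offset = 0         # absolute offset of the next unread byte
    tail = b""         # carry-over bytes from the previous chunk
    with open(path, "rb") as img:
        while True:
            block = img.read(CHUNK)
            if not block:
                break
            window = tail + block
            window_start = offset - len(tail)
            for sig, label in SIGNATURES.items():
                pos = window.find(sig)
                while pos != -1:
                    hit = window_start + pos
                    if (hit, sig) not in seen:
                        seen.add((hit, sig))
                        yield hit, label
                    pos = window.find(sig, pos + 1)
            tail = window[-(OVERLAP - 1):] if OVERLAP > 1 else b""
            offset += len(block)

if __name__ == "__main__":
    # "suspect_disk.dd" is a hypothetical raw image acquired for analysis.
    for off, label in sweep_image("suspect_disk.dd"):
        print(f"0x{off:010x}  {label}")
```

Established carving tools such as bulk_extractor and PhotoRec implement far more capable versions of this idea; the point of the sketch is simply that a scanner designed for live systems never examines these regions at all.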

Perhaps more significantly, the lack of a formalised approach means that court proceedings involving malware may not be adequately investigated. Such cases inevitably become candidates for miscarriages of justice, as the court would be forming a judgment without being fully informed of the facts. Some of the challenges that can impair an investigation involving malware are explored below.

2.1 The Trojan Defence

Separating user actions from those of malicious software is the fundamental objective when investigating the Trojan defence, whereby a defendant claims that the illegal activity recorded on a device is the result of malware rather than their own actions. Brown (2015) identified the Trojan defence as one of several tactics used by counsel to raise doubt as to the authenticity of the electronic evidence presented to the court. Bowles and Hernandez-Castro (2015) highlighted “clear and obvious mistakes” in Trojan defence cases in a study covering a 10-year period.

The problem of attribution is anticipated to become more challenging in the near future with the nefarious use of artificial intelligence (AI) to enhance malware. Thanh and Zelinka (2019) warn of an upcoming ‘AI-powered malware era,’ citing proof-of-concept work demonstrating that ‘computational intelligence could be used to enhance malware.’ This warning is echoed by Truong et al. (2020), who identify deep learning techniques being applied to malware.

Bikeev et al. (2019) explored the challenges of applying mens rea to malicious AI, and Bahnsen et al. (2018) developed an algorithm to make AI more effective during malicious phishing attacks. Alongside malware and AI, doubt in the reliability of digital evidence can also originate from the methodologies followed by forensic practitioners. Perhaps the most significant of these issues is an over-dependence on anecdotal experience when reaching conclusions.

2.2 Repeated Confirmation

Sceptical digital forensic practitioners may defer to their anecdotal experience to argue that they are “yet to see an example” (McLinden, 2009) or that they “haven’t seen a single case” (Douglas, 2007) of the downloading of indecent images of children being attributable to malware. Similarly, the results of mainstream digital forensic tools have been accepted “based solely on the reputation of the vendor” (Garfinkel, Farrell, Roussev, & Dinolt, 2009). Such arguments are formulated on inductive reasoning derived from repeated confirmation. Although useful for developing hypotheses, inductive reasoning cannot be used to test scientific theory (Levitin, 2016).
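
The gap between “never observed” and “cannot happen” can be quantified. As a minimal sketch, assuming purely illustrative caseload figures, the code below applies the standard result that if an event has not occurred in n independent cases, the exact 95% upper confidence bound on its rate is 1 − 0.05^(1/n), commonly approximated by the ‘rule of three’ as 3/n:

```python
# Sketch: what "I have never seen a case" licenses statistically.
# If zero events are observed in n independent cases, the exact upper
# confidence bound on the event rate p solves (1 - p)**n = 1 - confidence.
# The caseload figures below are illustrative assumptions.

def zero_event_upper_bound(n_cases: int, confidence: float = 0.95) -> float:
    """Exact upper confidence bound on a rate after 0 events in n cases."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_cases)

for n in (50, 200, 1000):
    print(f"0 events in {n:>4} cases: the true rate could still be up to "
          f"{zero_event_upper_bound(n):.2%} (rule of three: {3 / n:.2%})")
```

Anecdotal experience, however extensive, therefore only bounds the rate of malware-attributable cases; it cannot exclude them. There are also challenges in the processing and reasoning applied to expert evidence.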

2.3 Ubiquitous Problems With Expert Evidence

Challenges with expert evidence include experts who step outside their own expertise. The now infamous trials of R v Clark [2003] EWCA Crim 1020, R v Cannings [2004] EWCA Crim 1 and R v Patel [2003] provide examples where the expert witness, Professor Sir Roy Meadow, made a number of claims that had “no statistical basis” (Royal Statistical Society, 2001). Following these events, the Law Commission’s review (2011) of expert evidence in criminal trials called for a greater level of scientific principles and provenance in expert evidence.

Challenges also arise from failures to find and/or disclose evidence correctly. Bowcott (2018) cites problems in a series of criminal trials where digital evidence was either not found or not passed to the defence team during disclosure.

Problems with expert evidence are not limited to the UK. Edmond and Vuille (2014) examined the use of forensic science evidence in trials and concluded that three separate criminal justice systems (the United States, Switzerland and Australia) had each failed to identify “deep structural and endemic problems with many types of forensic science.”

Edmond and Vuille (2014) go on to argue that these problems extend to experts’ use of language, stating that the “expressions used by analysts are not empirically based.” This is echoed by Adam (2016), who argues that a conclusion such as ‘it is likely’ rests on posterior probabilities and so implies probabilistic support for the conclusion. However, such phrases almost never provide any detail on how the likelihood has been reached: the conclusion could be based on unreported properties of the items considered, or could be entirely subjective. Similarly, Adam (2016) challenges the phrase ‘is consistent with,’ which asserts some (unknown) degree of similarity between two things. Typically, no alternative sources are given, nor any sense of how common the ‘consistent’ features are in the wider population. The misuse of language in this way may be linked to practitioners’ lack of understanding of the underlying scientific principles.
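
What a defensible ‘it is likely’ would require can be shown with Bayes’ theorem in odds form; the numbers in the worked example below are purely illustrative:

```latex
% Bayes' theorem in odds form: posterior odds = likelihood ratio x prior odds.
\[
\underbrace{\frac{P(H \mid E)}{P(\bar{H} \mid E)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(E \mid H)}{P(E \mid \bar{H})}}_{\text{likelihood ratio}}
\times
\underbrace{\frac{P(H)}{P(\bar{H})}}_{\text{prior odds}}
\]
% Illustration: a likelihood ratio of 10 with prior odds of 1:1 gives
% posterior odds of 10:1, i.e. P(H|E) ~ 0.91 ("likely"); the same evidence
% with prior odds of 1:100 gives posterior odds of 1:10, i.e.
% P(H|E) ~ 0.09 ("unlikely"). A bare "it is likely" conceals which
% of these situations applies.
```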

2.4 Lack Of Scientific Principles

Casey (2019) argues that digital forensics remains distinct from forensic science “despite over a decade of effort to break down the borders between them.” He goes on to argue that some practitioners accept results as “factual,” failing to recognise the need for scientific treatment. This leads to problems in recognising and reporting error rates, quantifying levels of confidence in findings, and reporting alternative interpretations of findings. Christensen et al. (2014) argue that practitioners appear either to have misunderstood the term ‘error’ or to lack the skills to apply statistics or the scientific method correctly. They add that practitioners have reportedly claimed that there is a zero error rate, or that such an error rate cannot be estimated, or have attempted to “calculate error rates post facto.”
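
A minimal sketch of the kind of treatment Christensen et al. call for, assuming hypothetical validation figures (k errors observed when a tool is run against n ground-truthed test cases) and using scipy’s beta distribution to compute an exact (Clopper-Pearson) confidence interval:

```python
# Sketch: report a tool's error rate with an exact (Clopper-Pearson)
# confidence interval instead of asserting a "zero error rate".
# k = 3 errors in n = 400 ground-truthed cases are illustrative figures.
from scipy.stats import beta

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for a binomial error rate."""
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lower, upper

lo, hi = clopper_pearson(k=3, n=400)
print(f"Observed error rate {3 / 400:.2%}; 95% CI [{lo:.2%}, {hi:.2%}]")
# For k == 0 the upper bound reduces to the zero-event bound sketched in
# Section 2.2: zero observed errors never licenses "zero error rate".
```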

The challenges faced by the digital forensics field are exacerbated within the relatively young malware forensics field by issues such as malware routinely obfuscating its true intentions and hindering attempts to analyse it (Wagener, Dulaunoy, & Engel, 2008). There is, therefore, a level of uncertainty associated with any conclusions drawn from malware forensics, and this uncertainty can be used to raise a reasonable doubt about the true nature and intentions of malware. Furthermore, the complexity of the subject matter and the specialist skills required to study it (e.g., reverse engineering assembly language) may make the specialty less accessible to practitioners.
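
A trivial (and harmless) illustration of the obfuscation problem, assuming the toy marker string below: a single-byte XOR is enough to defeat a naive signature match, forcing the analyst to recover the key before the artefact can even be recognised.

```python
# Toy illustration: single-byte XOR obfuscation defeats naive signature
# matching. The marker is a harmless stand-in for an indicator of
# compromise, not real malware.
MARKER = b"connect-to-c2.example.net"   # hypothetical indicator a tool seeks
KEY = 0x5A

obfuscated = bytes(b ^ KEY for b in MARKER)
print("naive signature match:", MARKER in obfuscated)   # False

# Recovering the marker requires first recovering the key, e.g. by
# brute-forcing the single-byte keyspace:
for key in range(256):
    candidate = bytes(b ^ key for b in obfuscated)
    if b"example" in candidate:
        print(f"recovered with key 0x{key:02X}: {candidate.decode()}")
        break
```

Real malware employs far stronger techniques (packing, encryption, anti-analysis checks), so the uncertainty compounds accordingly.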

This perceived lack of scientific principles arguably also informs the methods used by practitioners to test software tools when attempting to evaluate the reproducibility of the results reported.

2.5 Reproducibility Flaws

Techniques such as dual-tool verification are used by practitioners to “confirm result integrity during analysis” (Forensic Control, 2011). To state that two observations “confirm” a finding is a bold claim and little more than an example of repeated confirmation. It also fails to consider the possibility that both tools are incorrect and simply (erroneously) in agreement (Beckett & Slay, 2007). Hence, dual-tool verification used in this way cannot confirm a result; it can merely corroborate it on a statistically insignificant scale, while identifying any discrepancies. An example of the latter arose in the trial of Casey Anthony (State of Florida v. Casey Marie Anthony, 2011), where a discrepancy was identified between two Internet history tools used to produce expert testimony.
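
The sketch below shows what dual-tool comparison can legitimately deliver: a discrepancy report rather than confirmation. The records are illustrative stand-ins (loosely modelled on the visit-count discrepancy reported in that trial), and agreement between the two sets would not establish that either tool is correct.

```python
# Sketch: dual-tool comparison as a discrepancy detector, not a
# confirmation oracle. The parsed browser-history records below stand in
# for the output of two hypothetical tools: (timestamp, URL, visit count).
tool_a = {
    ("2011-03-17 14:02", "example.com/search?q=term", 1),
    ("2011-03-17 14:05", "example.com/article", 84),   # tool A: 84 visits
}
tool_b = {
    ("2011-03-17 14:02", "example.com/search?q=term", 1),
    ("2011-03-17 14:05", "example.com/article", 1),    # tool B: 1 visit
}

agreed = tool_a & tool_b
disputed = (tool_a - tool_b) | (tool_b - tool_a)

print(f"{len(agreed)} record(s) corroborated (both tools could still be wrong)")
for rec in sorted(disputed):
    print("DISCREPANCY - requires manual validation:", rec)
```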

2.6 Emerging Statutory Requirements

The importance of establishing quality standards for forensic science practice has become increasingly apparent in recent years. Interest has been expressed throughout the European Union (EU) by forensic institutions, the scientific community, and judicial and political stakeholders (van Ruth & Smithuis, 2019).

Consequently, a European Council Framework Decision was passed requiring all member states to set up systems to accredit their forensic service providers carrying out laboratory activities. The UK’s response was to create the post of Forensic Science Regulator (FSR). The FSR’s Codes (2020b) place an obligation on practitioners to gain accreditation mapped to the international standard ISO/IEC 17025 and to “embed a systematic approach to quality” (Tully et al., 2020).

2.7 Summary

Given the reasons for the appointment of an FSR and the push to make the quality standards statutory, the issues identified above currently undermine the trust that can be placed in findings tendered in criminal proceedings.

The production of digital evidence therefore requires reliable tools and competent practitioners who use appropriate scientific language, so as to instil the conditions for trusted practice, particularly when tools are used to analyse malware as part of a digital forensic investigation.

Given that malware forensics is an emerging field, a scientific methodology is needed to formalise, and hence underpin trust in, its practice. In particular, a methodology to quantifiably evaluate the tools used in a malware forensic investigation needs to be established. The next section focuses on identifying the requirements for such a methodology.

