5. DISCUSSION
This section reviews the framework requirements, which are reproduced in Table 4 alongside a summary of the corresponding discussion points.
The majority of the requirements are met by the design of the framework. For example, handling the malware binaries via scripts on a closed network restricts access to the malware and minimises accidental cross-contamination during testing (Requirement 1). In addition, the use of virtual machines provides a platform for rapidly deploying and resetting test environments, addressing Requirement 10, while the use of an Internet simulator addresses Requirement 11 by providing a network-service-enabled environment conducive to executing malware. Similarly, the framework supports a variety of operating systems, particularly those deemed more vulnerable, such as Windows XP (Requirement 12).
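The environment-reset step can be illustrated with a short script. This is a minimal sketch only: the guest and snapshot names are hypothetical, and VirtualBox's VBoxManage command line is used purely as one example of a hypervisor interface; the framework does not prescribe a particular hypervisor.

```python
# Minimal sketch of resetting a test VM between malware runs.
# The guest and snapshot names are hypothetical; the VBoxManage sub-commands
# (controlvm, snapshot restore, startvm) are standard VirtualBox CLI calls,
# used here only as one possible hypervisor interface.
import subprocess

VM_NAME = "winxp-test-01"     # hypothetical guest used for a single test run
CLEAN_SNAPSHOT = "baseline"   # hypothetical snapshot taken before any malware is executed

def reset_test_vm(vm: str = VM_NAME, snapshot: str = CLEAN_SNAPSHOT) -> None:
    """Power off the guest, revert it to its clean snapshot and restart it headless."""
    subprocess.run(["VBoxManage", "controlvm", vm, "poweroff"], check=False)
    subprocess.run(["VBoxManage", "snapshot", vm, "restore", snapshot], check=True)
    subprocess.run(["VBoxManage", "startvm", vm, "--type", "headless"], check=True)

if __name__ == "__main__":
    reset_test_vm()
```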
The design of the framework also dispenses with the need for knowledge of a tool's internal operation, reflecting real-world practice in which practitioners use closed-source tools. Instead, the framework provides a means to test the expected output of a tool under test (Requirement 13).
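As a minimal sketch of this expected-output (black-box) approach, a tool's reported artefacts can be compared against an independently derived expected set; all artefact names below are invented for illustration and do not come from the study.

```python
# Sketch of expected-output (black-box) testing: the tool's internals are not
# inspected; only its reported artefacts are compared with an expected list.
# All artefact names below are hypothetical.

def compare_artefacts(expected: set, reported: set) -> dict:
    """Compare a tool's reported artefacts against an independently derived set."""
    return {
        "matched": len(expected & reported),
        "missed": sorted(expected - reported),       # expected but not reported by the tool
        "unexpected": sorted(reported - expected),   # reported but not in the expected set
    }

expected = {
    r"C:\WINDOWS\system32\evil.dll",
    r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run\evil",
}
reported = {r"C:\WINDOWS\system32\evil.dll"}

print(compare_artefacts(expected, reported))
```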
However, there are also some requirements (Requirements 2, 5, and 6) that can only be fulfilled after the work is released to the community for review and consideration of use in practice. As a result, there is scope for further work to evaluate these (see Further Work in the next section).
The provision for reliability (Requirement 3) comprises the four elements of reliability defined in R v Lundy ([2013] UKPC 28); see Table 4. For simplicity, these are hereafter referred to as the 'Lundy requirements' and identified individually as 'L1' to 'L4'.
Figure 1. MATEF components.
These four elements are summarised against Requirement 3 in Table 4. Two of the four are addressed by the design of the framework and are indicated by check marks ('✓') in Table 4. First, the capability for testing a technique (Requirement 3, L1) is addressed by the design of the framework itself. Secondly, the rate of error associated with a given software tool observing malware artefacts can be determined through repeated measurements and the use of statistical techniques (Requirement 3, L3). The remaining two elements (L2 and L4) were not addressed and are therefore marked with a cross ('X') against Requirement 3 in Table 4. These unaddressed elements mirror Requirements 5 and 6 of the framework discussed above.
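To illustrate how repeated measurements support an error-rate estimate (L3), the proportion of runs in which a tool's artefact count departs from the expected value can be computed directly. The figures below are invented for illustration, not drawn from the study.

```python
# Hedged sketch of estimating a tool's rate of error from repeated runs.
# The expected count and per-run observations are invented for illustration.
from statistics import mean

expected_count = 12                       # artefacts expected for one malware binary
observed_counts = [12, 11, 12, 12, 10]    # counts reported by the tool over repeated runs

error_rate = mean(1 if count != expected_count else 0 for count in observed_counts)
mean_abs_error = mean(abs(count - expected_count) for count in observed_counts)

print(f"error rate: {error_rate:.2f}, mean absolute error: {mean_abs_error:.2f}")
```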
By publishing the hashes of malware samples sourced from openly shareable resources such as VirusTotal (2010) and using these during testing, practitioners can collaboratively test their tools against the same known and trusted datasets (Requirement 9). Similarly, practitioners can use the same independent source to determine the expected number of artefacts generated by a given malware binary (Requirement 14). The design of the framework also allows these artefacts to be recorded and counted from a disparate range of tools under test (Requirement 15).
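A minimal sketch of the hash-publication step is shown below; the directory layout and file extension are hypothetical, and SHA-256 is used as a representative digest.

```python
# Sketch of producing a shareable manifest of sample hashes so that other
# practitioners can test against the same dataset. Paths are hypothetical.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

sample_dir = Path("samples")                      # hypothetical directory of malware binaries
for sample in sorted(sample_dir.glob("*.bin")):   # hypothetical naming convention
    print(f"{sha256_of(sample)}  {sample.name}")
```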
However, the use of an independent source is not without its problems. Critics of the framework may point out that the requirements for validation (Requirements 4 and 7) have yet to be fully addressed, given the dependence on a third-party tool to provide 'ground truth': comparing the results of one tool with those of another (online) tool is little more than dual-tool verification.
Despite this, the framework's test environment provides a means to test tools repeatedly, under different conditions and at scale, against large numbers of malware binaries. This enables statistical techniques to be applied and greater confidence to be established in an observed value to a statistically significant degree, which in turn addresses the requirement to provide an estimate of uncertainty (Requirement 8).
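For example, a simple uncertainty estimate for an observed artefact count is a confidence interval over repeated runs. The figures below are invented, and a normal approximation is used for brevity.

```python
# Sketch of an uncertainty estimate (Requirement 8): a 95% confidence interval
# for the mean artefact count observed over repeated runs. Data are invented.
from math import sqrt
from statistics import NormalDist, mean, stdev

observed_counts = [12, 11, 12, 12, 10, 12, 11, 12]   # hypothetical repeated measurements

m = mean(observed_counts)
se = stdev(observed_counts) / sqrt(len(observed_counts))
z = NormalDist().inv_cdf(0.975)   # normal approximation; a t-interval is preferable for small n

print(f"mean = {m:.2f}, 95% CI = ({m - z * se:.2f}, {m + z * se:.2f})")
```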
Despite the ability to control the frequency and conditions under which the malware is executed locally, it should be noted that the use of online sandboxes to quantify the number of expected artefacts for a given category (i.e., creation, change, or deletion) has one notable limitation: rather than providing a representative average quantity, online sandbox tools may execute each sample only once, and for no more than a maximum duration before terminating (Bayer, Habibi, Balzarotti, Kirda, & Kruegel, 2009). This limitation is attributable to the use of online sandboxes rather than to the framework proposed here; online sandboxes were chosen for convenience, speed, and choice, given the variety of online tools available. Further work is also possible to integrate an offline sandbox under the control of the investigator, giving greater control over how 'ground truth' is established. We discuss this in the next section.
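To illustrate the kind of aggregation that repeated local execution makes possible, in contrast to a single online-sandbox run, per-category artefact counts can be averaged across runs; the per-run figures below are invented for illustration.

```python
# Hedged sketch: averaging artefact counts per category (creation, change,
# deletion) over repeated local executions, rather than relying on a single
# sandbox run. The per-run figures are invented for illustration.
from statistics import mean

runs = [
    {"created": 8, "changed": 3, "deleted": 1},
    {"created": 8, "changed": 2, "deleted": 1},
    {"created": 7, "changed": 3, "deleted": 1},
]

averages = {category: mean(run[category] for run in runs)
            for category in ("created", "changed", "deleted")}
print(averages)   # representative quantities to compare against a tool's output
```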