In the field of digital forensics, the number and heterogeneity of devices typically involved in an investigation are increasing. To train digital forensics practitioners and to accelerate the development and validation of forensic tools, up-to-date data sets are in high demand. However, creating data sets manually is a complex, tedious, and time-consuming task, which increases the need for automated solutions. Existing data generation frameworks typically use components that run directly on the simulated client (e.g., a client-side agent controlled via SSH). On the one hand, this facilitates simulation by providing direct feedback from the client and by allowing client-side libraries to be used to access software. On the other hand, this approach creates unintended traces in the generated data sets that quickly reveal their synthetic origin, reducing their realism and thus their relevance. To avoid such traces, this paper presents a hypervisor-based solution that eliminates the client-side software component from a recent digital forensic data set generator and compensates for its absence solely through host-side means. To demonstrate the practicability of the proposed approach and the indistinguishability of the generated traces, a multi-participant scenario is performed as a proof of concept, replicating a realistic attack on a Linux system from a Kali attacker machine. During the evaluation, the generated data set is compared, in terms of unintended traces and realism, to a data set generated by the same framework using an agent component. In this way, we demonstrate the benefits and overall usefulness of an agent-less data synthesis approach.