Adapting Malware Detection to DNA Screening

EasyChair Preprint 9045 • 2 pages • Date: October 11, 2022

Abstract

As DNA synthesis becomes cheaper and more accessible, opportunities grow for malicious or careless actors to synthesize dangerous pathogenic sequences. To mitigate this threat, major DNA synthesis providers screen sequence orders for pathogenic content, following guidance from the US Department of Health and Human Services and the International Genome Synthesis Consortium (IGSC). Current screening methods, however, have been unable to scale sufficiently to keep up. The dominant method today evaluates sequence homology, using BLAST (or a similar tool) to test whether a sequence's best alignment is to a controlled pathogenic organism. This approach produces a high rate of false positives, estimated at more than 4% in a survey of IGSC member companies. The problem is worsened because these methods generally search against all genes in an organism, including harmless "housekeeping" genes and others with no functional relationship to pathogenesis. Moreover, the false-positive rate increases markedly as sequence length decreases. Because resolving false positives is costly, synthesis providers typically screen only dsDNA sequences at least 200 bp long and do not screen oligonucleotides at all. We hypothesized that these challenges could be addressed by adapting methods for detecting malware in network traffic, a domain that faces even greater challenges of scale. To this end, we adapted the Framework for Autogenerated Signature Technology (FAST) signature extraction method to nucleic acid sequences, producing the FAST for Nucleic Acids (FAST-NA) method for DNA screening.
Our implementation of FAST-NA screens DNA sequences far faster than BLAST-based methods, with equivalent sensitivity and significantly improved specificity, even while reducing the minimum scanning window from 200 bp to 50 bp.

Keyphrases: DNA screening, cyberbiosecurity, synthesis
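To make the idea of signature-based screening over a fixed scanning window concrete, here is a minimal Python sketch. It assumes a hypothetical set of pre-extracted signatures and uses simple exact substring matching on both strands. It is not the FAST-NA implementation or its signature-extraction step, which the abstract does not describe. The signature sequences, labels, the `screen` and `window_starts` helpers, and the default 50 bp window with a 25 bp step are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical signatures (NOT real FAST-NA output): short sequences that a
# signature-extraction step might associate with a sequence of concern.
SIGNATURES: Dict[str, str] = {
    "ATGCGTACGTTAGCAA": "hypothetical_concern_A",
    "GGCCTTAAGGCCTAGT": "hypothetical_concern_B",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def window_starts(length: int, window: int, step: int) -> List[int]:
    """Start offsets of windows covering the whole sequence, including its tail."""
    if length <= window:
        return [0]  # short orders (e.g. oligos) are screened as one window
    starts = list(range(0, length - window + 1, step))
    if starts[-1] != length - window:
        starts.append(length - window)  # make sure the final bases are covered
    return starts


def screen(order: str, signatures: Dict[str, str],
           window: int = 50, step: int = 25) -> List[Tuple[int, str]]:
    """Return (window_start, label) pairs for windows that contain a signature
    on either strand. Exact matching only; no alignment, unlike BLAST."""
    longest = max((len(s) for s in signatures), default=0)
    # Every signature occurrence must fit entirely inside at least one window.
    if longest > window or step > window - longest + 1:
        raise ValueError("step too large for window/signature length")
    seq = order.upper()
    hits: List[Tuple[int, str]] = []
    for start in window_starts(len(seq), window, step):
        chunk = seq[start:start + window]
        rc = reverse_complement(chunk)
        for sig, label in signatures.items():
            if sig in chunk or sig in rc:
                hits.append((start, label))
    return hits
```

For example, `screen("A" * 60 + "ATGCGTACGTTAGCAA" + "C" * 60, SIGNATURES)` reports a single hit in the window starting at offset 50. The overlap check reflects a simple design choice: windows step by less than the window length, so a signature spanning a window boundary is still fully contained in a neighbouring window.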