A new algorithm called FLSHclust, recently developed by researchers at the Broad Institute of MIT and Harvard, has uncovered 188 rare and previously unknown CRISPR-linked gene modules among billions of protein sequences, including a new type VII CRISPR-Cas system. The findings open new opportunities to exploit CRISPR systems and to understand the functional diversity of microbial proteins.
CRISPR systems have been used to develop a growing number of novel biomolecular methods, including the well-known CRISPR/Cas-mediated genome editing. The discovery of previously unknown CRISPR systems will promote the further development of these biotechnologies.
The CRISPR toolbox has so far been expanded by mining protein sequence databases. However, commonly used algorithms become impractical when applied to exponentially growing data sets containing billions of proteins.
To address this limitation, the research team developed FLSHclust, an algorithm that clusters proteins by sequence similarity and that, unlike currently available methods, can quickly and efficiently analyze very large protein sequence databases.
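To give a feel for what "clustering proteins by sequence similarity" means, here is a toy sketch in Python. It is not FLSHclust and makes no claim about how FLSHclust works internally: it is a simple greedy scheme that compares each sequence against existing cluster representatives using shared 3-letter fragments (k-mers). The function names, the similarity threshold, and the example sequences are all made up for illustration.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-letter fragments in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs, k=3, threshold=0.5):
    """Assign each sequence to the first representative it resembles.

    Returns a dict mapping the index of each cluster representative
    to the list of sequence indices in that cluster.
    """
    reps = []       # (index, k-mer set) of each cluster representative
    clusters = {}   # representative index -> member indices
    for i, seq in enumerate(seqs):
        ks = kmers(seq, k)
        for rep_index, rep_kmers in reps:
            if jaccard(ks, rep_kmers) >= threshold:
                clusters[rep_index].append(i)
                break
        else:
            # No similar representative found: start a new cluster.
            reps.append((i, ks))
            clusters[i] = [i]
    return clusters


# Hypothetical toy sequences: the first two differ by one residue.
toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSLPWWNDE"]
result = greedy_cluster(toy)
```

In this sketch every new sequence may be compared against every existing representative, so the work grows quickly as the number of clusters grows. That kind of scaling is what makes simple approaches impractical for databases with billions of proteins.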
Using the new algorithm, the team searched for rare CRISPR systems in a metagenomic database containing 8 billion proteins and 10.2 million CRISPR arrays. They discovered 188 previously unknown CRISPR-linked gene modules and identified and characterized a new type of CRISPR-Cas system, a Cas14-containing system (designated type VII) that acts on RNA.
The newly discovered system is very rare. The researchers said that the discovery of this previously unknown Cas gene and CRISPR system greatly expands the known diversity of CRISPR and reveals unprecedented flexibility and modularity in the organization and function of CRISPR systems. It also shows that most of these variants are rare.
In recent years, CRISPR-Cas9 gene editing technology has been widely adopted in the life sciences and other fields. It is low-cost and easy to use, and it has become a powerful tool for scientists in biological experiments. The technology itself has also become a closely watched research topic in the life sciences. Although CRISPR-Cas9 gene editing is very easy to use, it is not perfect. The discovery of more CRISPR-Cas systems therefore enriches the "toolbox" of gene editing technology, provides more options for life science research, and is expected to drive the continued iterative improvement of gene editing technology.
(Original title: "New algorithm reveals rare CRISPR gene modules, expected to lead to safer and more effective genome therapies")