The KnotProt database collects information about topologically non-trivial proteins, i.e. proteins with so-called slipknots and knots. This is the first database that classifies proteins with slipknots and knots, represents their entire complexity in the form of a “knotting fingerprint”, and presents many biological and geometrical statistics based on these results. The KnotProt database is based on protein chains deposited in the Protein Data Bank (PDB) and contains around 900 protein chains with knots or slipknots.
The first examples of knots in proteins were found in 1994, and many more have been identified in recent years [3, 4, 5]. Proteins may also form slipknots, i.e. contain knotted subchains even though their backbone chain as a whole is unknotted; they were discovered in proteins in 2007 and their systematic analysis was initiated in . It is usually impossible to determine with the naked eye whether a given protein chain forms a knot or a slipknot. Therefore, more involved mathematical tools, such as polynomial knot invariants, are used to detect knotting and slipknotting. Much effort has been invested in identifying knotted proteins among those deposited in the PDB.
Considerable interest in this subject has arisen recently, for a variety of reasons. First, it is believed that the presence of knots and slipknots in proteins is not accidental, and therefore understanding their function is an important challenge. Second, recent work shows nearly perfect conservation of knotting fingerprints in some families whose members are separated by hundreds of millions of years of evolution (arising from distant organisms) and possess a low sequence identity. Moreover, based on knotting fingerprints, it was shown that the locations of active sites in proteins are correlated with points characterizing their topology (e.g. positions of the knot core). These findings imply that a detailed representation of protein topology can be crucial for understanding their biological role. The KnotProt database will make knotting and slipknotting data easily available and should help researchers to understand the biological reasons for protein knotting.
The KnotProt database contains detailed information about the entanglement in proteins and presents it in the form of a “knotting fingerprint”. The knotting fingerprint encodes the knot type of each subchain of a protein backbone and represents it as a matrix diagram; see Fig. 1 and the detailed description in “Materials and methods”. The KnotProt database also presents extensive statistics about proteins with knots and slipknots based on their biological function, molecular tags, family association and type of fold, as well as geometric data: knotting patterns, knot and slipknot lengths and depths, etc. Interestingly, the KnotProt analysis reveals that proteins with knots and slipknots can be classified into a few distinct topological motifs, represented by particular patterns within the matrix diagrams. These data can be used, for example, to find proteins with knots or slipknots that have a homologous sequence, have a similar structure, or perform a particular biological function. As an additional feature, a user can analyze structures and generate knotting fingerprints of uploaded proteins. It is also possible to upload and analyze a whole set of structures (e.g. to analyze the evolution of a knot along a folding or unfolding trajectory). The KnotProt database is automatically updated every Wednesday, immediately after new structures are deposited in the PDB.
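The idea behind the fingerprint, one knot-type label per subchain, can be illustrated with a minimal Python sketch. It is not the KnotProt implementation: the names `knotting_fingerprint`, `is_slipknot`, the `min_length` parameter and the unknot label "0_1" are assumptions made for illustration, and the knot classifier itself (in practice a chain closure followed by a polynomial invariant) is left as a user-supplied function.

```python
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float, float]
Fingerprint = Dict[Tuple[int, int], str]


def knotting_fingerprint(
    coords: Sequence[Point],
    classify: Callable[[Sequence[Point]], str],
    min_length: int = 3,
) -> Fingerprint:
    """Label every subchain (start, end), 0-based and inclusive, with a knot type.

    `classify` is a hypothetical callback that returns a knot label for a
    list of backbone coordinates; no real classifier is implemented here.
    """
    n = len(coords)
    fingerprint: Fingerprint = {}
    for start in range(n):
        # Subchains shorter than min_length points are skipped.
        for end in range(start + min_length - 1, n):
            fingerprint[(start, end)] = classify(coords[start:end + 1])
    return fingerprint


def is_slipknot(fingerprint: Fingerprint, n: int, unknot: str = "0_1") -> bool:
    """True if the whole chain is unknotted but at least one subchain is knotted."""
    whole = fingerprint.get((0, n - 1), unknot)
    return whole == unknot and any(
        label != unknot for label in fingerprint.values()
    )


if __name__ == "__main__":
    # Toy data and a toy classifier, used only to exercise the data structure.
    chain = [(float(i), 0.0, 0.0) for i in range(6)]
    toy = lambda sub: "3_1" if len(sub) == 4 else "0_1"
    fp = knotting_fingerprint(chain, toy)
    print(len(fp), is_slipknot(fp, len(chain)))
```

The dictionary keyed by (start, end) plays the role of the matrix diagram: each cell corresponds to one subchain, and a slipknot shows up as knotted cells away from the cell of the full chain.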
There are three main options users can choose from to view or analyze data:
After a particular protein (or any other polymer chain) is chosen or uploaded, its knotting fingerprint is presented in the main window of the database. Four options, displayed in blue at the top of the window, are then available. The first option is “Knotting data”, which shows the topological structure of knots and slipknots, as described in detail in the “Knotting data” section. The other options (“Chain information summary”, “Similar chains” (by sequence), and “Similar chains” (by structure)) display useful information about proteins drawn from other biological databases (PubMed, DOI, RCSB, Pfam); the key feature of this subpage is an automated classification of proteins based on the same knotting fingerprint, sequence similarity, structure and family origin.
The database covers almost every protein chain deposited in the PDB; redundant chains within a particular PDB entry of a homomultimeric complex are represented by one chain only. New PDB entries are checked each week.
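The reduction of redundant chains within one entry can be sketched as follows. The criterion for redundancy is not stated here, so this sketch assumes that chains with identical sequences are redundant; the function name `representative_chains` and the input layout (chain identifier mapped to sequence) are likewise hypothetical.

```python
from typing import Dict, List


def representative_chains(chain_sequences: Dict[str, str]) -> List[str]:
    """Return one chain identifier per distinct sequence in a single entry.

    Assumption: two chains are redundant when their sequences are identical.
    The alphabetically first chain identifier is kept as the representative.
    """
    seen = set()
    representatives = []
    for chain_id in sorted(chain_sequences):
        sequence = chain_sequences[chain_id]
        if sequence not in seen:
            seen.add(sequence)
            representatives.append(chain_id)
    return representatives


if __name__ == "__main__":
    # A homodimer (A, B) plus one different chain (C); toy sequences.
    print(representative_chains({"A": "MKTL", "B": "MKTL", "C": "GSHM"}))
```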
We included non-X-ray entries and entries containing only Cα atoms. Chains were subsequently evaluated to take into account insertions in their sequences of all non-typical amino acids: MSE, FGL, LLP, SAC, SER, PCA, MEN, CSB, HTR, PTR, TYR, SCE, M3L, OCS, KCX, SEB, MLY, CSW, TPO, SEP, AYA, TRN. This analysis is performed so as not to introduce additional breaks along the protein chain. In the case of NMR structures, we took the first model with a given chain name. Among these chains we identified around 1200 that possess either a knot or a slipknot. These currently comprise the KnotProt database.
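A minimal sketch of how such a chain could be read from a PDB-format file is given below. It is an illustration under stated assumptions, not the KnotProt pipeline: the function name `ca_trace` is hypothetical, the residue list is copied verbatim from the paragraph above, and the parser relies only on the fixed-column layout of ATOM and HETATM records. Modified residues stored as HETATM records are kept, so they do not create artificial breaks, and reading stops at the first ENDMDL record so that only the first model is used.

```python
from typing import Iterable, List, Tuple

STANDARD = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# Residue names listed in the text, copied verbatim (SER and TYR included).
NON_TYPICAL = {
    "MSE", "FGL", "LLP", "SAC", "SER", "PCA", "MEN", "CSB", "HTR", "PTR",
    "TYR", "SCE", "M3L", "OCS", "KCX", "SEB", "MLY", "CSW", "TPO", "SEP",
    "AYA", "TRN",
}
ACCEPTED = STANDARD | NON_TYPICAL

Residue = Tuple[int, str, Tuple[float, float, float]]


def ca_trace(pdb_lines: Iterable[str], chain_id: str) -> List[Residue]:
    """Collect (residue number, residue name, Cα coordinates) for one chain."""
    trace: List[Residue] = []
    seen = set()
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # first model only (relevant for NMR entries)
        if record not in ("ATOM", "HETATM") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        res_name = line[17:20].strip()
        if res_name not in ACCEPTED:
            continue  # e.g. a calcium ion, whose atom name is also "CA"
        key = (int(line[22:26]), line[26])  # residue number + insertion code
        if key in seen:
            continue  # keep only the first alternate location
        seen.add(key)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        trace.append((key[0], res_name, xyz))
    return trace
```

Filtering on the residue name rather than on the record type is the design choice that matters here: selenomethionine (MSE), for example, is commonly stored as HETATM, and dropping it would split the backbone in two.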
The knotting fingerprint takes missing atoms into account; where they overlap with the knot core, they are represented by grey strips, as in the figure below. In this case the missing part of the chain is replaced by a line segment. This may affect the type of knot detected, so results should be interpreted with care in such cases. Missing atoms in the chain are denoted by “-” in the sequence representation in the “Knotting data” screen.
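The bookkeeping for missing residues can be sketched in a few lines of Python. This is only an illustration: it assumes that gaps are inferred from jumps in residue numbering and that one “-” is shown per missing residue, neither of which is specified above, and the names `sequence_with_gaps` and `gap_overlaps_core` are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive residue-number range


def sequence_with_gaps(
    residues: Sequence[Tuple[int, str]],
) -> Tuple[str, List[Interval]]:
    """Build a sequence string with '-' for missing residues.

    `residues` holds (residue number, one-letter code) pairs sorted by
    residue number. Assumption: a jump in numbering marks missing residues.
    """
    parts: List[str] = []
    gaps: List[Interval] = []
    previous = None
    for number, letter in residues:
        if previous is not None and number > previous + 1:
            gaps.append((previous + 1, number - 1))
            parts.append("-" * (number - previous - 1))
        parts.append(letter)
        previous = number
    return "".join(parts), gaps


def gap_overlaps_core(gaps: Sequence[Interval], core: Interval) -> bool:
    """True if any gap shares at least one residue number with the knot core."""
    core_start, core_end = core
    return any(start <= core_end and end >= core_start for start, end in gaps)


if __name__ == "__main__":
    sequence, gaps = sequence_with_gaps([(1, "M"), (2, "K"), (5, "T"), (6, "L")])
    print(sequence, gaps, gap_overlaps_core(gaps, (4, 10)))
```

A check like `gap_overlaps_core` identifies the cases flagged in the text, where the straight segment bridging a gap passes through the knot core and the reported knot type deserves extra caution.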