visualization, sequence, quality control, topology, mapping, annotation, connectivity, edge extraction, and feature.
After installing tmkit (please see how to install it), you can import this library by putting the following code in a Python script or a Jupyter notebook.

```python
import tmkit as tmk
```

Then, you can access the 14 modules covering 9 function classes.

| # | Tool module | Function class | Note |
|---|---|---|---|
| 1 | tmk.fetch | Quality control | fetch example data |
| 2 | tmk.qc | Quality control | generate and extract metrics of sequences and structures |
| 3 | tmk.seq | Sequence | parse sequences and structures |
| 4 | tmk.msa | Sequence | produce commands for generating multiple sequence alignment |
| 5 | tmk.feature | Feature | protein biological features |
| 6 | tmk.collate | Mapping | seek difference between RCSB and PDBTM structures |
| 7 | tmk.topo | Topology | transmembrane protein topologies |
| 8 | tmk.rrc | Feature | performance evaluation of residue contact prediction |
| 9 | tmk.ppi | Connectivity | protein connectivity |
| 10 | tmk.mut | Annotation | transmembrane protein's mutation data processing |
| 11 | tmk.vs | Visualization | visualize protein structures |
| 12 | tmk.cath | Annotation | access protein domains and families |
| 13 | tmk.mapping | Mapping | conversion between protein identifiers |
| 14 | tmk.edge | Edge extraction | rewiring of connections between residues |

Identifying protein-protein interaction (PPI) interfaces is critical to understanding the biological processes that the proteins govern.
The sequence pre-processing module is a fundamental component of TMKit, designed to handle sequence reading in diverse formats, sequence retrieval from various sources, and multiple sequence alignment (MSA) generation.
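To make the sequence-reading step concrete, here is a minimal sketch of parsing FASTA-formatted text in plain Python. It does not use TMKit's `tmk.seq` API; the `read_fasta` helper and the toy records are hypothetical and serve only as an illustration of the kind of input such a module handles.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect the chunks.
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# Toy input with made-up headers and sequences.
fasta_text = ">seq1\nMKT\nLLV\n>seq2\nGGA\n"
sequences = read_fasta(fasta_text)
print(sequences)
```

Wrapped sequence lines are joined, so each header maps to one contiguous string.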
The quality control module evaluates various criteria, including the experimental methods used, resolution, subclass, and sequence length, to qualify proteins in bulk.
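Bulk screening of this kind can be pictured as a filter over per-protein metadata. The following is a minimal sketch in plain Python, not TMKit's `tmk.qc` API; the record fields, the `passes_qc` helper, the placeholder identifiers, and the threshold values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    # Hypothetical metadata fields, named here for illustration only.
    pdb_id: str
    method: str        # experimental method, e.g. "x-ray"
    resolution: float  # in angstroms
    subclass: str      # e.g. "alpha-helical"
    seq_len: int       # number of residues


def passes_qc(rec, methods, max_resolution, subclasses, min_len, max_len):
    """Return True only if the record satisfies every criterion."""
    return (
        rec.method in methods
        and rec.resolution <= max_resolution
        and rec.subclass in subclasses
        and min_len <= rec.seq_len <= max_len
    )


# Placeholder records; the identifiers and values are made up.
records = [
    ProteinRecord("aaaa", "x-ray", 2.1, "alpha-helical", 320),
    ProteinRecord("bbbb", "nmr", 0.0, "alpha-helical", 95),
    ProteinRecord("cccc", "x-ray", 4.5, "beta-barrel", 410),
]

qualified = [
    r.pdb_id
    for r in records
    if passes_qc(r, {"x-ray"}, 3.5, {"alpha-helical", "beta-barrel"}, 100, 1000)
]
print(qualified)
```

Here only the first record survives: the second fails the method and length criteria, and the third exceeds the resolution cutoff.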
TMKit can be used to obtain more detailed non-TM topologies, that is, side 1, side 2, strand, coil, inside, loop, and interfacial. Besides the structure-derived topologies, TMKit also supplies predicted topologies by embedding TMHMM and Phobius, which can be run from the command-line interface (CLI) and within Python.
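Per-residue topology labels, whether structure-derived or predicted, are often easier to work with as contiguous segments. The sketch below assumes a hypothetical label string with one letter per residue; the letters `i`, `M`, and `o` are an illustrative convention and not necessarily the format TMKit returns, and `topology_segments` is not a TMKit function.

```python
from itertools import groupby


def topology_segments(labels):
    """Collapse a per-residue label string into (label, start, end) segments.

    Positions are 1-based and inclusive.
    """
    segments = []
    pos = 1
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, pos, pos + length - 1))
        pos += length
    return segments


# Hypothetical labels: i = inside, M = transmembrane, o = outside.
example = "iiiMMMMMooMMMMii"
segments = topology_segments(example)
for seg in segments:
    print(seg)
```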
Identifier mapping between structural and sequence data (e.g., FASTA residue IDs and PDB residue IDs) is an important technical premise to guarantee the correct interpretation of biological findings.
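As an illustration of why this mapping matters, PDB residue IDs often start at an offset and skip unobserved residues, while FASTA positions simply count from 1. The sketch below is not TMKit's `tmk.mapping` API. It assumes the FASTA sequence was derived from the residues observed in the structure, so the k-th FASTA residue corresponds to the k-th PDB residue; `build_id_maps` and the example numbering are hypothetical.

```python
def build_id_maps(pdb_residue_ids):
    """Map 1-based FASTA positions to PDB residue IDs, and back."""
    fasta_to_pdb = {i: rid for i, rid in enumerate(pdb_residue_ids, start=1)}
    pdb_to_fasta = {rid: i for i, rid in fasta_to_pdb.items()}
    return fasta_to_pdb, pdb_to_fasta


# Hypothetical chain whose numbering starts at 27 and skips residues 30-31.
pdb_ids = [27, 28, 29, 32, 33]
fasta_to_pdb, pdb_to_fasta = build_id_maps(pdb_ids)

print(fasta_to_pdb[4])   # PDB residue ID of the 4th FASTA residue
print(pdb_to_fasta[29])  # FASTA position of PDB residue 29
```

Keeping both directions in dictionaries makes each lookup constant time, and a missing key signals a residue that is absent from the structure.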
Amino acid residues of transmembrane proteins that are involved in mutations and functional domains can be annotated using the MutHTP, Pred-MutHTP, and CATH databases.
Studying the connections of a protein to others in a PPI network is crucial to understanding its biological role.
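A simple way to picture protein connectivity is to count the distinct interaction partners of each protein in an edge list. The following sketch is plain Python and does not use TMKit's `tmk.ppi` API; `build_neighbors` and the protein names `P1` to `P4` are placeholders.

```python
from collections import defaultdict


def build_neighbors(interactions):
    """Build an undirected neighbor map from (protein_a, protein_b) pairs."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


# Placeholder interaction list; note that one pair appears in both orders.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P1"), ("P4", "P1")]
neighbors = build_neighbors(interactions)
partners = sorted(neighbors["P1"])
print(partners, len(partners))
```

Using sets means a pair listed in both orders is counted once.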
We provide a high-performance computing library for extracting connections between residues by constructing bipartite and unipartite graphs (where residue connections are treated as edges) and assigning features in linear time with respect to the number of residues used.
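The distinction between the two graph types can be sketched as follows. This is one plausible reading under stated assumptions, not TMKit's implementation or its `tmk.edge` API: residues near a position are taken from a fixed-size sequence window, a unipartite graph connects residues within a single window, and a bipartite graph connects residues of one window to those of another. The function names and the `half_width` parameter are hypothetical.

```python
def window(center, half_width, seq_len):
    """1-based positions within half_width of center, clipped to the sequence."""
    lo = max(1, center - half_width)
    hi = min(seq_len, center + half_width)
    return list(range(lo, hi + 1))


def unipartite_edges(i, half_width, seq_len):
    """Edges among the residues of a single window around i (a < b)."""
    nodes = window(i, half_width, seq_len)
    return [(a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]]


def bipartite_edges(i, j, half_width, seq_len):
    """Edges linking every residue near i to every residue near j."""
    left = window(i, half_width, seq_len)
    right = window(j, half_width, seq_len)
    return [(a, b) for a in left for b in right]


uni = unipartite_edges(5, 1, 10)
bi = bipartite_edges(2, 9, 1, 10)
print(uni)
print(len(bi), bi[0], bi[-1])
```

With a fixed window size, the number of edges generated per residue or residue pair is bounded by a constant, so in this sketch the total work grows linearly with the number of residues processed.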
A set of transmembrane protein-specific and general-purpose features is provided by TMKit in support of machine learning modelling.