Examples by functionality

File processing

BioShell supports the following bioinformatics file formats:

  • PDB
  • CIF
  • ALN (ClustalW output with multiple sequence alignment)
  • HHPred output [1]
  • PIR
  • XML (most notably files produced by BLAST+)
  • SS2 (PsiPred output that holds secondary structure with predicted probabilities)
  • CHK (legacy blast profiles, binary files)
  • MAT (PSSM files produced by PsiBlast)

BioShell can read and process these files; processing includes substructure extraction, format conversion and data filtering.
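
To illustrate what substructure extraction and rewriting of a PDB file involve, here is a minimal Python sketch. It is independent of BioShell and does not show its API: the names Atom, format_atom, parse_atom and read_atoms are hypothetical. Only the fixed-column layout of the ATOM record is taken from the PDB format, and only columns 1-54 are handled.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    """Subset of the fields stored in a PDB ATOM record (hypothetical type)."""
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def format_atom(a: Atom) -> str:
    """Write columns 1-54 of a fixed-width PDB ATOM record."""
    # Atom names shorter than four characters conventionally start in column 14.
    name = a.name if len(a.name) == 4 else f" {a.name:<3s}"
    return (f"ATOM  {a.serial:>5d} {name} {a.res_name:>3s} {a.chain:1s}"
            f"{a.res_seq:>4d}    {a.x:8.3f}{a.y:8.3f}{a.z:8.3f}")


def parse_atom(line: str) -> Atom:
    """Read the same columns back (0-based slices of 1-based PDB columns)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def read_atoms(lines: Iterable[str]) -> List[Atom]:
    """Keep only ATOM records and parse them."""
    return [parse_atom(line) for line in lines if line.startswith("ATOM  ")]


# Made-up coordinates, used only to exercise the functions.
sample = [
    Atom(1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    Atom(2, "CA", "ALA", "A", 1, 11.639, 6.071, -5.147),
    Atom(3, "N", "GLY", "B", 1, 2.000, -1.500, 0.250),
]
pdb_lines = [format_atom(a) for a in sample]

# Substructure extraction: keep chain A only, then write it back out.
chain_a = [a for a in read_atoms(pdb_lines) if a.chain == "A"]
chain_a_lines = [format_atom(a) for a in chain_a]
```

Filtering by residue range or atom name follows the same pattern: parse once, then select on the parsed fields.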


Sequence alignment and multiple sequence alignment calculations include the Smith & Waterman [2] and Needleman & Wunsch [3] algorithms, both available in \(O(N^2)\) and \(O(N^3)\) implementations. These algorithms are implemented as C++ templates, which makes it possible to align virtually any kind of data, provided that an appropriate scoring method is supplied.
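
The idea of aligning arbitrary data with a user-supplied scoring method can be sketched in a few lines of Python. This is not BioShell code: the function names and the linear gap penalty are assumptions made for the example, and only the \(O(N^2)\) variants that return the optimal score are shown.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def needleman_wunsch(a: Sequence[T], b: Sequence[T],
                     score: Callable[[T, T], float],
                     gap: float = -1.0) -> float:
    """Global alignment score with a linear gap penalty (two-row DP)."""
    m = len(b)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(prev[j - 1] + score(a[i - 1], b[j - 1]),  # match/mismatch
                         prev[j] + gap,                            # gap in b
                         cur[j - 1] + gap)                         # gap in a
        prev = cur
    return prev[m]


def smith_waterman(a: Sequence[T], b: Sequence[T],
                   score: Callable[[T, T], float],
                   gap: float = -1.0) -> float:
    """Local alignment score: cells are floored at zero, best cell wins."""
    m = len(b)
    best = 0.0
    prev = [0.0] * (m + 1)
    for i in range(1, len(a) + 1):
        cur = [0.0] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = max(0.0,
                         prev[j - 1] + score(a[i - 1], b[j - 1]),
                         prev[j] + gap,
                         cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def identity(x: str, y: str) -> float:
    """Scoring for characters: +2 for a match, -1 for a mismatch."""
    return 2.0 if x == y else -1.0


def closeness(x: float, y: float) -> float:
    """Scoring for numbers: 1 for equal values, lower as they diverge."""
    return 1.0 - abs(x - y)


global_score = needleman_wunsch("GATTACA", "GATTACA", identity, gap=-2.0)
local_score = smith_waterman("AAAGATTACA", "GATTACATTT", identity, gap=-2.0)
numeric_score = needleman_wunsch([1.0, 2.0, 3.0], [1.0, 3.0], closeness, gap=-0.5)
```

Because the element type only appears as an argument to the scoring function, the same routine works for characters, numbers or any other objects, which mirrors the role a template parameter plays in C++.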

Sequence calculations

BioShell can calculate protein pI as well as hydrophobicity according to several scales. It also creates, writes and handles sequence profiles, and can convert an amino acid sequence to one of over 16 reduced alphabets [4] obtained from the work by Peterson et al. [5].
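
As a rough illustration of two of these calculations, the Python sketch below maps a sequence onto a reduced alphabet and averages a hydrophobicity scale over it. It does not use BioShell. The six-letter grouping is an illustrative assumption, not one of the alphabets from [4] or [5]; the scale is the Kyte-Doolittle hydropathy index; all function names are hypothetical.

```python
from typing import Dict, Sequence

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative grouping only (assumption): each group is represented
# by its first letter in the reduced sequence.
ILLUSTRATIVE_GROUPS = ["AVLIMC", "FWY", "STNQ", "KRH", "DE", "GP"]


def build_mapping(groups: Sequence[str]) -> Dict[str, str]:
    """Map every residue letter to the representative of its group."""
    return {aa: group[0] for group in groups for aa in group}


def reduce_sequence(seq: str, groups: Sequence[str], unknown: str = "X") -> str:
    """Rewrite a sequence in the reduced alphabet defined by `groups`."""
    mapping = build_mapping(groups)
    return "".join(mapping.get(aa, unknown) for aa in seq.upper())


def mean_hydrophobicity(seq: str, scale: Dict[str, float] = KYTE_DOOLITTLE) -> float:
    """Average scale value over the residues that the scale covers."""
    values = [scale[aa] for aa in seq.upper() if aa in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return sum(values) / len(values)


reduced = reduce_sequence("MKVLDEFGS", ILLUSTRATIVE_GROUPS)
hydropathy = mean_hydrophobicity("AIL")
```

Swapping in another alphabet or scale only requires a different list of groups or a different dictionary.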

Structure calculations

Since its origin, the main role of BioShell has been structure-based calculations. The package can calculate a very broad selection of structural parameters, including:

  • distances and distance maps
  • contacts and contact maps
  • hydrogen bonds
  • dihedral angles by name (e.g. Phi or Chi1) or based on arbitrary atoms
  • structural superimpositions (Kabsch algorithm) and RMSD values on an arbitrary set of atoms
  • structure similarity measures such as: GDT, LGS and TM-score
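
Two of the quantities listed above, a contact map and a dihedral angle defined by four arbitrary atoms, can be sketched in plain Python as follows. This is not BioShell code: the function names are hypothetical, points are plain (x, y, z) tuples, and the distance cutoff is an arbitrary parameter. The dihedral uses the standard atan2 formulation, which gives a positive angle for a clockwise rotation when looking along the central bond.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(a: Vec, b: Vec) -> float:
    d = sub(a, b)
    return math.sqrt(dot(d, d))


def contact_map(points: Sequence[Vec], cutoff: float) -> List[List[bool]]:
    """True where two distinct points lie within `cutoff` of each other."""
    n = len(points)
    return [[i != j and distance(points[i], points[j]) <= cutoff
             for j in range(n)] for i in range(n)]


def dihedral(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Dihedral angle in degrees, in the range (-180, 180]."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates chosen so that the expected angles are obvious.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
plus90 = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
minus90 = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1))
contacts = contact_map([(0, 0, 0), (3, 0, 0), (10, 0, 0)], cutoff=4.0)
```

A named angle such as Phi is the same calculation applied to a fixed choice of four backbone atoms.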

Statistical & numerical analysis

This includes:

  • hierarchical agglomerative clustering with arbitrary distance and four merging scenarios: Single Link, Complete Link, Average Link and Ward’s method
  • spline approximation
  • kernel density estimation
  • expectation-maximization
  • simple non-parametric statistics such as mean, variance, bootstrap estimation, robust estimation
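
Hierarchical agglomerative clustering with an arbitrary distance can be sketched as below. This is a naive Python illustration, not BioShell's implementation: the function name and parameters are hypothetical, only the Single, Complete and Average Link scenarios are covered (Ward's method is omitted), and every merge step rescans all cluster pairs.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def agglomerative(items: Sequence[T],
                  dist: Callable[[T, T], float],
                  linkage: str = "single",
                  n_clusters: int = 1) -> List[List[int]]:
    """Merge the closest pair of clusters until `n_clusters` remain.

    Returns clusters as lists of indices into `items`.
    """
    if linkage not in ("single", "complete", "average"):
        raise ValueError(f"unknown linkage: {linkage}")

    def cluster_distance(c1: List[int], c2: List[int]) -> float:
        ds = [dist(items[i], items[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(ds)              # closest pair of members
        if linkage == "complete":
            return max(ds)              # farthest pair of members
        return sum(ds) / len(ds)        # mean over all pairs

    clusters = [[i] for i in range(len(items))]
    while len(clusters) > max(n_clusters, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters


# One-dimensional toy data with two obvious groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
clusters = agglomerative(points, lambda x, y: abs(x - y),
                         linkage="complete", n_clusters=2)
```

Since the distance is passed in as a function, the same routine could cluster structures by RMSD or sequences by alignment score.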


[1] Söding, J., Biegert, A. and Lupas, A. N. “The HHpred interactive server for protein homology detection and structure prediction”. Nucleic Acids Research (2005) 33:W244–W248
[2] T. F. Smith and M. S. Waterman, JMB 147.1 (1981): 195-197
[3] S. B. Needleman and C. D. Wunsch, JMB 48.3 (1970)
[4] L. R. Murphy, A. Wallqvist, R. M. Levy. (2000) “Simplified amino acid alphabets for protein fold recognition and implications for folding”. Protein Eng. 13(3):149-152
[5] Peterson, Kondev, Theriot and Phillips. “Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment”. Bioinformatics 2009 25:1356-1362

This file has been automatically generated on Apr 05 2020 17:01:41