ap_pdb_to_fasta_ss

Reads a PDB file and writes protein sequence(s) in FASTA format.

The program also writes secondary structure in FASTA format if this data is available from the PDB headers. The sequence comprises only those amino acid residues that have a C-alpha atom. The user can select a chain by providing its code as the second argument of the program. The program also writes a PDB file that corresponds to the sequence.

USAGE:
ap_pdb_to_fasta_ss 5edw.pdb A

Categories:

  • core::data::io::Pdb; core::algorithms::Not; core::data::sequence::SecondaryStructure

Input files:

Program source:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>

#include <core/algorithms/predicates.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/io/fasta_io.hh>
#include <core/data/structural/structure_selectors.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads a PDB file and writes protein sequence(s) in FASTA format.

The program also writes secondary structure in FASTA format if this data is available from the PDB headers.
The sequence comprises only those amino acid residues that have a C-alpha atom.
The user can select a chain by providing its code as the second argument of the program. The program also writes
a PDB file that corresponds to the sequence.

USAGE:
    ap_pdb_to_fasta_ss 5edw.pdb A

)";

/** @brief Reads a PDB file and writes protein sequence(s) in FASTA format.
 *
 * The program also writes secondary structure in FASTA format, if this data is available from PDB headers.
 * The user can select a chain by providing its code as the second argument of the program.
 * USAGE:
 *     ap_pdb_to_fasta_ss 5edw.pdb A
 *
 * CATEGORIES: core::data::io::Pdb; core::algorithms::Not; core::data::sequence::SecondaryStructure
 * KEYWORDS:   PDB input; FASTA output; secondary structure; predicates
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io; // Pdb and create_fasta_string live there
  using namespace core::data::structural; // Chain and Structure live there

  Pdb reader(argv[1],is_not_alternative,true);
  Structure_SP strctr = reader.create_structure(0);

  // Iterate over all chains
  for (auto it_chain = strctr->begin(); it_chain!=strctr->end(); ++it_chain) {
    Chain & c = **it_chain; // --- dereference iterator for easier access
    if ((argc > 2) && ((*it_chain)->id() != argv[2][0])) continue;

    // --- The line below uses an STL algorithm with a BioShell predicate to remove all residues lacking C-alpha
    c.erase(std::remove_if(c.begin(), c.end(), core::algorithms::Not<ResidueHasCA>(ResidueHasCA())), c.end());

    if(c.size()>0) {
      // --- Create a sequence object (including secondary structure information)
      core::data::sequence::SecondaryStructure_SP s = (*it_chain)->create_sequence();
      // --- Write sequence as FASTA
      std::cout << create_fasta_string(*s) << "\n";
      // --- Write secondary structure as FASTA
      std::cout << create_fasta_secondary_string(*s) << "\n";
      // --- Write PDB
      std::ofstream out(strctr->code()+c.id()+".pdb");
      for (auto it = c.first_atom(); it != c.last_atom(); ++it)
        out << (*it)->to_pdb_line() << "\n";
      out.close();
    }
  }
}
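
The residue filtering step in the program above relies on the standard erase-remove idiom combined with a negated predicate. The following is a minimal standalone sketch of that idiom using only the C++ standard library. The names `Residue`, `HasCA`, `Not` and `sequence_with_ca` are hypothetical stand-ins invented for this illustration; they are not the BioShell classes and make no claim about how BioShell implements them.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a residue: a one-letter code plus a C-alpha flag.
struct Residue {
  char code;
  bool has_ca;
};

// Hypothetical predicate: true when the residue has a C-alpha atom.
struct HasCA {
  bool operator()(const Residue& r) const { return r.has_ca; }
};

// Hypothetical negating adaptor, written here only to mirror the way
// a Not<> wrapper is applied to a predicate in the program above.
template <typename Predicate>
struct Not {
  explicit Not(Predicate p) : pred(p) {}
  template <typename T>
  bool operator()(const T& x) const { return !pred(x); }
  Predicate pred;
};

// Takes the chain by value, drops residues lacking C-alpha,
// and returns the one-letter codes of the residues that remain.
std::string sequence_with_ca(std::vector<Residue> chain) {
  // std::remove_if moves the kept elements to the front and returns the new
  // logical end; erase() then trims the leftover tail from the container.
  chain.erase(std::remove_if(chain.begin(), chain.end(), Not<HasCA>(HasCA())),
              chain.end());
  std::string seq;
  for (const Residue& r : chain) seq += r.code;
  return seq;
}
```

Because `std::remove_if` preserves the relative order of the elements it keeps, the resulting string lists the surviving residues in their original order. Passing the chain by value leaves the caller's container untouched, unlike the program above, which modifies the chain in place.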