# ap_SequenceWeightingProtocol

ap_SequenceWeightingProtocol reads a set of protein sequences and computes a real-valued weight for each of them.

If the input is a FASTA file, every pair of sequences is aligned and sequence identity values are computed from these pairwise alignments. If the input is an .aln file (ClustalO MSA format), the sequences are assumed to be already aligned and sequence identity values are computed directly from the MSA. Files with the .clustalw extension are treated the same way as .aln files.

The sequence identity values are then transformed into real-valued weights, which may be used further, e.g. in sequence profile construction.

USAGE:
```
ap_SequenceWeightingProtocol input.fasta
ap_SequenceWeightingProtocol input.aln
```


## Categories:

• core/protocols/SequenceWeightingProtocol

## Program source:

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <vector>
// ...plus the BioShell headers declaring core::data::io::read_fasta_file,
// core::data::io::read_clustalw_file, core::data::sequence::Sequence_SP,
// core::protocols::SequenceWeightingProtocol, utils::exit_OK_with_message and utils::root_extension

std::string program_info = R"(

ap_SequenceWeightingProtocol reads a set of protein sequences and computes a real weight for each of those sequences.

If the FASTA file is the input, every pair of sequences will be aligned and sequence identity values will be evaluated
based on these alignments. If .aln is the input (i.e. ClustalO MSA file format), it is assumed the sequences are already
aligned and sequence identity values will be computed based on the MSA.

Sequence identity values will be transformed into real weights. These weights may be further used e.g. in sequence profile construction

USAGE:
    ap_SequenceWeightingProtocol input.fasta
    ap_SequenceWeightingProtocol input.aln

)";

/** @brief Shows how to use SequenceWeightingProtocol class
 *
 * CATEGORIES: core/protocols/SequenceWeightingProtocol
 * KEYWORDS:   FASTA input; sequence alignment; sequence identity; sequence weighting
 * GROUP: Sequence calculations;
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::sequence;
  using namespace core::protocols;

  bool if_align = true;
  std::vector<Sequence_SP> input_sequences;
  auto root_extn = utils::root_extension(argv[1]);
  if ((root_extn.second == "aln") || (root_extn.second == "clustalw")) {
    // --- MSA input: sequences are already aligned, so pairwise alignment is switched off
    core::data::io::read_clustalw_file(argv[1], input_sequences);
    if_align = false;
  } else
    core::data::io::read_fasta_file(argv[1], input_sequences); // --- FASTA input: every pair will be aligned

  core::protocols::SequenceWeightingProtocol protocol;
  protocol.seq_identity_cutoff(0.25).n_threads(1);
  protocol.if_align_sequences(if_align).add_input_sequences(input_sequences);

  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!
  protocol.run();
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
  std::cerr << input_sequences.size() * (input_sequences.size() - 1) / 2.0
            << " sequence similarities calculated within " << time_span.count() << " [s]\n";

  protocol.print_weights(std::cout);
}
```
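The count the program prints to stderr is the number of unordered sequence pairs, n(n-1)/2 for n input sequences, so the number of pairwise comparisons grows quadratically with the size of the input set:

$$
N_{\text{pairs}} = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad \text{e.g. } n = 100 \;\Rightarrow\; N_{\text{pairs}} = \frac{100 \cdot 99}{2} = 4950.
$$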