Reading and processing PDB files

Reading a PDB file into a BioShell program takes two steps:

  • loading a text file into memory, and
  • parsing its content and creating Structure object(s)
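The two steps can be illustrated with a small standalone sketch that does not use BioShell. The `PdbText` and `Atom` types and the `load_pdb_text()` and `parse_model()` functions are hypothetical. The first function only stores ATOM/HETATM lines in memory, starting a new model after every ENDMDL record. The second parses one model, indexed from 0, into atoms using the fixed PDB column layout, and reads only atom names, residue names and coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory copy of a PDB file (step 1): raw ATOM/HETATM
// lines, grouped by model. Nothing is parsed at this stage.
struct PdbText {
  std::vector<std::vector<std::string>> models;
};

// Hypothetical minimal atom produced in step 2.
struct Atom {
  std::string name;  // atom name, columns 13-16
  std::string res;   // residue name, columns 18-20
  double x, y, z;    // coordinates, columns 31-54
};

// Step 1: load the text; a new model starts after every ENDMDL record.
PdbText load_pdb_text(std::istream& in) {
  PdbText text;
  std::vector<std::string> current;
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 6, "ENDMDL") == 0) {
      if (!current.empty()) text.models.push_back(current);
      current.clear();
    } else if (line.compare(0, 4, "ATOM") == 0 ||
               line.compare(0, 6, "HETATM") == 0) {
      current.push_back(line);
    }
  }
  if (!current.empty()) text.models.push_back(current);  // files without ENDMDL
  return text;
}

// Removes the padding spaces around a fixed-width PDB field.
std::string trim(const std::string& s) {
  const std::size_t b = s.find_first_not_of(' ');
  if (b == std::string::npos) return "";
  const std::size_t e = s.find_last_not_of(' ');
  return s.substr(b, e - b + 1);
}

// Step 2: parse one model (0-based index) into Atom objects.
std::vector<Atom> parse_model(const PdbText& text, std::size_t model) {
  if (model >= text.models.size()) throw std::out_of_range("no such model");
  std::vector<Atom> atoms;
  for (const std::string& l : text.models[model]) {
    if (l.size() < 54) continue;  // too short to hold x, y and z
    atoms.push_back({trim(l.substr(12, 4)), trim(l.substr(17, 3)),
                     std::stod(l.substr(30, 8)), std::stod(l.substr(38, 8)),
                     std::stod(l.substr(46, 8))});
  }
  return atoms;
}
```

In this sketch the text of every model stays in memory after step 1, while only the model that is actually requested is turned into objects.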

Loading a PDB file

To read a PDB file, you have to create a reader object. In the simplest case this looks as follows:

core::data::io::Pdb reader("infile.pdb");

This reader skips water molecules and hydrogen atoms. You can control which PDB lines are omitted during reading by passing a PdbLineFilter instance to the constructor, e.g.:

core::data::io::Pdb reader("infile.pdb",
    core::data::io::all_true(core::data::io::is_not_water,
                             core::data::io::is_not_alternative));

PdbLineFilter objects can dramatically limit the number of PDB lines to be parsed and thus shorten the time spent loading a PDB file.
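The idea of a line filter can be sketched as plain predicates on a raw PDB line, combined so that a line is kept only if every predicate accepts it. This is a standalone illustration, not BioShell's code: `LineFilter`, `keep_lines()` and the versions of `is_not_water`, `is_not_alternative` and `all_true` below are hypothetical and only mimic the names used above. The column rules follow the PDB format (residue name in columns 18-20, alternate location indicator in column 17); treating 'A' as the alternative to keep is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical filter type: returns true if a raw PDB line should be kept.
using LineFilter = std::function<bool(const std::string&)>;

// Rejects water (residue name "HOH" in columns 18-20).
bool is_not_water(const std::string& line) {
  return line.size() < 20 || line.compare(17, 3, "HOH") != 0;
}

// Assumption: keeps atoms with no alternate location indicator (column 17)
// or with the first one, 'A'; all other alternatives are rejected.
bool is_not_alternative(const std::string& line) {
  return line.size() < 17 || line[16] == ' ' || line[16] == 'A';
}

// Combines any number of filters; a line passes only if all of them accept it.
template <typename... Filters>
LineFilter all_true(Filters... filters) {
  std::vector<LineFilter> fs{LineFilter(filters)...};
  return [fs](const std::string& line) {
    for (const LineFilter& f : fs)
      if (!f(line)) return false;
    return true;
  };
}

// Applies a filter before any parsing, so rejected lines are never parsed.
std::vector<std::string> keep_lines(const std::vector<std::string>& lines,
                                    const LineFilter& keep) {
  std::vector<std::string> kept;
  for (const std::string& l : lines)
    if (keep(l)) kept.push_back(l);
  return kept;
}
```

Because the predicates look only at fixed character positions, a rejected line costs a few comparisons, whereas a kept line still has to be converted into numbers and objects.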

Creating a Structure object

Once a file is loaded, you can create a Structure object from one of its models:

core::data::structural::Structure_SP model = reader.create_structure(0);

Models are indexed from 0, so the call above creates a Structure for the very first model. Every time the create_structure() method is called, a new Structure object is created, which involves the necessary memory allocation. Creating new atom objects is in fact the slowest part of this call. Sometimes it is possible to recycle an old structure by filling it with new coordinates rather than creating a new one from scratch. This is done in the ap_contact_map program; the relevant fragment is shown below:

  1  core::data::io::Pdb reader(argv[2],filter); // --- file name (PDB format, may be gzip-ped)
  2
  3  core::data::structural::Structure_SP structure = reader.create_structure(0); // model 0: atoms are created here
  4  core::calc::structural::ContactMap cmap(*structure,cutoff,selector);
  5  for (int i_model = 1; i_model < reader.count_models(); ++i_model) {
  6    reader.fill_structure(i_model,*structure); // reuses the atoms, copies coordinates only
  7    cmap.add(*structure);
  8  }

Coordinates of a new structure must fit into the existing structure, i.e. the new structure must be composed of the same number of chains, residues and atoms as the old one. In practice this is most useful when a multi-model PDB file must be loaded, as in the fragment above:

  • in line 1 a PDB file is loaded with a filter instance defined somewhere before
  • in line 3 a Structure object is created based on the first model defined in the file
  • in line 4 a ContactMap object is created and the first structure is loaded into it
  • finally, in lines 5-8 a loop iterates over all the remaining models; in line 6 the coordinates of each model are loaded into the existing structure (the one created in line 3), which line 7 then adds to the contact map

Residue, PdbAtom and Chain objects are created only once, when the structure at index 0 is loaded. After that, the loop only substitutes the coordinates of this structure.
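The recycling pattern can be sketched without BioShell: atom objects are allocated once for the first model, and every later model only overwrites their coordinates after a check that the sizes match. `Vec3`, `Atom`, `Structure`, `Coords`, `create_structure()` and `fill_structure()` below are simplified hypothetical stand-ins that only echo the BioShell method names; a real Structure also holds chains and residues. The sketch assumes C++14 (for `std::make_unique`).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical atom: its identity is fixed at creation, its position is not.
struct Atom {
  std::string name;
  Vec3 pos;
};

// Hypothetical structure; each atom is a separate heap allocation,
// which is what makes building a structure from scratch expensive.
struct Structure {
  std::vector<std::unique_ptr<Atom>> atoms;
};

// Coordinates of one model, one entry per atom, in file order.
using Coords = std::vector<Vec3>;

// Builds a new Structure: allocates one Atom per coordinate entry.
std::unique_ptr<Structure> create_structure(const std::vector<std::string>& names,
                                            const Coords& xyz) {
  if (names.size() != xyz.size())
    throw std::invalid_argument("names and coordinates differ in size");
  auto s = std::make_unique<Structure>();
  for (std::size_t i = 0; i < xyz.size(); ++i)
    s->atoms.push_back(std::make_unique<Atom>(Atom{names[i], xyz[i]}));
  return s;
}

// Recycles an existing Structure: no allocation, coordinates are overwritten.
// The model must have exactly as many atoms as the structure.
void fill_structure(const Coords& xyz, Structure& s) {
  if (xyz.size() != s.atoms.size())
    throw std::invalid_argument("model does not fit the existing structure");
  for (std::size_t i = 0; i < xyz.size(); ++i) s.atoms[i]->pos = xyz[i];
}
```

The ap_contact_map fragment follows the same division: create_structure(0) pays for the allocation once, and fill_structure() is called for every remaining model.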