Tutorial:Selecting subsets of atoms
From MSL-Libraries
Jump to navigationJump to search
NOTE: TUTORIAL WRITING IN PROGRESS DO NOT FOLLOW THIS
This is an example on how to select atoms and residues with MSL, following the example program example_selecting_atoms_and_residues.cpp in the examples/ subdirectory.
Complete source of example_selecting_atoms_and_residues
Background
To compile
% make bin/example_selecting_atoms_and_residues
To run the program
Go to the main directory and run the command (note, the location of the exampleFiles subdirectory needs to be provided as an argument)
% bin/example_selecting_atoms_and_residues exampleFiles
Program description
- Read a PDB file.
60 System container;
61 if (!container.readPdb(file)) {
// reading failed, error handling code here
68 }
- Print the sequence. This PDB happens to have 3 chains.
73 cout << container << endl;
which outputs:
A: {1}GLY LYS LYS MET VAL VAL ALA {8}LEU
B: {1}ASP LYS ASP PHE ALA SER GLU LYS LEU ALA GLU LEU {13}VAL
C: {1}ASP ALA LEU VAL ILE LEU {7}THR
- In each object of the molecular hierarchy, the () operator returns elements of the next step down by index i (as a reference). The following lines demonstrate that.
92 Chain & chn = container(0); // get the first chain in the System (chain "A") by reference
93 Position & pos = chn(0); // get the first position in the Chain (1) by reference
94 Residue & res = pos(0); // get the first identity in the Position (ASP) by reference
95 Atom & atm = res(0); // get the first atom in the Residue (N) by reference
- The molecular objects can all be printed using the << operator.
97 cout << "First chain of System: " << chn << endl;
98 cout << "First position of the Chain: " << pos << endl;
99 cout << "First identity of the Position: " << res << endl;
100 cout << "First atom of the Residue: " << atm << endl;
which outputs:
First chain of System: A: {1}GLY LYS LYS MET VAL VAL ALA {8}LEU
First position of the Chain: A,1 [GLY]
First identity of the Position: A,1,GLY
First atom of the Residue: N GLY 1 A [ 23.784 49.695 6.481] (conf 1/ 1) +
- Let's now loop over the entire hierarchy. The size() function tells us how many elements there are
109 cout << "The System has " << container.size() << " chains" << endl;
110 // loop over each chain and get the number of positions
111 for (unsigned int i=0; i<container.size(); i++) {
112 cout << " Chain " << i << " (id: " << container(i).getChainId() << ") has " << container(i).size() << " positions" << endl;
113
114 // loop over each position and get the number of identities
115 for (unsigned int j=0; j<container(i).size(); j++) {
116 cout << " Position " << j << " (id: " << container(i)(j).getPositionId() << ") has " << container(i)(j).size() << " identity" << endl;
117
118 // loop over each identity and print the name
119 for (unsigned int k=0; k<container(i)(j).size(); k++) {
120 cout << " Identity " << k << " (id: " << container(i)(j)(k).getIdentityId() << ") is a " << container(i)(j)(k).getResidueName() << endl;
121
122 // loop over each atom and print the name
123 for (unsigned int l=0; l<container(i)(j)(k).size(); l++) {
124 cout << " Atom " << l << " (id: " << container(i)(j)(k)(l).getAtomOfIdentityId() << ") is a " << container(i)(j)(k)(l).getName() << endl;
125 }
126 }
127 }
128 }
which outputs:
The System has 3 chains
Chain 0 (id: A) has 8 positions
Position 0 (id: A,1) has 1 identity
Identity 0 (id: A,1,GLY) is a GLY
Atom 0 (id: A,1,GLY,N) is a N
Atom 1 (id: A,1,GLY,CA) is a CA
Atom 2 (id: A,1,GLY,C) is a C
Atom 3 (id: A,1,GLY,O) is a O
Position 1 (id: A,2) has 1 identity
Identity 0 (id: A,2,LYS) is a LYS
Atom 0 (id: A,2,LYS,N) is a N
Atom 1 (id: A,2,LYS,CA) is a CA
...
- The hierarchy is System>Chain>Position>Residue>Atom. However, in most cases each Position contains just one residue type, unless design or mutagenesis is involved. The above loop can be rewritten to skip the identities and go from Position to Atom. To do so we use the [] operator in Position. In all objects [] returns an atom (by reference).
138 Atom & aSys = container[0]; // get the first atom (N) of tye system by reference
139 Atom & aChn = chn[0]; // get the first atom (N) of the chain by reference
140 Atom & aPos = pos[0]; // get the first atom (N) of the position by reference
141 Atom & aRes = res[0]; // get the first atom (N) of the identity by reference
- The total number of atoms is given by the function atomSize(), which is used by the Position in the loop System>Chain>Position>Atom
157 cout << "The System has " << container.size() << " chains" << endl;
158 // loop over each chain and get the number of positions
159 for (unsigned int i=0; i<container.size(); i++) {
160 cout << " Chain " << i << " (id: " << container(i).getChainId() << ") has " << container(i).size() << " positions" << endl;
161
162 // loop over each position and get the number of atoms
163 for (unsigned int j=0; j<container(i).size(); j++) {
164 cout << " Position " << j << " (id: " << container(i)(j).getPositionId() << ") is a " << container(i)(j).getResidueName() << " and has " << container(i)(j).atomSize() << " atoms" << endl;
165
166 // loop over each atom and print the name, note using atomSize instead of size
167 for (unsigned int k=0; k<container(i)(j).atomSize(); k++) {
168 // note using the [] operator
169 cout << " Atom " << k << " (id: " << container(i)(j)[k].getAtomId() << ") is a " << container(i)(j)[k].getName() << endl;
170 }
171 }
172 }
which outputs:
The System has 3 chains
Chain 0 (id: A) has 8 positions
Position 0 (id: A,1) is a GLY and has 4 atoms
Atom 0 (id: A,1,N) is a N
Atom 1 (id: A,1,CA) is a CA
Atom 2 (id: A,1,C) is a C