Tutorial:Looping over Chains, Residues, Atoms in the System

From MSL-Libraries

This is an example on how to loop over all chains, residues and atoms with MSL, following the example program example_looping_over_Chain_Residues_Atoms.cpp in the examples/ subdirectory.

Complete source of example_looping_over_Chain_Residues_Atoms


Background

If an application requires access to chains, residues and atoms it should use the System, which has an internal hierarchical subdivision. The simpler AtomContainer is essentially simply an array of atoms lacking an internal representation of chains and residue and.

  • The System contains an array of Chain objects, identified by a chain id ("A", "B", "C", etc).
  • The Chain contains an array of Position objects, identified by residue numbers (1, 2, 3...) and, occasionally by a additional letter called insertion code (36, 37, 37A, 38, see the PDB site for a reference).
  • MSL supports protein design and allows multiple residue types (identities) to co-exist at a certain position (only one of them being active at a certain time). The Position therefore contains an array of Residue objects (ALA, LEU, VAL...)
  • The Residue contains an array of Atom objects (CA, CB, CG...). To be strictly correct, the Residue contains an array of AtomGroup objects, which contain Atom, but for most use the programmer does not need to be aware of this fact.

In this programs we will loop down all the chains in a system, and the positions in a chain, all the way to the atoms. In a first loop we will consider the identities. Given that in most applications there is only one identity in each position, a second loop is shown bypassing the identities.

To compile

% make bin/example_looping_over_Chain_Residues_Atoms

To run the program

Go to the main directory and run the command (note, the location of the exampleFiles subdirectory needs to be provided as an argument)

% bin/example_looping_over_Chain_Residues_Atoms exampleFiles/

Program description

  • Read a PDB file.
60	System container;
61	if (!container.readPdb(file)) {
		// reading failed, error handling code here
68	}
  • Print the sequence. This PDB happens to have 3 chains.
73	cout << container << endl;

which outputs:

A: {1}GLY LYS LYS MET VAL VAL ALA {8}LEU
B: {1}ASP LYS ASP PHE ALA SER GLU LYS LEU ALA GLU LEU {13}VAL
C: {1}ASP ALA LEU VAL ILE LEU {7}THR
  • In each object of the molecular hierarchy, the () operator returns elements of the next step down by index i (as a reference). The following lines demonstrate that.
92	Chain & chn = container(0); // get the first chain in the System (chain "A") by reference
93	Position & pos = chn(0);    // get the first position in the Chain (1) by reference
94	Residue & res = pos(0);     // get the first identity in the Position (ASP) by reference
95	Atom & atm = res(0);        // get the first atom in the Residue (N) by reference
  • The molecular objects can all be printed using the << operator.
97	cout << "First chain of System: " << chn << endl;
98	cout << "First position of the Chain: " << pos << endl;
99	cout << "First identity of the Position: " << res << endl;
100	cout << "First atom of the Residue: " << atm << endl;

which outputs:

First chain of System: A: {1}GLY LYS LYS MET VAL VAL ALA {8}LEU

First position of the Chain: A,1 [GLY]
First identity of the Position: A,1,GLY
First atom of the Residue: N    GLY    1  A [    23.784     49.695      6.481] (conf   1/  1) +
  • Let's now loop over the entire hierarchy. The various size() function tells us how many elements there are
109	cout << "The System has " << container.chainSize() << " chains" << endl;
110	// loop over each chain and get the number of positions
111	for (unsigned int i=0; i<container.chainSize(); i++) {
112		cout << "   Chain " << i << " (id: " << container(i).getChainId() << ") has " << container(i).positionSize() << " positions" << endl;
113
114		// loop over each position and get the number of identities
115		for (unsigned int j=0; j<container(i).positionSize(); j++) {
116			cout << "      Position " << j << " (id: " << container(i)(j).getPositionId() << ") has " << container(i)(j).identitySize() << " identity" << endl;
117
118			// loop over each identity and print the name
119			for (unsigned int k=0; k<container(i)(j).identitySize(); k++) {
120				cout << "         Identity " << k << " (id: " << container(i)(j)(k).getIdentityId() << ") is a " << container(i)(j)(k).getResidueName() << endl;
121
122				// loop over each atom and print the name
123				for (unsigned int l=0; l<container(i)(j)(k).atomSize(); l++) {
124					cout << "            Atom " << l << " (id: " << container(i)(j)(k)(l).getAtomOfIdentityId() << ") is a " << container(i)(j)(k)(l).getName() << endl;
125				}
126			}
127		}
128	}

which outputs:

The System has 3 chains
   Chain 0 (id: A) has 8 positions
      Position 0 (id: A,1) has 1 identity
         Identity 0 (id: A,1,GLY) is a GLY
            Atom 0 (id: A,1,GLY,N) is a N
            Atom 1 (id: A,1,GLY,CA) is a CA
            Atom 2 (id: A,1,GLY,C) is a C
            Atom 3 (id: A,1,GLY,O) is a O
      Position 1 (id: A,2) has 1 identity
         Identity 0 (id: A,2,LYS) is a LYS
            Atom 0 (id: A,2,LYS,N) is a N
            Atom 1 (id: A,2,LYS,CA) is a CA
...
  • The hierarchy is System>Chain>Position>Residue>Atom. However, in most cases each Position contains just one residue type, unless design or mutagenesis is involved. The above loop can be rewritten to skip the identities and go from Position to Atom. To do so we use the [] operator in Position. In all objects [] returns an atom (by reference).
138	Atom & aSys = container[0]; // get the first atom (N) of tye system by reference
139	Atom & aChn = chn[0];       // get the first atom (N) of the chain by reference
140	Atom & aPos = pos[0];       // get the first atom (N) of the position by reference
141	Atom & aRes = res[0];       // get the first atom (N) of the identity by reference
  • The total number of atoms is given by the function atomSize(), which is used by the Position in the loop System>Chain>Position>Atom
157	cout << "The System has " << container.chainSize() << " chains" << endl;
158	// loop over each chain and get the number of positions
159	for (unsigned int i=0; i<container.chainSize(); i++) {
160		cout << "   Chain " << i << " (id: " << container(i).getChainId() << ") has " << container(i).positionSize() << " positions" << endl;
161
162		// loop over each position and get the number of atoms
163		for (unsigned int j=0; j<container(i).positionSize(); j++) {
164			cout << "      Position " << j << " (id: " << container(i)(j).getPositionId() << ") is a " << container(i)(j).getResidueName() << " and has " << container(i)(j).atomSize() << " atoms" << endl;
165
166			// loop over each atom and print the name, note using atomSize instead of size
167			for (unsigned int k=0; k<container(i)(j).atomSize(); k++) {
168				// note using the [] operator
169				cout << "            Atom " << k << " (id: " << container(i)(j)[k].getAtomId() << ") is a " << container(i)(j)[k].getName() << endl;
170			}
171		}
172	}

which outputs:

The System has 3 chains
   Chain 0 (id: A) has 8 positions
      Position 0 (id: A,1) is a GLY and has 4 atoms
            Atom 0 (id: A,1,N) is a N
            Atom 1 (id: A,1,CA) is a CA
            Atom 2 (id: A,1,C) is a C

Back to the tutorial page