Tutorial:Using regular expression: Selecting sequence motifs of a Chain
This is an example on how to select sequence motifs from Chain objects. A Chain object versus a System object is used because regular expressions can not span across chains.
Complete source of example_regular_expressions.cpp
In MSL the program grepSequence utilizes the type of code in this tutorial to search for sequences in a list of PDB files, and then structurally align them.
To compile
% make bin/example_regular_expresssions
To run the program
Go to the main directory and run the command (note, the location of the exampleFiles subdirectory needs to be provided as an argument)
% bin/example_regular_expressions exampleFiles/example0004.pdb
Program description
Read in structure into System object, check that chain "A" exists.
string file = "example0004.pdb";
file = (string)argv[1] + "/" + file;
cout << "Create an AtomContainer and read the atoms from " << file << endl;
System sys;
if (!sys.readPdb(file)) {
// reading failed, error handling code here
cerr << "ERROR could not read in "<<file<<endl;
exit(0);
}
// Check to make sure chain A exits in sys
if (!sys.chainExists("A")){
// error code here.
cerr << "ERROR chain A does not exist in file "<<file<<endl;
exit(0);
}
// Get a Chain object
Chain &ch = sys.getChain("A");
Setup a regular expression object (RegEx) and a regular expression string to match 2 Valines followed by Isoleucine and then a Leucine. The RegEx match gets residue (or position) indices into the parent System object.
// Regular Expression Object
RegEx re;
// Find 3 Prolines surrounded by two Glycines on one side and three Glycines on the other
string regex = "V{2}IL";
// Now do a sequence search...
vector<pair<int,int> > matchingResidueIndices = re.getResidueRanges(ch,regex);
// Loop over each match.
for (uint m = 0; m < matchingResidueIndices.size();m++){
// Loop over each residue for this match
int match = 1;
for (uint r = matchingResidueIndices[m].first; r <= matchingResidueIndices[m].second;r++){
// Get the residue
Residue &res = ch.getResidue(r);
// .. do something cool with matched residues ...
cout << "MATCH("<<match<<"): RESIDUE: "<<res.toString()<<endl;
}
}