Tutorial:Using regular expression: Selecting sequence motifs of a Chain

From MSL-Libraries

This is an example on how to select sequence motifs from Chain objects. A Chain object versus a System object is used because regular expressions can not span across chains.

Complete source of example_regular_expressions.cpp

In MSL the program grepSequence utilizes the type of code in this tutorial to search for sequences in a list of PDB files, and then structurally align them.

To compile

% make bin/example_regular_expresssions

To run the program

Go to the main directory and run the command (note, the location of the exampleFiles subdirectory needs to be provided as an argument)

% bin/example_regular_expressions exampleFiles/example0004.pdb

Program description

Read in structure into System object, check that chain "A" exists.

      	string file = "example0004.pdb";
	file = (string)argv[1] + "/" + file;
	cout << "Create an AtomContainer and read the atoms from " << file << endl;

	System sys;
	if (!sys.readPdb(file)) {
	  // reading failed, error handling code here
	  cerr << "ERROR could not read in "<<file<<endl;
	// Check to make sure chain A exits in sys
	if (!sys.chainExists("A")){
	  // error code here.
	  cerr << "ERROR chain A does not exist in file "<<file<<endl;
	// Get a Chain object
	Chain &ch = sys.getChain("A");

Setup a regular expression object (RegEx) and a regular expression string to match 2 Valines followed by Isoleucine and then a Leucine. The RegEx match gets residue (or position) indices into the parent System object.

	// Regular Expression Object
	RegEx re;
	// Find 3 Prolines surrounded by two Glycines on one side and three Glycines on the other
	string regex = "V{2}IL";
	// Now do a sequence search...
	vector<pair<int,int> > matchingResidueIndices = re.getResidueRanges(ch,regex);
	// Loop over each match.
	for (uint m = 0; m < matchingResidueIndices.size();m++){
	  // Loop over each residue for this match
	  int match = 1;
	  for (uint r = matchingResidueIndices[m].first; r <= matchingResidueIndices[m].second;r++){
	    // Get the residue
	    Residue &res = ch.getResidue(r);
	    // .. do something cool with matched residues ...
	    cout << "MATCH("<<match<<"):  RESIDUE: "<<res.toString()<<endl;

Back to the tutorial page