Tutorial:Using regular expression: Selecting sequence motifs of a Chain

From MSL-Libraries

Revision as of 22:50, 1 April 2010

This is an example of how to select sequence motifs from Chain objects. A Chain object is used rather than a System object because regular expressions cannot span across chains.

This tutorial is in progress; there may be missing example files, source files, or bugs in the code.

Complete source of example_regular_expressions.cpp


In MSL, the program grepSequence uses the kind of code shown in this tutorial to search for sequences in a list of PDB files and then structurally align them.


To compile

% make bin/example_regular_expressions

To run the program

Go to the main directory and run the following command. Note that the location of the exampleFiles subdirectory must be provided as an argument:

% bin/example_regular_expressions exampleFiles

Program description

Read the structure into a System object and check that chain "A" exists.

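	// The first argument is the directory that contains example0004.pdb
	if (argc < 2) {
	  cerr << "Usage: " << argv[0] << " <exampleFiles directory>" << endl;
	  exit(1);
	}
	string file = "example0004.pdb";
	file = (string)argv[1] + "/" + file;
	cout << "Create a System and read the atoms from " << file << endl;

	System sys;
	if (!sys.readPdb(file)) {
	  // reading failed: report the error and exit with a non-zero status
	  cerr << "ERROR could not read in "<<file<<endl;
	  exit(1);
	}

	// Check to make sure chain A exists in sys
	if (!sys.chainExists("A")){
	  // chain A is missing: report the error and exit with a non-zero status
	  cerr << "ERROR chain A does not exist in file "<<file<<endl;
	  exit(1);
	}

	// Get a reference to chain A (safe now that its existence has been checked)
	Chain &ch = sys.getChain("A");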


Set up a regular expression object (RegEx) and a regular expression string that matches two Valines followed by an Isoleucine and then a Leucine. The RegEx search returns residue (or position) indices into the parent System object.

 
	// Regular Expression Object
	RegEx re;

	// Find two Valines followed by an Isoleucine and a Leucine (VVIL)
	string regex = "V{2}IL";

	// Now do a sequence search; each pair holds the first and last residue index of a match
	vector<pair<int,int> > matchingResidueIndices = re.getResidueRanges(ch,regex);


	// Loop over each match.
	for (unsigned int m = 0; m < matchingResidueIndices.size(); m++){

	  // Number the matches from 1 in the printout
	  int match = m + 1;

	  // Loop over each residue for this match (the last index is inclusive)
	  for (int r = matchingResidueIndices[m].first; r <= matchingResidueIndices[m].second; r++){

	    // Get the residue
	    Residue &res = ch.getResidue(r);

	    // .. do something cool with matched residues ...
	    cout << "MATCH("<<match<<"):  RESIDUE: "<<res.toString()<<endl;
	  }
	}


