Chemistry::Tutorial - PerlMol Quick Tutorial
The modules in the PerlMol toolkit are designed to simplify the handling of
molecules from Perl programs in a general and extensible way. These modules
are object-oriented; however, this tries to assume little or no knowledge of
object-oriented programming in Perl. For a general introduction about how to
use object-oriented modules, see HTML::Tree::AboutObjects.
This document shows some of the more common methods included in the PerlMol
toolkit, in a reasonable order for a quick introduction. For more details see
the perldoc pages for each module.
The following code will read a PDB file:
use Chemistry::Mol;
use Chemistry::File::PDB;
my $mol = Chemistry::Mol->read("test.pdb");
The first two lines (which only need to be used once in a given program) tell
Perl that you want to "use" the specified modules The third line
reads the file and returns a molecule object.
To read other formats such as MDL molfiles, you need to "use" the
corresponding module, such as Chemistry::File::MDLMol. Readers for several
formats are under development.
"Chemistry::Mol->read" returns a Chemistry::Mol object. An
object is a data structure of a given
class that has
methods (i.e. subroutines) associated with it. To access or modify an
object's properties, you call the methods on the object through "arrow
syntax":
my $name = $mol->name; # return the name of the molecule
$mol->name("water"); # set the name of the molecule to "water"
Note that these so-called accessor methods return the molecule object when they
are used to set a property. A consequence of that if you want, you can
"chain" several methods to set several options in one line:
$mol->name("water")->type("wet");
A Chemistry::Mol object contains essentially a list of atoms, a list of bonds,
and a few generic properties such as name, type, and id. The atoms and bonds
themselves are also objects.
To write a molecule to a file, just use the "write" method:
$mol->write("test.pdb");
Make sure you "use"d the right file I/O module. If you want to load
all the available file I/O modules, you can do it with
use Chemistry::File ':auto';
You can get an array of all the atoms by calling the atoms method without
parameters, or a specific atom by giving its index:
@all_atoms = $mol->atoms;
$atom3 = $mol->atoms(3);
Note: Atom and bond indices are counted from 1, not from 0. This
deviation from common Perl usage was made to be consistent with the way atoms
are numbered in most common file formats.
You can select atoms that match an arbitrary expression by using Perl's built-in
"grep" function:
# get all oxygen atoms within 3.0 Angstroms of atom 37
@close_oxygens = grep {
$_->symbol eq 'O'
and $_->distance($mol->atoms(37)) < 3.0
} $mol->atoms;
The "grep" function loops through all the atoms returned by
"$mol->atoms", aliasing each to $_ at each iteration, and returns
only those for which the expression in braces is true.
Using "grep" is a general way of finding atoms; however, since finding
atoms by name is common, a convenience method is available for that purpose.
$HB1 = $mol->atoms_by_name('HB1');
@H_atoms = $mol->atoms_by_name('H.*'); # name treated as a regex
Since the atom name is not generally unique, even the first example above might
match more than one atom. In that case, only the first one found is returned.
In the second case, since you are assigning to an array, all matching atoms
are returned.
Atoms are usually the most interesting objects in a molecule. Some of their main
properties are Z, symbol, and coords.
$atom->Z(8); # set atomic number to 8
$symbol = $atom->symbol;
$coords = $atom->coords;
The coordinates returned by "$atom->coords" are a Math::VectorReal
object. You can print these objects and use them to do vector algebra:
$c1 = $atom1->coords;
$c2 = $atom2->coords;
$dot_product = $c1 . $c2; # returns a scalar
$cross_product = $c1 x $c2; # returns a vector
$delta = $c2 - $c1; # returns a vector
$distance = $delta->length; # returns a scalar
($x, $y, $z) = $c1->array; # get the components of $c1
print $c1; # prints something like "[ 1.0E0 2.0E0 3.0E0 ]"
Since one is very often interested in calculating the distance between atoms,
Atom objects provide a "distance" method to save some typing:
$d = $atom1->distance($atom2);
$d2 = $atom1->distance($molecule2);
In the second case, the value obtained is the minimum distance between the atom
and the molecule. This can be useful for things such as finding the water
molecules closest to a given atom.
Atoms may also have internal coordinates, which define the position of an atom
relative to the positions of other atoms by means of a distance, an angle, and
a dihedral angle. Those coordinates can be accessed through the
$atom->internal_coords method, which uses Chemistry::InternalCoords
objects.
A Chemistry::Bond object is a list of atoms with an associated bond order. In
most cases, a bond has exactly two atoms, but we don't want to exclude
possibilities such as three-center bonds. You can get the list of atoms in a
bond by using the "atoms" method; the bond order is accessed trough
the "order" method;
@atoms_in_bond = $bond->atoms;
$bond_order = $bond->order;
The other interesting method for Bond objects is "length", which
returns the distance between the two atoms in a bond (this method requires
that the bond have two atoms).
my $bondlength = $bond->length;
In addition to these properties, Bond objects have the generic properties
described below. The most important of these, as far as bonds are concerned,
is "type".
There are three generic properties that all PerlMol objects have:
- id
- Each object must have a unique ID. In most cases you don't
have to worry about it, because it is assigned automatically unless you
specify it. You can use the "by_id" method to select an object
contained in a molecule:
$atom = $mol->by_id("a42");
In general, ids are preferable to indices because they don't change if you
delete or move atoms or other objects.
- name
- The name of the object does not have any meaning from the
point of view of the core modules, but most file types have the concept of
molecule name, and some (such as PDB) have the concept of atom names.
- type
- Again, the meaning of type is not universally defined, but
it would likely be used to specify atom types and bond orders.
Besides these, the user can specify arbitrary attributes, as discussed in the
next section.
The core PerlMol classes define very few, very generic properties for atoms and
molecules. This was chosen as a "minimum common denominator" because
every file format and program has different ideas about the names, values and
meaning of these properties. For example, some programs only allow bond orders
of 1, 2, and 3; some also have "aromatic" bonds; some use calculated
non-integer bond orders. PerlMol tries not to commit to any particular
convention, but it allows you to specify whatever attributes you want for any
object (be it a molecule, an atom, or a bond). This is done through the
"attr" method.
$mol->attr("melting point", "273.15"); # set m.p.
$color = $atom->attr("color"); # get atom color
The core modules store these values but they don't know what they mean and they
don't care about them. Attributes can have whatever name you want, and they
can be of any type. However, by convention, non-core modules that need
additional attributes should prefix their name with a
namespace,
followed by a slash. (This is done to avoid modules fighting over the same
attribute name.) For example, atoms created by the PDB reader module
(Chemistry::File::PDB) have the "pdb/residue" attribute.
$mol = Chemistry::Mol->read("test.pdb");
$atom = $mol->atoms(1234);
print $atom->attr("pdb/residue_name"); # prints "ALA123"
You can do lots of interesting thing with plain molecules. However, for some
applications you may want to extend the features of the main Chemistry::Mol
class. There are several subclasses of Chemistry::Mol available already:
- Chemistry::MacroMol
- Used for macromolecules.
- Chemistry::Pattern
- Used for substructure matching.
- Chemistry::Ring
- Used for representing rings (cycles) in molecules.
- Chemistry::Reaction
- Used for representing and applying chemical
transformations.
As an example we'll discuss macromolecules. Future versions of this tutorial may
also include a discussion about patterns and rings.
So far we have assumed that we are dealing with molecules of the Chemistry::Mol
class. However, one of the interesting things about object-oriented
programming is that classes can be extended. For dealing with macromolecules,
we have the MacroMol class, which extends the Chemistry::Mol class. This means
that in practice you can use a Chemistry::MacroMol object exactly as you would
use a Chemistry::Mol object, but with some added functionality. In fact, the
PDB reader can return Chemistry::MacroMol instead of Chemistry::Mol objects
just by changing the first example like this:
use Chemistry::MacroMol;
use Chemistry::File::PDB;
my $macromol = Chemistry::MacroMol->read("test.pdb");
Now the question is, what is the "added functionality" that MacroMol
objects have on top of the original Chemistry::Mol object?
For the purposes of this module, a macromolecule is considered to be a big
molecule where atoms are divided in
Domains. A domain is just a subset
of the atoms in the molecule; in a protein, a domain would be just a residue.
You can select domains in a molecule in a way similar to that used for atoms and
bonds, in this case through the "domains" method:
my @all_domains = $macromol->domains;
my $domain = $macromol->domains(57);
A domain is a substructure of a larger molecule. Other than having a
parent molecule, a domain is just like a molecule. In other words, the
Domain class extends the Chemistry::Mol class; it is basically a collection of
atoms and bonds.
my @atoms_in_domain = $domain->atoms;
my $atom5_in_domain = $domain->atoms(5);
If you want to get at a given atom in a given domain in a macromolecule, you can
"chain" the method calls without having to save the Domain object in
a temporary variable:
my $domain57_atom5 = $macromol->domains(57)->atoms(5);
my $res233_HA = $macromol->domains(233)->atoms_by_name('HA');
The second example is a good way of selecting an atom from a PDB file when you
know the residue number and atom name.
0.38
<
https://github.com/perlmol/Chemistry-Mol>
Chemistry::Mol, Chemistry::Atom, Chemistry::Bond, Chemistry::File,
Chemistry::MacroMol, Chemistry::Domain.
Ivan Tubert-Brohman <
[email protected]>
Copyright (c) 2005 Ivan Tubert-Brohman. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same terms
as Perl itself.