PDL::QuickStart - Quick introduction to PDL features.
A brief summary of the main PDL features and how to use them.
Perl is an extremely good and versatile scripting language, well suited to
beginners and allows rapid prototyping. However until recently it did not
support data structures which allowed it to do fast number crunching.
However with the development of Perl v5, Perl acquired 'Objects'. To put it
simply users can define their own special data types, and write custom
routines to manipulate them either in low level languages (C and Fortran) or
in Perl itself.
This has been fully exploited by the PerlDL developers. The 'PDL' module is a
complete Object-Oriented extension to Perl (although you don't have to know
what an object is to use it) which allows large N-dimensional data sets, such
as large images, spectra, time series, etc to be stored
efficiently and
manipulated
en masse. For example with the PDL module we can write the
Perl code "$x = $y + $z", where $y and $z are large datasets (e.g.
2048x2048 images), and get the result in only a fraction of a second.
PDL variables (or 'ndarrays' as they have come to be known) support a wide range
of fundamental data types - arrays can be bytes, short integers (signed or
unsigned), long integers, floats or double precision floats. And because of
the Object-Oriented nature of PDL new customised datatypes can be derived from
them.
As well as the PDL modules, that can be used by normal Perl programs, PerlDL
comes with a command line Perl shell, called 'perldl', which supports command
line editing. In combination with the various PDL graphics modules this allows
data to be easily played with and visualised.
PDL contains extensive documentation, available both within the
perldl or
pdl2 shells and from the command line, using the "pdldoc"
program. For further information try either of:
pdl> help help
$ pdldoc
HTML copies of the documentation should also be available. To find their
location, try the following:
pdl> foreach ( map{"$_/PDL/HtmlDocs"}@INC ) { p "$_\n" if -d $_ }
The fundamental Perl data structures are scalar variables, e.g. $x, which can
hold numbers or strings, lists or arrays of scalars, e.g. @x, and associative
arrays/hashes of scalars, e.g. %x.
Perl v5 introduces to Perl data structures and objects. A simple scalar variable
$x now be a user-defined data type or full blown object (it actually holds a
reference (a smart "pointer") to this but that is not relevant for
ordinary use of perlDL)
The fundamental idea behind perlDL is to allow $x to hold a whole 1D spectrum,
or a 2D image, a 3D data cube, and so on up to large N-dimensional data sets.
These can be manipulated all at once, e.g. "$x = $y + 2" does a
vector operation on each value in the spectrum/image/etc.
You may well ask: "Why not just store a spectrum as a simple Perl @x style
list with each pixel being a list item?" The two key answers to this are
memory and
speed. Because we know our spectrum consists of pure
numbers we can compactly store them in a single block of memory corresponding
to a C style numeric array. This takes up a LOT less memory than the
equivalent Perl list. It is then easy to pass this block of memory to a fast
addition routine, or to any other C function which deals with arrays. As a
result perlDL is very fast --- for example one can multiply a 2048*2048 image
in exactly the same time as it would take in C or FORTRAN (0.1 sec on my
SPARC). A further advantage of this is that for simple operations (e.g.
"$x += 2") one can manipulate the whole array without caring about
its dimensionality.
I find when using perlDL it is most useful to think of standard Perl @x
variables as "lists" of generic "things" and PDL variables
like $x as "arrays" which can be contained in lists or hashes. Quite
often in my perlDL scripts I have @x contain a list of spectra, or a list of
images (or even a mix!). Or perhaps one could have a hash (e.g. %x) of
images... the only limit is memory!
perlDL variables support a range of data types - arrays can be bytes, short
integers (signed or unsigned), long integers, floats or double precision
floats.
PerlDL is loaded into your Perl script using this command:
use PDL; # in Perl scripts: use the standard perlDL modules
There are also a lot of extension modules, e.g. PDL::Graphics::TriD. Most of
these (but not all as sometimes it is not appropriate) follow a standard
convention. If you say:
use PDL::Graphics::TriD;
You import everything in a standard list from the module. Sometimes you might
want to import nothing (e.g. if you want to use OO syntax all the time and
save the import tax). For these you say:
use PDL::Graphics::TriD qw();
And the empty "qw()" quotes are recognised as meaning 'nothing'. You
can also specify a list of functions to import in the normal Perl way.
There is also an interactive shell, "perldl" or "pdl2", see
perldl or pdl2 for details.
Here are some ways of creating a PDL variable:
$x = pdl [1..10]; # 1D array
$x = pdl (1,2,3,4); # Ditto
$x = pdl '[1 2 3 4]'; # Ditto
$y = pdl [[1,2,3],[4,5,6]]; # 2D 3x2 array
$y = pdl '[1 2 3; 4 5 6]'; # Ditto
$y = pdl q[1,2,3; 4,5,6]; # Ditto
$y = pdl <<NEWPDL # Ditto
[1 2 3]
[4 5 6]
NEWPDL
$c = pdl q[1 -2]; # 2-element ndarray containing 1 and -2
$c = pdl q[1 - 2]; # 2-element ndarray containing 1 and -2
$y = pdl 42 # 0-dimensional scalar
$c = pdl $x; # Make a new copy
$d = byte [1..10]; # See "Type conversion"
$e = zeroes(3,2,4); # 3x2x4 zero-filled array
$c = rfits $file; # Read FITS file
@x = ( pdl(42), zeroes(3,2,4), rfits($file) ); # Is a LIST of PDL variables!
The
pdl() function is used to initialise a PDL variable from a scalar,
list, list reference, another PDL variable, or a properly formatted string.
In addition all PDL functions automatically convert normal Perl scalars to PDL
variables on-the-fly.
(also see "Type Conversion" and "Input/Output" sections
below)
$x = $y + 2; $x++; $x = $y / $c; # Etc.
$c=sqrt($x); $d = log10($y+100); # Etc
$e = $x>42; # Vector conditional
$e = 42*($x>42) + $x*($x<=42); # Cap top
$y = $x->log10 unless any ($x <= 0); # avoid floating point error
$x = $x / ( max($x) - min($x) );
$f = where($x, $x > 10); # where returns an ndarray of elements for
# which the condition is true
print $x; # $x in string context prints it in a N-dimensional format
(and other Perl operators/functions)
When using ndarrays in conditional expressions (i.e. "if",
"unless" and "while" constructs) only ndarrays with
exactly one element are allowed, e.g.
$x = pdl (1,0,0,1);
print "is set" if $x->index(2);
Note that the boolean operators return in general multi-element ndarrays.
Therefore, the following will raise an error
print "is ok" if $x > 3;
since "$x > 3" is an ndarray with 4 elements. Rather use all or any
to test if all or any of the elements fulfill the condition:
print "some are > 3" if any $x>3;
print "can't take logarithm" unless all $x>0;
There are also many predefined functions, which are described on other man
pages. Check PDL::Index.
'x' is hijacked as the matrix multiplication operator. e.g. "$c = $x x
$y";
perlDL is row-major not column major so this is actually "c(i,j) = sum_k
x(k,j) y(i,k)" - but when matrices are printed the results will look
right. Just remember the indices are reversed. e.g.:
$x = [ $y = [
[ 1 2 3 0] [1 1]
[ 1 -1 2 7] [0 2]
[ 1 0 0 1] [0 2]
] [1 1]
]
gives $c = [
[ 1 11]
[ 8 10]
[ 2 2]
]
Note:
transpose() does what it says and is a convenient way to turn row
vectors into column vectors.
sub dotproduct {
my ($x,$y) = @_;
return sum($x*$y) ;
}
1;
If put in file dotproduct.pdl would be autoloaded if you are using
PDL::AutoLoader (see below).
Of course, this function is already available as the inner function, see
PDL::Primitive.
Default for
pdl() is double. Conversions are:
$x = float($y);
$c = long($d); # "long" is generally a 4 byte int
$d = byte($x);
Also
double(),
short(),
ushort(),
indx().
NOTE: The indx() routine is a special integer type that
is the correct size for a PDL index value (dimension size,
index, or offest) which can be either a 32bit (long) or
64bit (longlong) quantity depending on whether the perl
is built with 32bit or 64bit support.
These routines also automatically convert Perl lists to allow the convenient
shorthand:
$x = byte [[1..10],[1..10]]; # Create 2D byte array
$x = float [1..1000]; # Create 1D float array
etc.
Automatically expands array in N-dimensional format:
print $x;
$y = "Answer is = $x ";
PDL has very powerful multidimensional slicing and sectioning operators; see the
PDL::Slices(3) man page for details; we'll describe the most important
one here.
PDL shows its Perl/C heritage in that arrays are zero-offset. Thus a 100x100
image has indices "0..99,0..99". (The convention is that the
center of pixel (0,0) is at coordinate (0.0,0.0). All PDL graphics
functions conform to this definition and hide away the unit offsets of, for
example, the PGPLOT FORTRAN library.
Following the usual convention coordinate (0,0) is displayed at the bottom left
when displaying an image. It appears at the top left when using
""print $x"" etc.
Simple sectioning uses a syntax extension to Perl, PDL::NiceSlice, that allows
you to specify subranges via a null-method modifier to a PDL:
$g = $f->($x1:$x2,$y1:$y2,($z1)); # Take subsection
Here, $f is a 3-dimensional variable, and $g gets a planar cutout that is
defined by the limits $x1, $x2, $y1, $y2, at the location $z1. The parenthesis
around $z1 cause the trivial index to be omitted -- otherwise $g would be
three-dimensional with a third dimension of order 1.
You can put PDL slices on either side of the element-wise assignment operator
".=", like so:
# Set part of $bigimage to values from $smallimage
$bigimage->($xa:$xb,$ya:$yb) .= $smallimage;
Some other miscellany:
$c = nelem($x); # Number of pixels
$val = at($object, $x,$y,$z...) # Pixel value at position, as a Perl scalar
$val = $object->at($x,$y,$z...) # equivalent (method syntax OK)
$y = xvals($x); # Fill array with X-coord values (also yvals(), zvals(),
# axisvals($x,$axis) and rvals() for radial distance
# from centre).
The "PDL::IO" modules implement several useful IO format functions. It
would be too much to give examples of each, but you can find a nice overview
at PDL::IO. Here is a sample of some of the supported IO formats in PDL.
- PDL::IO::Misc
- Ascii, FITS and FIGARO/NDF IO routines.
- PDL::IO::FastRaw
- Using the raw data types of your machine, an unportable but
blindingly fast IO format. Also supports memory mapping to conserve memory
as well as get more speed.
- PDL::IO::FlexRaw
- General raw data formats. Like FastRaw, only better.
- PDL::IO::Browser
- A Curses browser for arrays.
- PDL::IO::Pnm
- Portaple bitmap and pixmap support.
- PDL::IO::Pic
- Using the previous module and netpbm, makes it possible to
easily write GIF, jpeg and whatever with simple commands.
The philosophy behind perlDL is to make it work with a variety of existing
graphics libraries since no single package will satisfy all needs and all
people and this allows one to work with packages one already knows and likes.
Obviously there will be some overlaps in functionality and some lack of
consistency and uniformity. However this allows PDL to keep up with a rapidly
developing field - the latest PDL modules provide interfaces to OpenGL and
VRML graphics!
- PDL::Graphics::PGPLOT
- PGPLOT provides a simple library for line graphics and
image display.
There is an easy interface to this in the internal module
PDL::Graphics::PGPLOT, which calls routines in the separately available
PGPLOT top-level module.
- PDL::Graphics::PLplot
- PLplot provides a simple library for creating graphics with
multiple output drivers, including a direct-to-ndarray driver.
This module provides both high-level and low-level functionality built on
PLplot. The low-level commands are pretty much direct bindings to PLplot's
C interface. Read more at PDL::Graphics::PLplot.
- PDL::Graphics::IIS
- Many astronomers like to use SAOimage and Ximtool (or there
derivations/clones). These are useful free widgets for inspection and
visualisation of images. (They are not provided with perlDL but can easily
be obtained from their official sites off the Net.)
The PDL::Graphics::IIS package provides allows one to display images in
these ("IIS" is the name of an ancient item of image display
hardware whose protocols these tools conform to.)
- PDL::Graphics::TriD
- See PDL::Graphics::TriD, this is a collection of 3D
routines for OpenGL and (soon) VRML and other 3D formats which allow 3D
point, line, and surface plots from PDL.
See PDL::AutoLoader. This allows one to autoload functions on demand, in a way
perhaps familiar to users of MatLab.
One can also write PDL extensions as normal Perl modules.
The Perl script "pdl2" (or "perldl") provides a simple
command line interface to PDL. If the latest Readlines/ReadKey modules have
been installed "pdl2" detects this and enables command line recall
and editing. See the man page for details.
e.g.:
% perldl
perlDL shell v1.354
PDL comes with ABSOLUTELY NO WARRANTY. For details, see the file
'COPYING' in the PDL distribution. This is free software and you
are welcome to redistribute it under certain conditions, see
the same file for details.
ReadLines, NiceSlice, MultiLines enabled
Reading PDL/default.perldlrc...
Found docs database /home/pdl/dev/lib/perl5/site_perl/PDL/pdldoc.db
Type 'help' for online help
Type 'demo' for online demos
Loaded PDL v2.4.9_003 (supports bad values)
pdl> $x = rfits 'm51.fits'
Reading IMAGE data...
BITPIX = 32 size = 147456 pixels
Reading 589824 bytes
BSCALE = && BZERO =
pdl> use PDL::Graphics::PGPLOT;
pdl> imag $x
Displaying 384 x 384 image from 40 to 761, using 84 colors (16-99)...
You can also run it from the Perl debugger ("perl -MPDL -d -e 1") if
you want.
Miscellaneous shell features:
- p
- The shell aliases "p" to be a convenient short
form of "print", e.g.
pdl> p ones 5,3
[
[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]
]
- Initialization
- The files "~/.perldlrc" and
"local.perldlrc" (in the current directory) are sourced if
found. This allows the user to have global and local PDL code for
startup.
- Help
- Type 'help'! One can search the PDL documentation, and look
up documentation on any function.
- Escape
- Any line starting with the "#" character is
treated as a shell escape. This character is configurable by setting the
Perl variable $PERLDL_ESCAPE. This could, for example, be set in
"~/.perldlrc".
The following builtin Perl operators and functions have been overloaded to work
on PDL variables:
+ - * / > < >= <= << >> & | ^ == != <=> ** % ! ~
sin log abs atan2 sqrt cos exp
[All the unary functions (sin etc.) may be used with
inplace() - see
"Memory" below.]
PDL operations are available as functions and methods. Thus one can derive new
types of object, to represent custom data classes.
By using overloading one can make mathematical operators do whatever you please,
and PDL has some built-in tricks which allow existing PDL functions to work
unchanged, even if the underlying data representation is vastly changed! See
PDL::Objects
Messing around with really huge data arrays may require some care. perlDL
provides many facilities to let you perform operations on big arrays without
generating extra copies though this does require a bit more thought and care
from the programmer.
NOTE: On some most systems it is better to configure Perl (during the build
options) to use the system "malloc()" function rather than Perl's
built-in one. This is because Perl's one is optimised for speed rather than
consumption of virtual memory - this can result in a factor of two improvement
in the amount of memory storage you can use. The Perl malloc in 5.004 and
later does have a number of compile-time options you can use to tune the
behaviour.
- Simple arithmetic
- If $x is a big image (e.g. occupying 10MB) then the command
$x = $x + 1;
eats up another 10MB of memory. This is because the expression
"$x+1" creates a temporary copy of $x to hold the result, then
$x is assigned a reference to that. After this, the original $x is
destroyed so there is no permanent memory waste. But on a small
machine, the growth in the memory footprint can be considerable. It is
obviously done this way so "$c=$x+1" works as expected.
Also if one says:
$y = $x; # $y and $x now point to same data
$x = $x + 1;
Then $y and $x end up being different, as one naively expects, because a new
reference is created and $x is assigned to it.
However if $x was a huge memory hog (e.g. a 3D volume) creating a copy of it
may not be a good thing. One can avoid this memory overhead in the above
example by saying:
$x++;
The operations "++,+=,--,-=", etc. all call a special
"in-place" version of the arithmetic subroutine. This means no
more memory is needed - the downside of this is that if "$y=$x"
then $y is also incremented. To force a copy explicitly:
$y = pdl $x; # Real copy
or, alternatively, perhaps better style:
$y = $x->copy;
- Functions
- Most functions, e.g. "log()", return a result
which is a transformation of their argument. This makes for good
programming practice. However many operations can be done
"in-place" and this may be required when large arrays are in use
and memory is at a premium. For these circumstances the operator
inplace() is provided which prevents the extra copy and allows the
argument to be modified. e.g.:
$x = log($array); # $array unaffected
log( inplace($bigarray) ); # $bigarray changed in situ
WARNINGS:
- 1.
- The usual caveats about duplicate references apply.
- 2.
- Obviously when used with some functions which can not be
applied in situ (e.g. "convolve()") unexpected effects may
occur! We try to indicate "inplace()"-safe functions in the
documentation.
- 3.
- Type conversions, such as"float()", may cause
hidden copying.
If you have written a simple function and you don't want it to blow up in your
face if you pass it a simple number rather than a PDL variable. Simply call
the function
topdl() first to make it safe. e.g.:
sub myfiddle { my $pdl = topdl(shift); $pdl->fiddle_foo(...); ... }
"topdl()" does NOT perform a copy if a pdl variable is passed - it
just falls through - which is obviously the desired behaviour. The routine is
not of course necessary in normal user defined functions which do not care
about internals.
Copyright (C) Karl Glazebrook (
[email protected]), Tuomas J. Lukka,
(
[email protected]) and Christian Soeller (
[email protected])
1997. All rights reserved. There is no warranty. You are allowed to copy this
on the same terms as Perl itself.