PDL::BadValues - Discussion of bad value support in PDL
Sometimes it's useful to be able to specify that a certain value is 'bad' or
'missing'; for example, CCDs used in astronomy produce 2D images which are not
perfect, since certain areas contain invalid data due to imperfections in the
detector. Whilst PDL's powerful index routines, dataflow and slices mean that
these regions can be ignored in processing, doing so by hand is awkward. It
would be much easier to be able to say "$c = $x + $y" and leave all the hassle
to the computer.
If you're not interested in this, then you may (rightly) be concerned with how
this affects the speed of PDL, since the overhead of checking for a bad value
at each operation can be large. Because of this, the code has been written to
be as fast as possible, particularly when operating on ndarrays which do not
contain bad values: in that case you should notice essentially no speed
difference.
You may also ask 'well, my computer supports IEEE NaN, so I already have this'.
They are different things; a bad value signifies "leave this out of
processing", whereas NaN is the result of a mathematically-invalid
operation.
Many routines, such as "y=sin(x)", will propagate NaNs without the
user having to code differently, but routines such as "qsort", or
finding the median of an array, need to be re-coded to handle bad values. For
floating-point datatypes, "NaN" and "Inf" can be used to
flag bad values, but by default special values are used (see the description
of the default bad values later in this document).
There is one default bad value for each datatype, but as of PDL 2.040, you can
have different bad values for separate ndarrays of the same type.
You can use "NaN" as the bad value for any floating-point type,
including complex.
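To make the difference between NaN and a bad value concrete, here is a minimal sketch, assuming PDL is installed and using the "setnantobad" and "nbad" routines from PDL::Bad. The variable names are illustrative only.

```perl
use strict;
use warnings;
use PDL;

# 0/0 in floating point yields NaN; the other two elements are 1.
my $x     = pdl(0, 1, 2);
my $ratio = $x / $x;

# NaN is an ordinary floating-point value here, so it poisons the sum.
print "sum with NaN:    ", $ratio->sum, "\n";

# Mark every NaN as bad; the bad element is then left out by sum().
my $clean = $ratio->setnantobad;
print "bad elements:    ", $clean->nbad, "\n";
print "sum without bad: ", $clean->sum, "\n";
```

The NaN propagates through the sum, whereas the bad element is simply left out of it.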
pdl> $x = sequence(4,3);
pdl> p $x
[
[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
]
pdl> $x = $x->setbadif( $x % 3 == 2 )
pdl> p $x
[
[ 0 1 BAD 3]
[ 4 BAD 6 7]
[BAD 9 10 BAD]
]
pdl> $x *= 3
pdl> p $x
[
[ 0 3 BAD 9]
[ 12 BAD 18 21]
[BAD 27 30 BAD]
]
pdl> p $x->sum
120
"demo bad" within perldl or pdl2 gives a demonstration of some of the
things possible with bad values. These are also available on PDL's web-site,
at http://pdl.perl.org/demos/. See PDL::Bad for useful routines for working
with bad values and t/bad.t to see them in action.
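As a taste of those routines, the following sketch (assuming PDL is installed) applies "nbad", "ngood", "isbad" and "setbadtoval" from PDL::Bad to the ndarray built in the session above. The fill value of -1 is an arbitrary choice for illustration.

```perl
use strict;
use warnings;
use PDL;

# Same data as the interactive session: mark elements with x % 3 == 2 as bad.
my $x = sequence(4, 3);
$x = $x->setbadif($x % 3 == 2);

# Count the bad and good elements.
print "number bad:  ", $x->nbad,  "\n";
print "number good: ", $x->ngood, "\n";

# isbad() gives a mask that is 1 where the element is bad and 0 elsewhere.
my $mask = $x->isbad;
print "bad mask: $mask\n";

# Replace each bad element by an ordinary value (-1 is arbitrary).
my $filled = $x->setbadtoval(-1);
print "filled: $filled\n";
```

Since "setbadtoval" leaves no bad elements behind, the result can be passed to code that knows nothing about bad values.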
To find out if a routine supports bad values, use the "badinfo"
command in perldl or pdl2 or the "-b" option to pdldoc.
Each ndarray contains a flag - accessible via "$pdl->badflag" - to
say whether there's any bad data present:
- If false/0, which means there's no bad data here, the code supplied by the
  "Code" option to "pp_def()" is executed.
- If true/1, then this says there MAY be bad data in the ndarray, so use the
  code in the "BadCode" option (assuming that the "pp_def()" for this routine
  has been updated to have a BadCode key). You get all the advantages of
  broadcasting, as with the "Code" option, but it will run slower since you
  are going to have to handle the presence of bad values.
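At the Perl level the flag can be read and set directly. The sketch below (assuming PDL is installed) uses "badflag" together with "check_badflag", a PDL::Bad method that rescans the data and clears the flag if no bad values are actually present. It also shows a result inheriting the flag from its input.

```perl
use strict;
use warnings;
use PDL;

# A freshly created ndarray starts with its bad flag cleared.
my $x = sequence(5);
print "new ndarray:         ", $x->badflag, "\n";

# Setting the flag only says bad values MAY be present.
$x->badflag(1);
print "after badflag(1):    ", $x->badflag, "\n";

# check_badflag scans the data; no bad values are found, so the flag is cleared.
$x->check_badflag;
print "after check_badflag: ", $x->badflag, "\n";

# A result computed from an input that contains bad values has its flag set.
my $y = $x->setbadif($x == 2);
my $z = $y + 1;
print "flag of result:      ", $z->badflag, "\n";
```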
If you create an ndarray, it will have its bad-value flag set to 0. To change
this, use "$pdl->badflag($new_bad_status)", where $new_bad_status
can be 0 or 1. When a routine creates an ndarray, its bad-value flag will
depend on the input ndarrays: unless over-ridden (see the
"CopyBadStatusCode" option to "pp_def"), the bad-value
flag will be set true if any of the input ndarrays contain bad values. To
check that an ndarray really contains bad data, use the
"check_badflag" method.
NOTE: propagation of the badflag
If you change the badflag of an ndarray, this change is propagated to all the
children of an ndarray, so
pdl> $x = zeroes(20,30);
pdl> $y = $x->slice('0:10,0:10');
pdl> $c = $y->slice(',(2)');
pdl> print ">>c: ", $c->badflag, "\n";
>>c: 0
pdl> $x->badflag(1);
pdl> print ">>c: ", $c->badflag, "\n";
>>c: 1
This is also propagated to the parents of an ndarray, so
pdl> print ">>x: ", $x->badflag, "\n";
>>x: 1
pdl> $c->badflag(0);
pdl> print ">>x: ", $x->badflag, "\n";
>>x: 0
There's also the issue of what happens if you change the badvalue of an
ndarray: should the change propagate to children/parents (yes), or should you
only be able to change the badvalue at the 'top' level, i.e. on those ndarrays
which do not have parents?
The "orig_badvalue()" method returns the compile-time value for a
given datatype. It works on ndarrays, PDL::Type objects, and numbers - eg
$pdl->orig_badvalue(), byte->orig_badvalue(), and orig_badvalue(4).
To get the current bad value, use the "badvalue()" method - it has the
same syntax as "orig_badvalue()".
To change the current bad value, supply the new number to badvalue - eg
$pdl->badvalue(2.3), byte->badvalue(2), badvalue(5,-3e34).
Note: the value is silently converted to the correct C type, and the converted
value is returned - e.g. "byte->badvalue(-26)" returns 230 on my Linux machine.
Note that changes to the bad value are NOT propagated to previously-created
ndarrays - they will still have the bad flag set, but the elements that were
bad will suddenly become 'good', containing the old bad value. See discussion
below.
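The calls described above can be sketched as follows, assuming PDL 2.040 or later (so that a bad value can be set on a single ndarray) and the default compile-time bad values. The choice of 3 as the bad value is arbitrary.

```perl
use strict;
use warnings;
use PDL;

# Compile-time defaults: the maximum for unsigned types, the minimum otherwise.
print "byte default:  ", byte->orig_badvalue,  "\n";
print "short default: ", short->orig_badvalue, "\n";

# Give one ndarray its own bad value, then tell PDL that bad values may exist.
my $x = sequence(byte(), 5);    # 0 1 2 3 4
$x->badvalue(3);
$x->badflag(1);

# The element equal to 3 is now treated as bad and left out of the sum.
print "ndarray: $x\n";
print "bad elements: ", $x->nbad, "\n";
print "sum of good:  ", $x->sum,  "\n";
```

The bad value is set before the flag here, so the ndarray never passes through a state in which the default bad value is being checked for.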
For the boolean operators in PDL::Ops, evaluation on a bad value returns the
bad value. This:
$mask = $img > $thresh;
correctly propagates bad values. This will omit any bad values, but return a bad
value if there are no good ones:
$bool = any( $img > $thresh );
As of 2.077, a bad value used as a boolean will throw an exception.
When using one of the 'projection' functions in PDL::Ufunc - such as orover -
bad values are skipped over (see the documentation of these functions for the
current handling of the case when all elements are bad).
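A short sketch of both behaviours, assuming PDL is installed. $img and the threshold of 1 are made-up illustration data, and "sumover" stands in for the projection functions.

```perl
use strict;
use warnings;
use PDL;

# A 3x2 "image" with one bad pixel (the element equal to 4).
my $img = sequence(3, 2);
$img = $img->setbadif($img == 4);

# The comparison keeps the bad pixel bad in the resulting mask.
my $mask = $img > 1;
print "mask: $mask\n";
print "bad elements in mask: ", $mask->nbad, "\n";

# any() skips the bad element and reports whether a good element is true.
print "any above threshold: ", any($mask), "\n";

# Projection along the first dimension skips the bad pixel: rows sum to 3 and 8.
my $rows = $img->sumover;
print "row sums: $rows\n";
```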
A new flag has been added to the state of an ndarray - "PDL_BADVAL".
If unset, then the ndarray does not contain bad values, and so all the support
code can be ignored. If set, it does not guarantee that bad values are
present, just that they should be checked for.
The "pdl_trans" structure has been extended to include an integer
value, "bvalflag", which acts as a switch to tell the code whether
to handle bad values or not. This value is set if any of the input ndarrays
have their "PDL_BADVAL" flag set (although this code can be replaced
by setting "FindBadStateCode" in pp_def).
The default bad values are now stored in a structure within the Core PDL
structure - "PDL.bvals" (eg Basic/Core/pdlcore.h.PL); see also
"typedef badvals" in Basic/Core/pdl.h.PL and the BOOT code of
Basic/Core/Core.xs.PL where the values are initialised to (hopefully) sensible
values. See "badvalue" in PDL::Bad and "orig_badvalue" in PDL::Bad for
read/write routines to the values.
The default/original bad values are set to the C type's maximum (unsigned
integers) or the minimum (floating-point and signed integers).
See "BadCode" in PDL::PP and "HandleBad" in PDL::PP.
If you have a routine that you want to be able to use as in-place, look at the
routines in bad.pd (or ops.pd) which use the "in-place" option to see how the
bad flag is propagated to children using the "xxxBadStatusCode" options. I
decided not to automate this as the rules would be a little complex, since not
every in-place op will need to propagate the badflag (eg unary functions).
This all means that you can change
Code => '$a() = $b() + $c();'
to
BadCode => 'if ( $ISBAD(b()) || $ISBAD(c()) ) {
              $SETBAD(a());
            } else {
              $a() = $b() + $c();
            }'
leaving Code as it is. PP::PDLCode will then create code something like
if ( __trans->bvalflag ) {
     broadcastloop over BadCode
} else {
     broadcastloop over Code
}
One of the strengths of PDL is its on-line documentation. The aim is to use this
system to provide information on how/if a routine supports bad values: in many
cases "pp_def()" contains all the information anyway, so the
function-writer doesn't need to do anything at all! For the cases when this is
not sufficient, there's the "BadDoc" option. For code written at the
Perl level - i.e. in a .pm file - use the "=for bad" pod directive.
This information will be available via man/pod2man/html documentation. It's also
accessible from the "perldl" or "pdl2" shells - using the
"badinfo" command - and the "pdldoc" shell command - using
the "-b" option.
Copyright (C) Doug Burke, 2000, 2006.
The per-ndarray bad value support is by Heiko Klein (2006).