OtherPars
sectionwarn
and barf
in PP Code
PDL::PP - Generate PDL routines from concise descriptions
e.g.
pp_def( 'sumover', Pars => 'a(n); [o]b();', Code => q{ double tmp=0; loop(n) %{ tmp += $a(); %} $b() = tmp; }, );
pp_done();
Why do we need PP? Several reasons: firstly, we want to be able to
generate subroutine code for each of the PDL datatypes (PDL_Byte,
PDL_Short,. etc). AUTOMATICALLY. Secondly, when referring to slices
of PDL arrays in Perl (e.g. $a->slice('0:10:2,:')
or other things such
as transposes) it is nice to be able to do this transparently and to
be able to do this 'in-place' - i.e, not to have to make a memory copy
of the section. PP handles all the necessary element and offset
arithmetic for you. There are also the notions of threading (repeated
calling of the same routine for multiple slices, see the PDL::Indexing manpage)
and dataflow (see the PDL::Dataflow manpage) which use of PP allows.
In much of what follows we will assume familiarity of the reader with the concepts of implicit and explicit threading and index manipulations within PDL. If you have not yet heard of these concepts or are not very comfortable with them it is time to check the PDL::Indexing manpage.
As you may appreciate from its name PDL::PP is a Pre-Processor, i.e. it expands code via substitutions to make real C-code. Technically, the output is XS code (see perlxs) but that is very close to C.
So how do you use PP? Well for the most part you just write ordinary C code except for special PP constructs which take the form:
$something(something else)
or:
PPfunction %{ <stuff> %}
The most important PP construct is the form $array()
. Consider the very
simple PP function to sum the elements of a 1D vector (in fact this is
very similar to the actual code used by 'sumover'):
pp_def('sumit', Pars => 'a(n); [o]b();', Code => q{ double tmp; tmp = 0; loop(n) %{ tmp += $a(); %} $b() = tmp; } );
What's going on? The Pars =>
line is very important for PP - it
specifies all the arguments and their dimensionality. We call
this the signature of the PP function (compare also the explanations in
the PDL::Indexing manpage). In this case the
routine takes a 1-D function as input and returns a 0-D scalar as
output. The $a()
PP construct is used to access elements of the array
a(n) for you - PP fills in all the required C code.
You will notice that we are using the q{}
single-quote operator. This is
not an accident. You generally want to use single quotes to denote your
PP Code sections. PDL::PP uses $var()
for its parsing and if you don't
use single quotes, Perl will try to interpolate $var()
. Also, using the
single quote q
operator with curly braces makes it look like you are
creating a code block, which is What You Mean. (Perl is smart enough to look
for nested curly braces and not close the quote until it finds the matching
curly brace, so it's safe to have nested blocks.) Under other circumstances,
such as when you're stitching together a Code block using string
concatenations, it's often easiest to use real single quotes as
Code => 'something'.$interpolatable.'somethingelse;'
In the simple case here where all elements are accessed the PP construct
loop(n) %{ ... %}
is used to loop over all elements in dimension n
.
Note this feature of PP: ALL DIMENSIONS ARE SPECIFIED BY NAME.
This is made clearer if we avoid the PP loop()
construct
and write the loop explicitly using conventional C:
pp_def('sumit', Pars => 'a(n); [o]b();', Code => q{ int i,n_size; double tmp; n_size = $SIZE(n); tmp = 0; for(i=0; i<n_size; i++) { tmp += $a(n=>i); } $b() = tmp; }, );
which does the same as before, but is more long-winded.
You can see to get element i
of a() we say $a(n=>i)
- we are
specifying the dimension by name n
. In 2D we might say:
Pars=>'a(m,n);', ... tmp += $a(m=>i,n=>j); ...
The syntax m=>i
borrows from Perl hashes, which are in fact
used in the implementation of PP. One could also say
$a(n=>j,m=>i)
as order is not important.
You can also see in the above example the use of another PP
construct - $SIZE(n)
to get the length of the dimension n
.
It should, however, be noted that you shouldn't write an explicit C-loop
when you could have used the PP loop
construct since PDL::PP checks
automatically the loop limits for you, usage of loop
makes the code more
concise, etc. But there are certainly situations where you need explicit
control of the loop and now you know how to do it ;).
To revisit 'Why PP?' - the above code for sumit()
will be
generated for each data-type. It will operate on slices
of arrays 'in-place'. It will thread automatically - e.g. if
a 2D array is given it will be called repeatedly for each
1D row (again check the PDL::Indexing manpage for the details of threading).
And then b() will be a 1D array of sums of each row.
We could call it with $a->xchg(0,1) to sum the columns instead.
And Dataflow tracing etc. will be available.
You can see PP saves the programmer from writing a lot of needlessly repetitive C-code -- in our opinion this is one of the best features of PDL making writing new C subroutines for PDL an amazingly concise exercise. A second reason is the ability to make PP expand your concise code definitions into different C code based on the needs of the computer architecture in question. Imagine for example you are lucky to have a supercomputer at your hands; in that case you want PDL::PP certainly to generate code that takes advantage of the vectorising/parallel computing features of your machine (this a project for the future). In any case, the bottom line is that your unchanged code should still expand to working XS code even if the internals of PDL changed.
Also, because you are generating the code in an actual Perl script, there are many fun things that you can do. Let's say that you need to write both sumit (as above) and multit. With a little bit of creativity, we can do
for({Name => 'sumit', Init => '0', Op => '+='}, {Name => 'multit', Init => '1', Op => '*='}) { pp_def($_->{Name}, Pars => 'a(n); [o]b();', Code => ' double tmp; tmp = '.$_->{Init}.'; loop(n) %{ tmp '.$_->{Op}.' $a(); %} $b() = tmp; '); }
which defines both the functions easily. Now, if you later need to change the signature or dimensionality or whatever, you only need to change one place in your code. Yeah, sure, your editor does have 'cut and paste' and 'search and replace' but it's still less bothersome and definitely more difficult to forget just one place and have strange bugs creep in. Also, adding 'orit' (bitwise or) later is a one-liner.
And remember, you really have Perl's full abilities with you - you can very easily read any input file and make routines from the information in that file. For simple cases like the above, the author (Tjl) currently favors the hash syntax like the above - it's not too much more characters than the corresponding array syntax but much easier to understand and change.
We should mention here also the ability to get the pointer to the
beginning of the data in memory - a prerequisite for interfacing
PDL to some libraries. This is handled with the $P(var)
directive,
see below.
When starting work on a new pp_def'ined function, if you make a mistake, you
will usually find a pile of compiler errors indicating line numbers in the
generated XS file. If you know how to read XS files (or if you want to learn
the hard way), you could open the generated XS file and search for the line
number with the error. However, a recent addition to PDL::PP helps report
the correct line number of your errors: pp_line_numbers
. Working with the
original summit example, if you had a mis-spelling of tmp in your code, you
could change the (erroneos) code to something like this and the compiler
would give you much more useful information:
pp_def('sumit', Pars => 'a(n); [o]b();', Code => pp_line_numbers(__LINE__, q{ double tmp; tmp = 0; loop(n) %{ tmp += $a(); %} $b() = rmp; }) );
For the above situation, my compiler tells me:
... test.pd:15: error: ‘rmp’ undeclared (first use in this function) ...
In my example script (called test.pd), line 15 is exactly the line at which
I made my typo: rmp
instead of tmp
.
So, after this quick overview of the general flavour of programming PDL routines using PDL::PP let's summarise in which circumstances you should actually use this preprocessor/precompiler. You should use PDL::PP if you want to
interface PDL to some external library
write some algorithm that would be slow if coded in Perl (this is not as often as you think; take a look at threading and dataflow first).
be a PDL developer (and even then it's not obligatory)
Because of its architecture, PDL::PP can be both flexible and easy to use
on the one hand, yet exuberantly complicated at the same time. Currently,
part of the problem is that error messages are not very informative and if
something goes wrong, you'd better know what you are doing and be able to
hack your way through the internals (or be able to figure out by trial and
error what is wrong with your args to pp_def
). Although work is being
done to produce better warnings, do not be afraid to send your questions to
the mailing list if you run into trouble.
Now that you have some idea how to use pp_def
to define new PDL functions
it is time to explain the general syntax of pp_def
.
pp_def
takes as arguments first the name of the function
you are defining and then a hash list that can contain various keys.
Based on these keys PP generates XS code and a .pm file. The function
pp_done
(see example in the SYNOPSIS) is used to tell PDL::PP that there
are no more definitions in this file and it is time to generate the .xs and
.pm file.
As a consequence, there may be several pp_def()
calls inside a file (by
convention files with PP code have the extension .pd or .pp) but generally
only one pp_done()
.
There are two main different types of usage of pp_def()
,
the 'data operation' and 'slice operation' prototypes.
The 'data operation' is used to take some data, mangle it and
output some other data; this includes for example the '+' operation,
matrix inverse, sumover etc and all the examples we have talked about
in this document so far. Implicit and explicit threading and the creation
of the result are taken care of automatically in those operations. You
can even do dataflow with sumit
, sumover
, etc
(don't be dismayed if you don't understand the concept of dataflow
in PDL very well yet; it is still very much experimental).
The 'slice operation' is a different kind of operation: in a slice operation, you are not changing any data, you are defining correspondences between different elements of two piddles (examples include the index manipulation/slicing function definitions in the file slices.pd that is part of the PDL distribution; but beware, this is not introductory level stuff).
If PDL was compiled with support for bad values (i.e. WITH_BADVAL => 1
),
then additional keys are required for pp_def
, as explained below.
If you are just interested in communicating with some external library (for example some linear algebra/matrix library), you'll usually want the 'data operation' so we are going to discuss that first.
In the data operation, you must know what dimensions of data you need. First, an example with scalars:
pp_def('add', Pars => 'a(); b(); [o]c();', Code => '$c() = $a() + $b();' );
That looks a little strange but let's dissect it. The first line is easy: we're defining a routine with the name 'add'. The second line simply declares our parameters and the parentheses mean that they are scalars. We call the string that defines our parameters and their dimensionality the signature of that function. For its relevance with regard to threading and index manipulations check the the PDL::Indexing manpage man page.
The third line is the actual operation. You need to use the dollar signs and parentheses to refer to your parameters (this will probably change at some point in the future, once a good syntax is found).
These lines are all that is necessary to actually define the function for PDL (well, actually it isn't; you additionally need to write a Makefile.PL (see below) and build the module (something like 'perl Makefile.PL; make'); but let's ignore that for the moment). So now you can do
use MyModule; $a = pdl 2,3,4; $b = pdl 5;
$c = add($a,$b); # or add($a,$b,($c=null)); # Alternative form, useful if $c has been # preset to something big, not useful here.
and have threading work correctly (the result is $c == [7 8 9]).
Seeing the above example code you will most probably ask: what is this
strange $c=null
syntax in the second call to our new add
function? If
you take another look at the definition of add
you will notice that
the third argument c
is flagged with the qualifier [o]
which
tells PDL::PP that this is an output argument. So the above call to
add means 'create a new $c from scratch with correct dimensions' -
null
is a special token for 'empty piddle' (you might ask why we
haven't used the value undef
to flag this instead of the PDL
specific null
; we are currently thinking about it ;).
[This should be explained in some other section of the manual as well!!] The reason for having this syntax as an alternative is that if you have really huge piddles, you can do
$c = PDL->null; for(some long loop) { # munge a,b add($a,$b,$c); # munge c, put something back to a,b }
and avoid allocating and deallocating $c each time. It is allocated
once at the first add()
and thereafter the memory stays until $c is
destroyed.
If you just say
$c = add($a,$b);
the code generated by PP will automatically fill in $c=null
and return
the result. If you want to learn more
about the reasons why PDL::PP supports this style where output arguments
are given as last arguments check the
the PDL::Indexing manpage man page.
[o]
is not the only qualifier a pdl argument can have in the signature.
Another important qualifier is the [t]
option which flags a pdl as
temporary. What does that mean? You tell PDL::PP that this pdl is only
used for temporary results in the course of the calculation and you are
not interested in its value after the computation has been completed. But
why should PDL::PP want to know about this in the first place? The reason
is closely related to the concepts of pdl auto creation (you heard
about that above) and implicit threading. If you use implicit threading
the dimensionality of automatically created pdls is actually larger than
that specified in the signature. With [o]
flagged pdls will be created
so that they have the additional dimensions as required by the number
of implicit thread dimensions. When creating a temporary pdl, however,
it will always only be made big enough so that it can hold the result
for one iteration in a thread loop, i.e. as large as required by the signature.
So less memory is wasted when you flag a pdl as temporary. Secondly, you
can use output auto creation with temporary pdls even when you are using
explicit threading which is forbidden for normal output pdls flagged with
[o]
(see the PDL::Indexing manpage).
Here is an example where we use the [t] qualifier. We define the function
callf
that calls a C routine f
which needs a temporary array of the
same size and type as the array a
(sorry about the forward reference
for $P
; it's a pointer access, see below) :
pp_def('callf', Pars => 'a(n); [t] tmp(n); [o] b()', Code => 'int ns = $SIZE(n); f($P(a),$P(b),$P(tmp),ns); ' );
Now we have just talked about dimensions of pdls and the signature. How are they related? Let's say that we want to add a scalar + the index number to a vector:
pp_def('add2', Pars => 'a(n); b(); [o]c(n);', Code => 'loop(n) %{ $c() = $a() + $b() + n; %}' );
There are several points to notice here: first, the Pars
argument now contains the n arguments to show that we have a single
dimensions in a and c. It is important to note that dimensions
are actual entities that are accessed by name so this declares
a and c to have the same first dimensions. In most PP definitions
the size of named dimensions will be set from the respective dimensions
of non-output pdls (those with no [o]
flag) but sometimes you might
want to set the size of a named dimension explicitly through an integer
parameter. See below in the description of the OtherPars
section how
that works.
Suppose you want an output piddle to be created automatically and you know that on every call its dimension will have the same size (say 9) regardless of the dimensions of the input piddles. In this case you use the following syntax in the Pars section to specify the size of the dimension:
' [o] y(n=9); '
As expected, extra dimensions required by threading will be
created if necessary. If you need to assign a named dimension according
to a more complicated formula (than a constant) you must use the
RedoDimsCode
key described below.
The signature also determines the type conversions that will be performed when a PP function is invoked. So what happens when we invoke one of our previously defined functions with pdls of different type, e.g.
add2($a,$b,($ret=null));
where $a is of type PDL_Float
and $b of type PDL_Short
? With the signature
as shown in the definition of add2
above the datatype of the operation
(as determined at runtime) is that of the pdl with the 'highest' type
(sequence is byte < short < ushort < long < float < double). In the add2
example the datatype of the operation is float ($a has that datatype). All
pdl arguments are then type converted to that datatype (they are not
converted inplace but a copy with the right type is created if a pdl argument
doesn't have the type of the operation).
Null pdls don't contribute a type
in the determination of the type of the operation.
However, they will be
created with the datatype of the operation; here, for example, $ret will be
of type float. You should be aware of these rules when calling PP functions
with pdls of different types to take the additional storage and runtime
requirements into account.
These type conversions are correct for most functions you normally define
with pp_def
. However, there are certain cases where slightly modified
type conversion behaviour is desired. For these cases additional qualifiers
in the signature can be used to specify the desired properties with regard
to type conversion. These qualifiers can be combined with those we have
encountered already (the creation qualifiers [o]
and [t]
). Let's
go through the list of qualifiers that change type conversion behaviour.
The most important is the int
qualifier which comes in handy when a
pdl argument represents indices into another pdl. Let's take a look at
an example from PDL::Ufunc
:
pp_def('maximum_ind', Pars => 'a(n); int [o] b()', Code => '$GENERIC() cur; int curind; loop(n) %{ if (!n || $a() > cur) {cur = $a(); curind = n;} %} $b() = curind;', );
The function maximum_ind
finds the index of the largest element of
a vector. If you look at the signature you notice that the output
argument b
has been declared with the additional int
qualifier.
This has the following consequences for type conversions: regardless of
the type of the input pdl a
the output pdl b
will be of type
PDL_Long
which makes sense since b
will represent an index into
a
. Furthermore, if you call the function with an existing output
pdl b
its type will not influence the datatype of the operation (see
above). Hence, even if a
is of a smaller type than b
it will not
be converted to match the type of b
but stays untouched, which saves
memory and CPU cycles and is the right thing to do when b
represents
indices. Also note that you can use the 'int' qualifier together with
other qualifiers (the [o]
and [t]
qualifiers). Order is significant --
type qualifiers precede creation qualifiers ([o]
and [t]
).
The above example also demonstrates typical usage of the $GENERIC()
macro. It expands to the current type in a so called generic
loop. What is a generic loop? As you already heard a PP function has a
runtime datatype as determined by the type of the pdl arguments it has
been invoked with. The PP generated XS code for this function
therefore contains a switch like switch (type) {case PDL_Byte: ... case
PDL_Double: ...}
that selects a case based on the runtime
datatype of the function (it's called a type ``loop''
because there is a loop in PP code that generates the cases).
In any case your code is inserted once for each PDL type
into this switch statement. The $GENERIC()
macro just expands to
the respective type in each copy of your parsed code in this switch
statement, e.g., in the case PDL_Byte
section cur
will expand to
PDL_Byte
and so on for the other case statements. I guess you
realise that this is a useful macro to hold values of pdls in some
code.
There are a couple of other qualifiers with similar effects as int
.
For your convenience there are the float
and double
qualifiers
with analogous consequences on type conversions as int
. Let's
assume you have a very large array for which you want to compute
row and column sums with an equivalent of the sumover
function.
However, with the normal definition of sumover
you might run
into problems when your data is, e.g. of type short. A call like
sumover($large_pdl,($sums = null));
will result in $sums
be of type short and is therefore prone to
overflow errors if $large_pdl
is a very large array. On the other
hand calling
@dims = $large_pdl->dims; shift @dims; sumover($large_pdl,($sums = zeroes(double,@dims)));
is not a good alternative either. Now we don't have overflow problems with
$sums
but at the expense of a type conversion of $large_pdl
to
double, something bad if this is really a large pdl. That's where double
comes in handy:
pp_def('sumoverd', Pars => 'a(n); double [o] b()', Code => 'double tmp=0; loop(n) %{ tmp += a(); %} $b() = tmp;', );
This gets us around the type conversion and overflow problems. Again,
analogous to the int
qualifier double
results in b
always being of
type double regardless of the type of a
without leading to a
type conversion of a
as a side effect.
Finally, there are the type+
qualifiers where type is one of int
or float
. What shall that mean. Let's illustrate the int+
qualifier with the actual definition of sumover:
pp_def('sumover', Pars => 'a(n); int+ [o] b()', Code => '$GENERIC(b) tmp=0; loop(n) %{ tmp += a(); %} $b() = tmp;', );
As we had already seen for the int
, float
and double
qualifiers, a pdl marked with a type+
qualifier does not influence
the datatype of the pdl operation. Its meaning is "make this pdl at
least of type type
or higher, as required by the type of the
operation". In the sumover example this means that when you call the
function with an a
of type PDL_Short the output pdl will be of type
PDL_Long (just as would have been the case with the int
qualifier). This again tries to avoid overflow problems when using
small datatypes (e.g. byte images). However, when the datatype of the
operation is higher than the type specified in the type+
qualifier
b
will be created with the datatype of the operation, e.g. when
a
is of type double then b
will be double as well. We hope you
agree that this is sensible behaviour for sumover
. It should be
obvious how the float+
qualifier works by analogy.
It may become necessary to be able to specify a set of alternative
types for the parameters. However, this will probably not be
implemented until someone comes up with a reasonable use for it.
Note that we now had to specify the $GENERIC
macro with the name
of the pdl to derive the type from that argument. Why is that? If you
carefully followed our explanations you will have realised that in some
cases b
will have a different type than the type of the operation.
Calling the '$GENERIC' macro with b
as argument makes sure that
the type will always the same as that of b
in that part of the
generic loop.
This is about all there is to say about the Pars
section in a
pp_def
call. You should remember that this section defines the signature
of a PP defined function, you can use several options to qualify certain
arguments as output and temporary args and all dimensions that you can
later refer to in the Code
section are defined by name.
It is important that you understand the meaning of the signature since in the latest PDL versions you can use it to define threaded functions from within Perl, i.e. what we call Perl level threading. Please check the PDL::Indexing manpage for details.
The Code
section contains the actual XS code that will be in the
innermost part of a thread loop (if you don't know what a thread loop is then
you still haven't read the PDL::Indexing manpage; do it now ;) after any PP macros
(like $GENERIC
) and PP functions have been expanded (like the
loop
function we are going to explain next).
Let's quickly reiterate the sumover
example:
pp_def('sumover', Pars => 'a(n); int+ [o] b()', Code => '$GENERIC(b) tmp=0; loop(n) %{ tmp += a(); %} $b() = tmp;', );
The loop
construct in the Code
section also refers to the
dimension name so you don't need to specify any limits: the loop is
correctly sized and everything is done for you, again.
Next, there is the surprising fact that $a()
and $b()
do not
contain the index. This is not necessary because we're looping over
n and both variables know which dimensions they have so
they automatically know they're being looped over.
This feature comes in very handy in many places and makes for much shorter code. Of course, there are times when you want to circumvent this; here is a function which make a matrix symmetric and serves as an example of how to code explicit looping:
pp_def('symm', Pars => 'a(n,n); [o]c(n,n);', Code => 'loop(n) %{ int n2; for(n2=n; n2<$SIZE(n); n2++) { $c(n0 => n, n1 => n2) = $c(n0 => n2, n1 => n) = $a(n0 => n, n1 => n2); } %} ' );
Let's dissect what is happening. Firstly, what is this function supposed to do? From its signature you see that it takes a 2D matrix with equal numbers of columns and rows and outputs a matrix of the same size. From a given input matrix $a it computes a symmetric output matrix $c (symmetric in the matrix sense that A^T = A where ^T means matrix transpose, or in PDL parlance $c == $c->xchg(0,1)). It does this by using only the values on and below the diagonal of $a. In the output matrix $c all values on and below the diagonal are the same as those in $a while those above the diagonal are a mirror image of those below the diagonal (above and below are here interpreted in the way that PDL prints 2D pdls). If this explanation still sounds a bit strange just go ahead, make a little file into which you write this definition, build the new PDL extension (see section on Makefiles for PP code) and try it out with a couple of examples.
Having explained what the function is supposed to do there are a
couple of points worth noting from the syntactical point of
view. First, we get the size of the dimension named n
again by
using the $SIZE
macro. Second, there are suddenly these funny n0
and n1
index names in the code though the signature defines only
the dimension n
. Why this? The reason becomes clear when you note
that both the first and second dimension of $a and $b are named n
in the signature of symm
. This tells PDL::PP that the first and
second dimension of these arguments should have the same
size. Otherwise the generated function will raise a runtime error.
However, now in an access to $a
and $c
PDL::PP cannot figure out
which index n
refers to any more just from the name of the index.
Therefore, the indices with equal dimension names get numbered from
left to right starting at 0, e.g. in the above example n0
refers to
the first dimension of $a
and $c
, n1
to the second and so on.
In all examples so far, we have only used the Pars
and Code
members of the hash that was passed to pp_def
. There are certainly
other keys that are recognised by PDL::PP and we will hear about some
of them in the course of this document. Find a (non-exhaustive) list
of keys in Appendix A. A list of macros and PPfunctions (we have only
encountered some of those in the examples above yet) that are expanded
in values of the hash argument to pp_def
is summarised in Appendix
B.
At this point, it might be appropriate to mention that PDL::PP is not a completely static, well designed set of routines (as Tuomas puts it: "stop thinking of PP as a set of routines carved in stone") but rather a collection of things that the PDL::PP author (Tuomas J. Lukka) considered he would have to write often into his PDL extension routines. PP tries to be expandable so that in the future, as new needs arise, new common code can be abstracted back into it. If you want to learn more on why you might want to change PDL::PP and how to do it check the section on PDL::PP internals.
If you do not have bad-value support compiled into PDL you can
ignore this section and the related keys: BadCode
, HandleBad
, ...
(try printing out the value of $PDL::Bad::Status
- if it equals 0
then move straight on).
There are several keys and macros used when writing code to handle
bad values. The first one is the HandleBad
key:
This flags a pp-routine as NOT handling bad values. If this routine
is sent piddles with their badflag
set, then a warning message is
printed to STDOUT and the piddles are processed as if the value used to
represent bad values is a valid number. The badflag
value is
not propogated to the output piddles.
An example of when this is used is for FFT routines, which generally do not have a way of ignoring part of the data.
This causes PDL::PP to write extra code that ensures the BadCode
section is used, and that the $ISBAD()
macro (and its brethren)
work.
If any of the input piddles have their badflag
set, then the
output piddles will have their badflag
set, but any supplied
BadCode is ignored.
The value of HandleBad
is used to define the contents of
the BadDoc
key, if it is not given.
To handle bad values, code must be written somewhat differently; for instance,
$c() = $a() + $b();
becomes something like
if ( $a() != BADVAL && $b() != BADVAL ) { $c() = $a() + $b(); } else { $c() = BADVAL; }
However, we only want the second version if bad values are present in
the input piddles (and that bad-value support is wanted!) - otherwise
we actually want the original code. This is where the BadCode
key comes in; you use it to specify the code to execute if bad values
may be present, and PP uses both it and the Code
section to create
something like:
if ( bad_values_are_present ) { fancy_threadloop_stuff { BadCode } } else { fancy_threadloop_stuff { Code } }
This approach means that there is virtually no overhead when bad values are not present (i.e. the badflag routine returns 0).
The BadCode section can use the same macros and looping constructs as the Code section. However, it wouldn't be much use without the following additional macros:
To check whether a piddle's value is bad, use the $ISBAD
macro:
if ( $ISBAD(a()) ) { printf("a() is bad\n"); }
You can also access given elements of a piddle:
if ( $ISBAD(a(n=>l)) ) { printf("element %d of a() is bad\n", l); }
This is the opposite of the $ISBAD
macro.
For when you want to set an element of a piddle bad.
If you have cached the value of a piddle $a()
into a c-variable (foo
say),
then to check whether it is bad, use $ISBADVAR(foo,a)
.
As above, but this time checking that the cached value isn't bad.
To copy the bad value for a piddle into a c variable, use
$SETBADVAR(foo,a)
.
TODO: mention $PPISBAD()
etc macros.
Using these macros, the above code could be specified as:
Code => '$c() = $a() + $b();', BadCode => ' if ( $ISBAD(a()) || $ISBAD(b()) ) { $SETBAD(c()); } else { $c() = $a() + $b(); }',
Since this is Perl, TMTOWTDI, so you could also write:
BadCode => ' if ( $ISGOOD(a()) && $ISGOOD(b()) ) { $c() = $a() + $b(); } else { $SETBAD(c()); }',
If you want access to the value of the badflag for a given
piddle, you can use the $PDLSTATExxxx()
macros:
TODO: mention the FindBadStatusCode
and
CopyBadStatusCode
options to pp_def
, as well as the
BadDoc
key.
Now, consider the following: you have your own C function (that may in fact be part of some library you want to interface to PDL) which takes as arguments two pointers to vectors of double:
void myfunc(int n,double *v1,double *v2);
The correct way of defining the PDL function is
pp_def('myfunc', Pars => 'a(n); [o]b(n);', GenericTypes => ['D'], Code => 'myfunc($SIZE(n),$P(a),$P(b));' );
The $P(
par)
syntax returns a pointer to the first
element and the other elements are guaranteed to lie after that.
Notice that here it is possible to make many mistakes. First,
$SIZE(n)
must be used instead of n
. Second, you shouldn't put
any loops in this code. Third, here we encounter a new hash key
recognised by PDL::PP : the GenericTypes
declaration tells PDL::PP
to ONLY GENERATE THE TYPELOOP FOP THE LIST OF TYPES SPECIFIED. In
this case double
. This has two advantages. Firstly the size of
the compiled code is reduced vastly, secondly if non-double arguments
are passed to myfunc()
PDL will automatically convert them to
double before passing to the external C routine and convert them
back afterwards.
One can also use Pars
to qualify the types of individual
arguments. Thus one could also write this as:
pp_def('myfunc', Pars => 'double a(n); double [o]b(n);', Code => 'myfunc($SIZE(n),$P(a),$P(b));' );
The type specification in Pars
exempts the argument from
variation in the typeloop - rather it is automatically converted
too and from the type specified. This is obviously useful in
a more general example, e.g.:
void myfunc(int n,float *v1,long *v2);
pp_def('myfunc', Pars => 'float a(n); long [o]b(n);', GenericTypes => ['F'], Code => 'myfunc($SIZE(n),$P(a),$P(b));' );
Note we still use GenericTypes
to reduce the size of the
type loop, obviously PP could in principle spot this and do
it automatically though the code has yet to attain that
level of sophistication!
Finally note when types are converted automatically one MUST
use the [o]
qualifier for output variables or you hard
one changes will get optimised away by PP!
If you interface a large library you can automate the interfacing even
further. Perl can help you again(!) in doing this. In many libraries
you have certain calling conventions. This can be exploited. In short,
you can write a little parser (which is really not difficult in Perl) that
then generates the calls to pp_def
from parsed descriptions of the
functions in that library. For an example, please check the Slatec
interface in the Lib
tree of the PDL distribution. If you want to check
(during debugging) which calls to PP functions your Perl code generated
a little helper package comes in handy which replaces the PP functions
by identically named ones that dump their arguments to stdout.
Just say
perl -MPDL::PP::Dump myfile.pd
to see the calls to pp_def
and friends. Try it with ops.pd and
slatec.pd. If you're interested (or want to enhance it), the source
is in Basic/Gen/PP/Dump.pm
Macros: So far we have encountered the $SIZE
, $GENERIC
and $P
macros.
Now we are going to quickly explain the other macros that are expanded in the
Code
section of PDL::PP along with examples of their usage.
$T
The $T
macro is used for type switches. This is very useful when you have
to use different external (e.g. library) functions depending on the input
type of arguments. The general syntax is
$Ttypeletters(type_alternatives)
where typeletters
is a permutation of a subset of the letters
BSULFD
which stand for Byte, Short, Ushort, etc. and
type_alternatives
are the expansions when the type of the PP
operation is equal to that indicated by the respective letter. Let's
illustrate this incomprehensible description by an example. Assuming
you have two C functions with prototypes
void float_func(float *in, float *out); void double_func(double *in, double *out);
which do basically the same thing but one accepts float and the other
double pointers. You could interface them to PDL by defining a generic
function foofunc
(which will call the correct function depending
on the type of the transformation):
pp_def('foofunc', Pars => ' a(n); [o] b();', Code => ' $TFD(float_func,double_func) ($P(a),$P(b));' GenericTypes => [qw(F D)], );
Please note that you can't say
Code => ' $TFD(float,double)_func ($P(a),$P(b));'
since the $T
macro expands with trailing spaces, analogously to
C preprocessor macros.
The slightly longer form illustrated above is correct.
If you really want brevity, you can of course do
'$TBSULFD('.(join ',',map {"long_identifier_name_$_"} qw/byt short unseigned lounge flotte dubble/).');'
$PP
The $PP
macro is used for a so called physical pointer access. The
physical refers to some internal optimisations of PDL (for those who
are familiar with the PDL core we are talking about the vaffine
optimisations). This macro is mainly for internal use and you shouldn't
need to use it in any of your normal code.
$COMP
(and the OtherPars
section)The $COMP
macro is used to access non-pdl values in the code section. Its
name is derived from the implementation of transformations in PDL. The
variables you can refer to using $COMP
are members
of the ``compiled'' structure that represents the PDL transformation in question
but does not yet contain any information about dimensions
(for further details check the PDL::Internals manpage). However, you can treat
$COMP
just as a black box without knowing anything about the
implementation of transformations in PDL. So when would you use this
macro? Its main usage is to access values of arguments that are
declared in the OtherPars
section of a pp_def
definition. But
then you haven't heard about the OtherPars
key yet?! Let's have
another example that illustrates typical usage of both new features:
pp_def('pnmout', Pars => 'a(m)', OtherPars => "char* fd", GenericTypes => [qw(B U S L)], Code => 'PerlIO *fp; IO *io;
io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO)); if (!io || !(fp = IoIFP(io))) croak("Can\'t figure out FP");
if (PerlIO_write(fp,$P(a),len) != len) croak("Error writing pnm file"); ');
This function is used to write data from a pdl to a file. The file descriptor
is passed as a string into this function. This parameter does not go into
the Pars
section since it cannot be usefully treated like a pdl but rather
into the aptly named OtherPars
section. Parameters in the OtherPars
section follow those in the Pars
section when invoking the function, i.e.
open FILE,">out.dat" or die "couldn't open out.dat"; pnmout($pdl,'FILE');
When you want to access this parameter inside the code section you
have to tell PP by using the $COMP
macro, i.e. you write
$COMP(fd)
as in the example. Otherwise PP wouldn't know that the
fd
you are referring to is the same as that specified in the
OtherPars
section.
Another use for the OtherPars
section is to set a named dimension
in the signature. Let's have an example how that is done:
pp_def('setdim', Pars => '[o] a(n)', OtherPars => 'int ns => n', Code => 'loop(n) %{ $a() = n; %}', );
This says that the named dimension n
will be initialised from the
value of the other parameter ns
which is of integer type (I guess
you have realised that we use the CType From => named_dim
syntax).
Now you can call this function in the usual way:
setdim(($a=null),5); print $a; [ 0 1 2 3 4 ]
Admittedly this function is not very useful but it demonstrates how it
works. If you call the function with an existing pdl and you don't need
to explicitly specify the size of n
since PDL::PP can figure it out
from the dimensions of the non-null pdl. In that case you just give the
dimension parameter as -1
:
$a = hist($b); setdim($a,-1);
That should do it.
The only PP function that we have used in the examples so far is loop
.
Additionally, there are currently two other functions which are recognised
in the Code
section:
As we heard above the signature of a PP defined function defines the
dimensions of all the pdl arguments involved in a primitive operation.
However, you often call the functions that you defined with PP with pdls
that have more dimensions than those specified in the signature. In this
case the primitive operation is performed on all subslices of appropriate
dimensionality in what is called a thread loop (see also overview above
and the PDL::Indexing manpage). Assuming you have some notion of this concept you
will probably appreciate that the operation specified in the code section
should be optimised since this is the tightest loop inside a thread loop.
However, if you revisit the example where we define the pnmout
function,
you will quickly realise that looking up the IO
file descriptor
in the inner thread loop is not very efficient when writing a pdl with
many rows. A better approach would be to look up the IO
descriptor
once outside the thread loop and use its value then inside the tightest
thread loop. This is exactly where the threadloop
function comes in
handy. Here is an improved definition of pnmout
which uses this
function:
pp_def('pnmout', Pars => 'a(m)', OtherPars => "char* fd", GenericTypes => [qw(B U S L)], Code => 'PerlIO *fp; IO *io; int len;
io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO)); if (!io || !(fp = IoIFP(io))) croak("Can\'t figure out FP");
len = $SIZE(m) * sizeof($GENERIC());
threadloop %{ if (PerlIO_write(fp,$P(a),len) != len) croak("Error writing pnm file"); %} ');
This works as follows. Normally the C code you write inside the
Code
section is placed inside a thread loop (i.e. PP generates the
appropriate wrapping XS code around it). However, when you explicitly
use the threadloop
function, PDL::PP recognises this and doesn't
wrap your code with an additional thread loop. This has the effect that
code you write outside the thread loop is only executed once per
transformation and just the code with in the surrounding %{ ... %}
pair is placed within the tightest thread loop. This also comes in
handy when you want to perform a decision (or any other code,
especially CPU intensive code) only once per thread, i.e.
pp_addhdr(' #define RAW 0 #define ASCII 1 '); pp_def('do_raworascii', Pars => 'a(); b(); [o]c()', OtherPars => 'int mode', Code => ' switch ($COMP(mode)) { case RAW: threadloop %{ /* do raw stuff */ %} break; case ASCII: threadloop %{ /* do ASCII stuff */ %} break; default: croak("unknown mode"); }' );
The types function works similar to the $T
macro. However, with the
types
function the code in the following block (delimited by %{
and %}
as usual) is executed for all those cases in which the datatype
of the operation is any of the types represented by the letters in the
argument to type
, e.g.
Code => '...
types(BSUL) %{ /* do integer type operation */ %} types(FD) %{ /* do floating point operation */ %} ...'
The RedoDimsCode
key is an optional key that is used to
compute dimensions of piddles at runtime in case the
standard rules for computing dimensions from the signature
are not sufficient. The contents of the RedoDimsCode
entry
is interpreted in the same way that the Code section is
interpreted-- i.e., PP macros are expanded and the result
is interpreted as C code. The purpose of the code is to set
the size of some dimensions that appear in the
signature. Storage allocation and threadloops and so forth
will be set up as if the computed dimension had appeared in
the signature. In your code, you first compute the desired
size of a named dimension in the signature according to
your needs and then assign that value to it via the $SIZE()
macro.
As an example, consider the following situation. You are
interfacing an external library routine that requires an
temporary array for workspace to be passed as an
argument. Two input data arrays that are passed are p(m)
and x(n)
. The output data array is y(n)
. The routine
requires a workspace array with a length of n+m*m, and you'd
like the storage created automatically just like it would be
for any piddle flagged with [t] or [o]. What you'd like is to
say something like
pp_def( "myexternalfunc", Pars => " p(m); x(n); [o] y; [t] work(n+m*m); ", ...
but that won't work, because PP can't interpret expressions with arithmetic in the signature. Instead you write
pp_def( "myexternalfunc", Pars => " p(m); x(n); [o] y; [t] work(wn); ", RedoDimsCode => " int im = $PDL(p)->dims[0]; int in = $PDL(x)->dims[0]; int min = in + im * im; int inw = $PDL(work)->dims[0]; $SIZE(wn) = inw >= min ? inw : min; ", Code => " externalfunc($P(p),$P(x),$SIZE(m),$SIZE(n),$P(work)); ";)
This code works as follows: The macro $PDL(p) expands to a
pointer to the pdl struct for the piddle p. You don't want
a pointer to the data ( ie $P ) in this case, because you
want to access the methods for the piddle on the C
level. You get the first dimension of each of the piddles
and store them in integers. Then you compute the minimum
length the work array can be. If the user sent a piddle
work
with sufficient storage, then leave it alone. If the
user sent, say a null pdl, or no pdl at all, then the size
of wn will be zero and you reset it to the minimum
value. Before the code in the Code section is executed PP
will create the proper storage for work
if it does not
exist. Note that you only took the first dimension of p
and x
because the user may have sent piddles with extra
threading dimensions. Of course, the temporary piddle work
(note the
[t] flag) should not be given any thread dimensions anyway.
You can also use RedoDimsCode
to set the dimension of a
piddle flagged with [o]. In this case you set the dimensions
for the named dimension in the signature using $SIZE() as in
the preceeding example. However, because the piddle is
flagged with [o] instead of [t], threading dimensions will
be added if required just as if the size of the dimension
were computed from the signature according to the usual
rules. Here is an example from PDL::Math
pp_def("polyroots", Pars => 'cr(n); ci(n); [o]rr(m); [o]ri(m);', RedoDimsCode => 'int sn = $PDL(cr)->dims[0]; $SIZE(m) = sn-1;',
The input piddles are the real and imaginary parts of
complex coefficients of a polynomial. The output piddles are
real and imaginary parts of the roots. There are n
roots
to an n
th order polynomial and such a polynomial has
n+1
coefficients (the zeoreth through the n
th). In
this example, threading will work correctly. That is, the
first dimension of the output piddle with have its dimension
adjusted, but other threading dimensions will be assigned
just as if there were no RedoDimsCode
.
OtherPars
sectionThe OtherPars
section discussed above is very often absolutely
crucial when you interface external libraries with PDL. However in
many cases the external libraries either use derived types or
pointers of various types.
The standard way to handle this in Perl is to use a typemap
file.
This is discussed in some detail in perlxs in the standard
Perl documentation. In PP the functionality is very similar, so you can
create a typemap
file in the directory where your PP file resides and
when it is built it is automatically read in to figure out the appropriate
translation between the C type and Perl's built-in type.
That said, there are a couple of important differences from the general
handling of types in XS. The first, and probably most important, is that
at the moment pointers to types are not allowed in the OtherPars
section. To get around this limitation you must use the IV
type
(thanks to Judd Taylor for pointing out that this is necessary for
portability).
It is probably best to illustrate this with a couple of code-snippets:
For instance the gsl_spline_init
function has the following C
declaration:
int gsl_spline_init(gsl_spline * spline, const double xa[], const double ya[], size_t size);
Clearly the xa
and ya
arrays are candidates for being passed
in as piddles and the size
argument is just the length of these
piddles so that can be handled by the $SIZE()
macro in PP. The
problem is the pointer to the gsl_spline
type. The natural solution
would be to write an OtherPars
declaration of the form
OtherPars => 'gsl_spline *spl'
and write a short typemap
file which handled this type. This does
not work at present however! So what you have to do is to go around
the problem slightly (and in some ways this is easier too!):
The solution is to declare spline
in the OtherPars
section using
an "Integer Value", IV
. This hides the nature of the variable from
PP and you then need to (well to avoid compiler warnings at least!)
perform a type cast when you use the variable in your code. Thus
OtherPars
should take the form:
OtherPars => 'IV spl'
and when you use it in the code you will write
INT2PTR(gsl_spline *, $COMP(spl))
where the Perl API macro INT2PTR
has been used to handle the pointer
cast to avoid compiler warnings and problems for machines with mixed 32bit
and 64bit Perl configurations. Putting this together as Andres Jordan has
done (with the modification using IV
by Judd Taylor) in the
gsl_interp.pd
in the distribution source you get:
pp_def('init_meat', Pars => 'double x(n); double y(n);', OtherPars => 'IV spl', Code =>' gsl_spline_init,( INT2PTR(gsl_spline *, $COMP(spl)), $P(x),$P(y),$SIZE(n)));' );
where I have removed a macro wrapper call, but that would obscure the discussion.
The other minor difference as compared to the standard typemap handling
in Perl, is that the user cannot specify non-standard typemap locations or
typemap filenames using the TYPEMAPS
option in MakeMaker... Thus you
can only use a file called typemap
and/or the IV
trick above.
You have already heard about the OtherPars
key. Currently, there are not
many other keys for a data operation that will be useful in normal (whatever
that is) PP programming. In fact, it would be interesting to hear about
a case where you think you need more than what is provided at the moment.
Please speak up on one of the PDL mailing lists. Most other keys recognised
by pp_def
are only really useful for what we call slice operations
(see also above).
One thing that is strongly being planned is variable number of arguments, which will be a little tricky.
An incomplete list of the available keys:
Setting this key marks the routine as working inplace - ie
the input and output piddles are the same. An example is
$a->inplace->sqrt()
(or sqrt(inplace($a))
).
Use when the routine is a unary function, such as sqrt
.
If there are more than one input piddles, specify the name of the one that can be changed inplace using an array reference.
If there are more than one output piddle, specify the name of the input piddle and output piddle in a 2-element array reference. This probably isn't needed, but left in for completeness.
If bad values are being used, care must be taken to ensure the propagation of the badflag when inplace is being used; consider this excerpt from Basic/Bad/bad.pd:
pp_def('replacebad',HandleBad => 1, Pars => 'a(); [o]b();', OtherPars => 'double newval', Inplace => 1, CopyBadStatusCode => '/* propogate badflag if inplace AND it has changed */ if ( a == b && $ISPDLSTATEBAD(a) ) PDL->propogate_badflag( b, 0 );
/* always make sure the output is "good" */ $SETPDLSTATEGOOD(b); ', ...
Since this routine removes all bad values, then the output piddle had
its bad flag cleared. If run inplace (so a == b
), then we have to
tell all the children of a
that the bad flag has been cleared (to
save time we make sure that we call PDL->propogate_badgflag
only
if the input piddle had its bad flag set).
NOTE: one idea is that the documentation for the routine could be
automatically flagged to indicate that it can be executed inplace,
ie something similar to how HandleBad
sets BadDoc
if it's not
supplied (it's not an ideal solution).
So far, we have described the pp_def
and pp_done
functions. PDL::PP
exports a few other functions to aid you in writing concise PDL extension
package definitions.
Often when you interface library functions as in the above example
you have to include additional C include files. Since the XS file is
generated by PP we need some means to make PP insert the appropriate
include directives in the right place into the generated XS file.
To this end there is the pp_addhdr
function. This is also the function
to use when you want to define some C functions for internal use by some
of the XS functions (which are mostly functions defined by pp_def
).
By including these functions here you make sure that PDL::PP inserts your
code before the point where the actual XS module section begins and will
therefore be left untouched by xsubpp (cf. perlxs and perlxstut
man pages).
A typical call would be
pp_addhdr(' #include <unistd.h> /* we need defs of XXXX */ #include "libprotos.h" /* prototypes of library functions */ #include "mylocaldecs.h" /* Local decs */
static void do_the real_work(PDL_Byte * in, PDL_Byte * out, int n) { /* do some calculations with the data */ } ');
This ensures that all the constants and prototypes you need will be properly
included and that you can use the internal functions defined here in the
pp_def
s, e.g.:
pp_def('barfoo', Pars => ' a(n); [o] b(n)', GenericTypes => ['B'], Code => ' int ns = $SIZE(n); do_the_real_work($P(a),$P(b),ns); ', );
In many cases the actual PP code (meaning the arguments to pp_def
calls) is only part of the package you are currently
implementing. Often there is additional Perl code and XS code
you would normally have written into the pm and XS files which are now
automatically generated by PP. So how to get this stuff into those
dynamically generated files? Fortunately, there are a couple of
functions, generally called pp_addXXX
that assist you in doing
this.
Let's assume you have additional Perl code that should go into the
generated pm-file. This is easily achieved with the pp_addpm
command:
pp_addpm(<<'EOD');
=head1 NAME
PDL::Lib::Mylib -- a PDL interface to the Mylib library
=head1 DESCRIPTION
This package implements an interface to the Mylib package with full threading and indexing support (see L<PDL::Indexing>).
=cut
use PGPLOT;
=head2 use_myfunc this function applies the myfunc operation to all the elements of the input pdl regardless of dimensions and returns the sum of the result =cut
sub use_myfunc { my $pdl = shift;
myfunc($pdl->clump(-1),($res=null));
return $res->sum; }
EOD
You have probably got the idea. In some cases you also want to export
your additional functions. To avoid getting into trouble with PP which
also messes around with the @EXPORT
array you just tell PP to add
your functions to the list of exported functions:
pp_add_exported('use_myfunc gethynx');
The pp_add_isa
command works like the the pp_add_exported
function.
The arguments to pp_add_isa
are added the @ISA list, e.g.
pp_add_isa(' Some::Other::Class ');
If your pp_def routines are to be used as object methods use
pp_bless
to specify the package (i.e. class) to which
your pp_defed methods will be added. For example,
pp_bless('PDL::MyClass')
. The default is PDL
if this is
omitted.
Sometimes you want to add extra XS code of your own (that is generally not involved with any threading/indexing issues but supplies some other functionality you want to access from the Perl side) to the generated XS file, for example
pp_addxs('','
# Determine endianness of machine
int isbigendian() CODE: unsigned short i; PDL_Byte *b;
i = 42; b = (PDL_Byte*) (void*) &i;
if (*b == 42) RETVAL = 0; else if (*(b+1) == 42) RETVAL = 1; else croak("Impossible - machine is neither big nor little endian!!\n"); OUTPUT: RETVAL ');
Especially pp_add_exported
and pp_addxs
should be used with care. PP uses
PDL::Exporter, hence letting PP export your function means that they get added
to the standard list of function exported by default (the list defined by the
export tag ``:Func''). If you use pp_addxs
you shouldn't try to do anything
that involves threading or indexing directly. PP is much better at generating
the appropriate code from your definitions.
Finally, you may want to add some code to the BOOT section of the XS file
(if you don't know what that is check perlxs). This is easily done
with the pp_add_boot
command:
pp_add_boot(<<EOB); descrip = mylib_initialize(KEEP_OPEN);
if (descrip == NULL) croak("Can't initialize library");
GlobalStruc->descrip = descrip; GlobalStruc->maxfiles = 200; EOB
By default, PP.pm puts all subs defined using the pp_def function into the output .pm file's EXPORT list. This can create problems if you are creating a subclassed object where you don't want any methods exported. (i.e. the methods will only be called using the $object->method syntax).
For these cases you can call pp_export_nothing()
to clear out the export list. Example (At
the end of the .pd file):
pp_export_nothing(); pp_done();
By default, PP.pm puts the 'use Core;' line into the output .pm file. This imports Core's exported names into the current namespace, which can create problems if you are over-riding one of Core's methods in the current file. You end up getting messages like "Warning: sub sumover redefined in file subclass.pm" when running the program.
For these cases the pp_core_importList can be used to change what is imported from Core.pm. For example:
pp_core_importList('()')
This would result in
use Core();
being generated in the output .pm file. This would result in no names being imported from Core.pm. Similarly, calling
pp_core_importList(' qw/ barf /')
would result in
use Core qw/ barf/;
being generated in the output .pm file. This would result in just 'barf' being imported from Core.pm.
I am pretty sure that this allows you to simultaneously set the .pm and .xs files' versions, thus avoiding unnecessary version-skew between the two. To use this, simply have the following line at some point in your .pd file:
pp_setversion('0.0.3');
However, don't use this if you use the Module::Build::PDL manpage. See that module's documentation for details.
The slice operation section of this manual is provided using dataflow and lazy evaluation: when you need it, ask Tjl to write it. a delivery in a week from when I receive the email is 95% probable and two week delivery is 99% probable.
And anyway, the slice operations require a much more intimate knowledge of PDL internals than the data operations. Furthermore, the complexity of the issues involved is considerably higher than that in the average data operation. If you would like to convince yourself of this fact take a look at the Basic/Slices/slices.pd file in the PDL distribution :-). Nevertheless, functions generated using the slice operations are at the heart of the index manipulation and dataflow capabilities of PDL.
Also, there are a lot of dirty issues with virtual piddles and vaffines which we shall entirely skip here.
Slice operations need to be able to handle bad values (if support is compiled into PDL). The easiest thing to do is look at Basic/Slices/slices.pd to see how this works.
Along with BadCode
, there are also the BadBackCode
and
BadRedoDimsCode
keys for pp_def
. However, any
EquivCPOffsCode
should not need changing, since
any changes are absorbed into the definition of the
$EQUIVCPOFFS()
macro (i.e. it is handled automatically
by PDL::PP>.
The following few paragraphs describe writing of a new slicing routine ('range'); any errors are CED's. (--CED 26-Aug-2002)
warn
and barf
in PP CodeFor printing warning messages or aborting/dieing, you can call warn
or barf
from PP code.
However, you should be aware that these calls have been redefined using C preprocessor
macros to PDL->barf
and PDL->warn
. These redefinitions are in place to keep
you from inadvertently calling perl's warn
or barf
directly, which can cause segfaults during
pthreading (i.e. processor multi-threading).
PDL's own versions of barf
and warn
will queue-up warning or barf messages until after pthreading
is completed, and then call the perl versions of these routines.
See the PDL::ParallelCPU manpage for more information on pthreading.
The PDL Core
structure, defined in Basic/Core/pdlcore.h.PL, contains
pointers to a number of routines that may be useful to you. The majority
of these routines deal with manipulating piddles, but some are more general:
Sort the array xx
between the indices a
and b
.
There are also versions for the other PDL datatypes,
with postfix _S
, _U
, _L
, _F
, and _D
.
Any module using this must ensure that PDL::Ufunc
is loaded.
As for PDL->qsort_B
, but this time sorting the indices
rather than the data.
The routine med2d
in Lib/Image2D/image2d.pd shows how such routines are
used.
If you are going to generate a package from your PP file (typical file
extensions are .pd
or .pp
for the files containing PP code) it
is easiest and safest to leave generation of the appropriate commands
to the Makefile. In the following we will outline the typical format
of a Perl Makefile to automatically build and install your package
from a description in a PP file. Most of the rules to build the xs, pm
and other required files from the PP file are already predefined in
the PDL::Core::Dev package. We just have to tell MakeMaker to use
it.
In most cases you can define your Makefile like
# Makefile.PL for a package defined by PP code.
use PDL::Core::Dev; # Pick up development utilities use ExtUtils::MakeMaker;
$package = ["mylib.pd",Mylib,PDL::Lib::Mylib]; %hash = pdlpp_stdargs($package); $hash{OBJECT} .= ' additional_Ccode$(OBJ_EXT) '; $hash{clean}->{FILES} .= ' todelete_Ccode$(OBJ_EXT) '; $hash{'VERSION_FROM'} = 'mylib.pd'; WriteMakefile(%hash);
sub MY::postamble { pdlpp_postamble($package); }
Here, the list in $package is: first: PP source file name, then the prefix for the produced files and finally the whole package name. You can modify the hash in whatever way you like but it would be reasonable to stay within some limits so that your package will continue to work with later versions of PDL.
If you don't want to use prepackaged arguments, here is a generic Makefile.PL that you can adapt for your own needs:
# Makefile.PL for a package defined by PP code.
use PDL::Core::Dev; # Pick up development utilities use ExtUtils::MakeMaker;
WriteMakefile( 'NAME' => 'PDL::Lib::Mylib', 'VERSION_FROM' => 'mylib.pd', 'TYPEMAPS' => [&PDL_TYPEMAP()], 'OBJECT' => 'mylib$(OBJ_EXT) additional_Ccode$(OBJ_EXT)', 'PM' => { 'Mylib.pm' => '$(INST_LIBDIR)/Mylib.pm'}, 'INC' => &PDL_INCLUDE(), # add include dirs as required by your lib 'LIBS' => [''], # add link directives as necessary 'clean' => {'FILES' => 'Mylib.pm Mylib.xs Mylib$(OBJ_EXT) additional_Ccode$(OBJ_EXT)'}, );
# Add genpp rule; this will invoke PDL::PP on our PP file # the argument is an array reference where the array has three string elements: # arg1: name of the source file that contains the PP code # arg2: basename of the xs and pm files to be generated # arg3: name of the package that is to be generated sub MY::postamble { pdlpp_postamble(["mylib.pd",Mylib,PDL::Lib::Mylib]); }
To make life even easier PDL::Core::Dev defines the function pdlpp_stdargs
that returns a hash with default values that can be passed (either
directly or after appropriate modification) to a call to WriteMakefile.
Currently, pdlpp_stdargs
returns a hash where the keys are filled in
as follows:
( 'NAME' => $mod, 'TYPEMAPS' => [&PDL_TYPEMAP()], 'OBJECT' => "$pref\$(OBJ_EXT)", PM => {"$pref.pm" => "\$(INST_LIBDIR)/$pref.pm"}, MAN3PODS => {"$src" => "\$(INST_MAN3DIR)/$mod.\$(MAN3EXT)"}, 'INC' => &PDL_INCLUDE(), 'LIBS' => [''], 'clean' => {'FILES' => "$pref.xs $pref.pm $pref\$(OBJ_EXT)"}, )
Here, $src
is the name of the source file with PP code, $pref
the
prefix for the generated .pm and .xs files and $mod
the name of the
extension module to generate.
The internals of the current version consist of a large table which gives the rules according to which things are translated and the subs which implement these rules.
Later on, it would be good to make the table modifiable by the user so that different things may be tried.
[Meta comment: here will hopefully be more in the future; currently, your best bet will be to read the source code :-( or ask on the list (try the latter first) ]
Unless otherwise specified, the arguments are strings. Keys marked with (bad) are only used if bad-value support is compiled into PDL.
define the signature of your function
arguments which are not pdls. Default: nothing. This is a semi-colon separated list of arguments, e.g., OtherPars=>'int k; double value; char* fd'
. See $COMP(x)
and also the same entry in Appendix B.
the actual code that implements the functionality; several PP macros and PP functions are recognised in the string value
If set to 1, the routine is assumed to support bad values and the code in
the BadCode key is used if bad values are present;
it also sets things up so that the $ISBAD()
etc macros can be used.
If set to 0, cause the routine to print a warning if any of the input piddles
have their bad flag set.
Give the code to be used if bad values may be present in the input piddles.
Only used if HandleBad => 1
.
An array reference. The array may contain any subset of the one-character strings `B', `S', `U', `L', `Q', `F' and `D', which specify which types your operation will accept. The meaning of each type is:
B - signed byte (i.e. signed char) S - signed short (two-byte integer) U - unsigned short L - signed long (four-byte integer, int on 32 bit systems) Q - signed long long (eight byte integer) F - float D - double
This is very useful (and important!) when interfacing an external library. Default: [qw/B S U L Q F D/]
Mark a function as being able to work inplace.
Inplace => 1 if Pars => 'a(); [o]b();' Inplace => ['a'] if Pars => 'a(); b(); [o]c();' Inplace => ['a','b'] if Pars => 'a(); b(); [o]c(); [o]d();'
If bad values are being used, care must be taken to ensure the
propagation of the badflag when inplace is being used;
for instance see the code for replacebad
in Basic/Bad/bad.pd.
Used to specify a documentation string in Pod format. See PDL::Doc for information on PDL documentation conventions. Note: in the special case where the PP 'Doc' string is one line this is implicitly used for the quick reference AND the documentation!
If the Doc field is omitted PP will generate default documentation (after all it knows about the Signature).
If you really want the function NOT to be documented in any way at this point
(e.g. for an internal routine, or because you are doing it elsewhere in the
code) explicitly specify Doc=>undef
.
Contains the text returned by the badinfo
command (in perldl
) or
the -b
switch to the pdldoc
shell script. In many cases, you will
not need to specify this, since the information can be automatically
created by PDL::PP. However, as befits computer-generated text, it's
rather stilted; it may be much better to do it yourself!
Optional flag to indicate the PDL function should not use processor threads (i.e. pthreads or POSIX threads) to split up work across mutliple CPU cores. This option is typically set to 1 if the underlying PDL function is not threadsafe. If this option isn't present, then the function is assumed to be threadsafe. This option only applies if PDL has been compiled with POSIX threads enabled.
PDL functions allow you to pass in a piddle into which you want the output saved. This is handy because you can allocate an output piddle once and reuse it many times; the alternative would be for PDL to create a new piddle each time, which may waste compute cycles or, more likely, RAM. This added flexibility comes at the cost of more complexity: PDL::PP has to write functions that are smart enough to count the arguments passed to it and create new piddles on the fly, but only if you want them.
PDL::PP is smart enough to do that, but there are restrictions on argument order and the like. If you want a more flexible function, you can write your own Perl-side wrapper and specify it in the PMCode key. The string that you supply must (should) define a Perl function with a name that matches what you gave to pp_def in the first place. When you wish to eventually invoke the PP-generated function, you will need to supply all piddles in the exact order specified in the signature: output piddles are not optional, and the PP-generated function will not return anything. The obfuscated name that you will call is _<funcname>_int.
I believe this documentation needs further clarification, but this will have to do. :-(
When pp_def generates functions, it typically defines them in the PDL package. Then, in the .pm file that it generates for your module, it typically adds a line that essentially copies that function into your current package's symbol table with code that looks like this:
*func_name = \&PDL::func_name;
It's a little bit smarter than that (it knows when to wrap that sort of thing in a BEGIN block, for example, and if you specified something different for pp_bless), but that's the gist of it. If you don't care to import the function into your current package's symbol table, you can specify
PMFunc => '',
PMFunc has no other side-effects, so you could use it to insert arbitrary Perl code into your module if you like. However, you should use pp_addpm if you want to add Perl code to your module.
Macros labeled by (bad) are only used if bad-value support is compiled into PDL.
access a pdl (by its name) that was specified in the signature
access a value in the private data structure of this transformation (mainly
used to use an argument that is specified in the OtherPars
section)
replaced at runtime by the actual size of a named dimension (as specified in the signature)
replaced by the C type that is equal to the runtime type of the operation
a pointer access to the PDL named a
in the signature. Useful for
interfacing to C functions
a physical pointer access to pdl a
; mainly for internal use
expansion alternatives according to runtime type of operation,
where XXX is some string that is matched by /[BSULFD+]/
.
return a pointer to the pdl data structure (pdl *) of piddle a
returns true if the value stored in a()
equals the bad value
for this piddle.
Requires HandleBad
being set to 1.
returns true if the value stored in a()
does not equal the bad value
for this piddle.
Requires HandleBad
being set to 1.
Sets a()
to equal the bad value for this piddle.
Requires HandleBad
being set to 1.
loop(DIMS) %{ ... %}
loop over named dimensions; limits are generated automatically by PP
threadloop %{ ... %}
enclose following code in a thread loop
types(TYPES) %{ ... %}
execute following code if type of operation is any of TYPES
A number of functions are imported when you use PDL::PP
. These include
functions that control the generated C or XS code, functions that control
the generated Perl code, and functions that manipulate the packages and
symbol tables into which the code is created.
PDL::PP's main purpose is to make it easy for you to wrap the threading engine around your own C code, but you can do some other things, too.
Used to wrap the threading engine around your C code. Virtually all of this document discusses the use of pp_def.
Indicates you are done with PDL::PP and that it should generate its .xs and .pm files based upon the other pp_* functions that you have called. This function takes no arguments.
This lets you add XS code to your .xs file. This is useful if you want to create Perl-accessible functions that invoke C code but cannot or should not invoke the threading engine. XS is the standard means by which you wrap Perl-accessible C code. You can learn more at perlxs.
This function adds whatever string you pass to the XS BOOT section. The BOOT section is C code that gets called by Perl when your module is loaded and is useful for automatic initialization. You can learn more about XS and the BOOT section at perlxs.
Adds pure-C code to your XS file. XS files are structured such that pure C code must come before XS specifications. This allows you to specify such C code.
PDL normally checks the bounds of your accesses before making them. You can turn that on or off at runtime by setting MyPackage::set_boundscheck. This function allows you to remove that runtime flexibility and never do bounds checking. It also returns the current boundschecking status if called without any argumens.
NOTE: I have not found anything about bounds checking in other documentation. That needs to be addressed.
Many functions imported when you use PDL::PP allow you to modify the contents of the generated .pm file. In addition to pp_def and pp_done, the role of these functions is primarily to add code to various parts of your generated .pm file.
Adds Perl code to the generated .pm file. PDL::PP actually keeps track of three different sections of generated code: the Top, the Middle, and the Bottom. You can add Perl code to the Middle section using the one-argument form, where the argument is the Perl code you want to supply. In the two-argument form, the first argument is an anonymous hash with only one key that specifies where to put the second argument, which is the string that you want to add to the .pm file. The hash is one of these three:
{At => 'Top'} {At => 'Middle'} {At => 'Bot'}
For example:
pp_addpm({At => 'Bot'}, <<POD); =head1 Some documentation I know I'm typing this in the middle of my file, but it'll go at the bottom. =cut POD
Warning: If, in the middle of your .pd file, you put documentation meant for the bottom of your pod, you will thoroughly confuse CPAN. On the other hand, if in the middle of your .pd fil, you add some Perl code destined for the bottom or top of your .pm file, you only have yourself to confuse. :-)
Adds BEGIN-block wrapping. Certain declarations can be wrapped in BEGIN blocks, though the default behavior is to have no such wrapping.
Sets code to be added to the top of your .pm file, even above code that you
specify with pp_addpm({At => 'Top'}, ...)
. Unlike pp_addpm, calling
this overwrites whatever was there before. Generally, you probably shouldn't
use it.
When you get compile errors, either from your C-like code or your Perl code, it can help to make those errors back to the line numbers in the source file at which the error occurred.
Takes a line number and a (usually long) string of code. The line number
should indicate the line at which the quote begins. This is usually Perl's
__LINE__
literal, unless you are using heredocs, in which case it is
__LINE__ + 1
. The returned string has #line directives interspersed to
help the compiler report errors on the proper line.
PDL::PP usually exports all functions generated using pp_def, and usually installs them into the PDL symbol table. However, you can modify this behavior with these functions.
Sets the package (symbol table) to which the XS code is added. The default is PDL, which is generally what you want. If you use the default blessing and you create a function myfunc, then you can do the following:
$piddle->myfunc(<args>); PDL::myfunc($piddle, <args>);
On the other hand, if you bless your functions into another package, you cannot invoke them as PDL methods, and must invoke them as:
MyPackage::myfunc($piddle, <args>);
Of course, you could always use the PMFunc key to add your function to the PDL symbol table, but why do that?
Adds to the list of modules from which your module inherits. The default list is
qw(PDL::Exporter DynaLoader)
At the top of your generated .pm file is a line that looks like this:
use PDL::Core;
You can modify that by specifying a string to pp_core_importlist. For example,
pp_core_importlist('::Blarg');
will result in
use PDL::Core::Blarg;
You can use this, for example, to add a list of symbols to import from PDL::Core. For example:
pp_core_importlist(" ':Internal'");
will lead to the following use statement:
use PDL::Core ':Internal';
Sets your module's version. The version must be consistent between the .xs and the .pm file, and is used to ensure that your Perl's libraries do not suffer from version skew.
Adds to the export list whatever names you give it. Functions created using pp_def are automatically added to the list. This function is useful if you define any Perl functions using pp_addpm or pp_addxs that you want exported as well.
This resets the list of exported symbols to nothing. This is probably better
called pp_export_clear
, since you can add exported symbols after calling
pp_export_nothing
. When called just before calling pp_done, this ensures
that your module does not export anything, for example, if you only want
programmers to use your functions as methods.
PDL
For the concepts of threading and slicing check the PDL::Indexing manpage.
the PDL::BadValues manpage for information on bad values
perlxs, perlxstut
$RESIZE()
Although PDL::PP is quite flexible and thoroughly used, there are surely bugs. First amonth them: this documentation needs a thorough revision.
Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), Karl Glaazebrook (kgb@aaocbn1.aao.GOV.AU) and Christian Soeller (c.soeller@auckland.ac.nz) All rights reserved. Documentation updates Copyright(C) 2011 David Mertens (dcmertens.perl@gmail.com) This documentation is licensed under the same terms as Perl itself.