Conditions in PyTables are used in methods related to in-kernel and indexed searches, such as Table.where() (see description) or Table.readWhere() (see description). They are interpreted using Numexpr, a powerful package for achieving C-speed computation of array operations (see [11]).
A condition on a table is just a string containing a Python expression involving at least one column, and maybe some constants and external variables, all combined with algebraic operators and functions. The result of a valid condition is always a boolean array of the same length as the table, where the i-th element is true if the value of the expression on the i-th row of the table evaluates to true [16]. Usually, a method using a condition will only consider the rows where the boolean result is true.
For instance, the condition 'sqrt(x*x + y*y) < 1' applied on a table with x and y columns consisting of floating point numbers results in a boolean array where the i-th element is true if (unsurprisingly) the value of the square root of the sum of squares of x and y is less than 1. The sqrt() function works element-wise, the 1 constant is adequately broadcast to an array of ones of the length of the table for evaluation, and the less than operator makes the result a valid boolean array. A condition like 'mycolumn' alone will not usually be valid, unless mycolumn is itself a column of scalar, boolean values.
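To make the row-wise meaning concrete, the following plain-Python sketch spells out what the condition 'sqrt(x*x + y*y) < 1' computes for each row. It is not PyTables or Numexpr code: the x and y lists are hypothetical column data, and the list comprehension only mimics the element-wise evaluation described above.

```python
import math

# Hypothetical column data: three rows with x and y values.
x = [0.1, 0.9, 0.5]
y = [0.2, 0.9, 0.5]

# Row-by-row equivalent of the condition 'sqrt(x*x + y*y) < 1':
# the constant 1 is compared against the result for every row.
mask = [math.sqrt(xi * xi + yi * yi) < 1 for xi, yi in zip(x, y)]

# A method using the condition would only consider rows whose entry is true.
selected_rows = [i for i, keep in enumerate(mask) if keep]

print(mask)
print(selected_rows)
```

The resulting list has one boolean per row, which is the shape a valid condition must produce.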
In the previous conditions, mycolumn, x and y are examples of variables which are associated with columns. Methods supporting conditions usually provide their own ways of binding variable names to columns and other values. You can read the documentation of Table.where() (see description) for more information on that. Also, please note that the names None, True and False, as well as the names of functions (see below), cannot be overridden, but you can always define other new names for the objects you intend to use.
Values in a condition may have the following types:
8-bit boolean (bool).
32-bit signed integer (int).
64-bit signed integer (long).
32-bit, single-precision floating point number (float or float32).
64-bit, double-precision floating point number (double or float64).
2x64-bit, double-precision complex number (complex).
Raw string of bytes (str).
Nevertheless, if the type passed is not among those listed above, it will be silently upcast, so you don't need to worry too much about passing supported types: just pass whatever type you want and the interpreter will take care of it.
However, the types in PyTables conditions are somewhat stricter than those of Python. For instance, the only valid constants for booleans are True and False, and they are never automatically cast to integers. The type strengthening also affects the availability of operators and functions. Beyond that, the usual type inference rules apply.
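For contrast, the looser behavior of Python itself can be seen in the short sketch below. This is ordinary interpreter behavior, not PyTables code; it only illustrates the automatic bool-to-integer cast that conditions do not perform.

```python
# In plain Python, bool is a subclass of int, so booleans silently
# take part in integer arithmetic.
is_int = isinstance(True, int)
python_sum = True + True

print(is_int)      # bool values are accepted wherever an int is
print(python_sum)  # arithmetic treats True as 1
```

Inside a condition, by the rules above, a boolean value is not treated as a number in this way.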
Conditions support the set of operators listed below:
Logical operators: &, |, ~.
Comparison operators: <, <=, ==, !=, >=, >.
Unary arithmetic operators: -.
Binary arithmetic operators: +, -, *, /, **, %.
Not every type supports every operator. Boolean values only support logical and strict (in)equality comparison operators, strings only support comparisons, numbers do not work with logical operators, and complex comparisons can only check for strict (in)equality. Unsupported operations (including invalid castings) raise NotImplementedError exceptions.
You may have noticed the special meaning of the usually bitwise operators &, | and ~. Because of the way Python handles the short-circuiting of logical operators and the truth values of their operands, conditions must use the bitwise operator equivalents instead. This is not difficult to remember, but you must be careful because bitwise operators have a higher precedence than logical operators, and also than comparison operators. For instance, 'a and b == c' (a is true AND b is equal to c) is not equivalent to 'a & b == c' (the result of a AND b is equal to c). The safest way to avoid confusion is to use parentheses around the operands of logical operators, like this: 'a & (b == c)'. Another effect of short-circuiting is that expressions like '0 < x < 1' will not work as expected; you should use '(0 < x) & (x < 1)' instead [17].
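The precedence difference can be checked with Python's own parser. The sketch below uses the standard ast module and plain boolean values; the helper name top_node is hypothetical, and nothing here calls PyTables or Numexpr.

```python
import ast

def top_node(expr):
    """Return the name of the outermost AST node Python builds for expr."""
    return type(ast.parse(expr, mode="eval").body).__name__

# 'and' binds looser than '==', while '&' binds tighter than '=='.
print(top_node("a and b == c"))   # outermost node is the logical 'and'
print(top_node("a & b == c"))     # outermost node is the comparison
print(top_node("a & (b == c)"))   # parentheses make '&' the outermost node

# With concrete values the two spellings really do differ.
a, b, c = False, False, False
logical = a and b == c    # a and (b == c)
bitwise = a & b == c      # (a & b) == c
safe = a & (b == c)       # explicit grouping matches the logical form

# A chained comparison is a single node holding two operators, which
# Python evaluates with short-circuit 'and' semantics.
chained = ast.parse("0 < x < 1", mode="eval").body
n_ops = len(chained.ops)
```

With these values, the unparenthesized bitwise form gives a different answer from the logical form, while the parenthesized form agrees with it.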
You can also use the following functions in conditions:
where(bool, number1, number2): number - number1 if the bool condition is true, number2 otherwise.
{sin,cos,tan}(float|complex): float|complex - trigonometric sine, cosine or tangent.
{arcsin,arccos,arctan}(float|complex): float|complex - trigonometric inverse sine, cosine or tangent.
arctan2(float1, float2): float - trigonometric inverse tangent of float1/float2.
{sinh,cosh,tanh}(float|complex): float|complex - hyperbolic sine, cosine or tangent.
{arcsinh,arccosh,arctanh}(float|complex): float|complex - hyperbolic inverse sine, cosine or tangent.
{log,log10,log1p}(float|complex): float|complex - natural, base-10 and log(1+x) logarithms.
{exp,expm1}(float|complex): float|complex - exponential and exponential minus one.
sqrt(float|complex): float|complex - square root.
abs(float|complex): float|complex - absolute value.
{real,imag}(complex): float - real or imaginary part of complex.
complex(float, float): complex - complex from real and imaginary parts.
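As a rough illustration of a few entries in the list above, the sketch below uses scalar stand-ins from Python's standard math module. The where helper is a hypothetical scalar version written for this example, not the function available inside conditions, and the remark about precision for tiny arguments is a general floating-point observation rather than something stated in this section.

```python
import math

def where(flag, number1, number2):
    """Scalar stand-in for where(bool, number1, number2)."""
    return number1 if flag else number2

# Element-wise use of where(): pick -v for negative values, v otherwise.
values = [-2.0, 0.0, 3.0]
absolute = [where(v < 0, -v, v) for v in values]

# log1p(x) and expm1(x) compute log(1+x) and exp(x)-1 directly, which
# keeps precision when x is so small that 1 + x rounds to 1.
tiny = 1e-16
naive_log = math.log(1 + tiny)
precise_log = math.log1p(tiny)
naive_exp = math.exp(tiny) - 1
precise_exp = math.expm1(tiny)

print(absolute)
print(naive_log, precise_log)
print(naive_exp, precise_exp)
```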
[16] That is the reason why multidimensional fields in a table are not supported in conditions, since the truth value of each resulting multidimensional boolean value is not obvious.
[17] All of this could be solved if Python supported overloadable boolean operators (see PEP 335) or some kind of non-short-circuiting boolean operators (like C's &&, || and !).