This introduction aims to provide a gentle start to rpy2, whether coming from R to Python/rpy2, from Python to rpy2/R, or from elsewhere to Python/rpy2/R.
It is assumed here that the rpy2 package has been properly installed. In Python, a package or module is made available by importing it.
import rpy2.robjects as robjects
The object r in rpy2.robjects represents the running embedded R process.
For those familiar with R and the R console, r is a little like a communication channel from Python to R.
In Python, the [ operator is an alias for the method __getitem__().
With rpy2.robjects, the method __getitem__() works like typing the name of a variable at the R console.
Example in R:
pi
With rpy2:
>>> robjects.r['pi']
3.14159265358979
Note
Under the hood, the variable pi is fetched by default from the R base package, unless another variable named pi was created in the globalEnv. The Section Environments tells more about that.
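For readers coming from Python, the lookup described above can be mimicked in plain Python. The sketch below is a toy model, not rpy2's implementation: the class name ToyR and its two dictionaries are hypothetical stand-ins for the R Global Environment and the R base package. collections.ChainMap searches its mappings in order, so a name defined in the first one shadows the same name in the second, and the [ operator is routed to __getitem__().

```python
import math
from collections import ChainMap


class ToyR:
    """Toy stand-in for robjects.r (hypothetical, for illustration only)."""

    def __init__(self):
        # Hypothetical stand-ins for R's base package and Global Environment.
        self.base = {"pi": math.pi}
        self.globalenv = {}
        # The Global Environment is searched first, then the base package.
        self._search_path = ChainMap(self.globalenv, self.base)

    def __getitem__(self, name):
        # toy["pi"] is turned by Python into ToyR.__getitem__(toy, "pi").
        return self._search_path[name]


toy = ToyR()
print(toy["pi"])            # found in the base stand-in
toy.globalenv["pi"] = 3.0   # a variable named pi created in the global stand-in
print(toy["pi"])            # now shadows the base value
```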
The r object is also callable, and the string passed to it is evaluated as R code.
This can be used to get variables, providing an alternative to the __getitem__() method presented above.
Example in R:
pi
With rpy2:
>>> robjects.r('pi')
3.14159265358979
Warning
The result is an R vector. Reading Section R vectors is recommended, as it explains the following behavior:
>>> robjects.r('pi') + 2
c(3.14159265358979, 2)
>>> robjects.r('pi')[0] + 2
5.1415926535897931
The evaluation is performed in what is known to R users as the Global Environment, that is, the environment one is in when starting the R console. Whenever the R code creates variables, those variables are “located” in that Global Environment by default.
Example:
robjects.r('''
f <- function(r) { 2 * pi * r }
f(3)
''')
The expression above returns the value 18.85, but it first creates an R function f. That function f is present in the R Global Environment, and can be accessed with the __getitem__ mechanism outlined above:
>>> robjects.globalEnv['f']
function (r)
{
    2 * pi * r
}
or
>>> robjects.r['f']
function (r)
{
    2 * pi * r
}
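The "evaluate, then look up" round trip above can also be modeled in plain Python, without an R installation. In this hypothetical sketch, which is not how rpy2 works internally, Python source stands in for R source: ToyEvaluator.__call__ runs every line but the last in a persistent dictionary playing the role of the Global Environment and returns the value of the last line, as R does, while __getitem__ retrieves whatever the code defined.

```python
import math


class ToyEvaluator:
    """Hypothetical toy model of robjects.r('...') followed by robjects.r['...']."""

    def __init__(self):
        # Persistent namespace standing in for R's Global Environment.
        self.globalenv = {"pi": math.pi}

    def __call__(self, code):
        lines = code.strip().splitlines()
        # Run all statements except the last one; definitions persist in globalenv.
        exec("\n".join(lines[:-1]), self.globalenv)
        # Like R, return the value of the last expression.
        return eval(lines[-1], self.globalenv)

    def __getitem__(self, name):
        return self.globalenv[name]


toy_r = ToyEvaluator()
result = toy_r('''
def f(r): return 2 * pi * r
f(3)
''')
print(round(result, 2))   # 18.85
print(toy_r["f"](1))      # the function f is still available afterwards
```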
Despite the first impression one may get from the title of this section, the features presented here are simple and handy.
An R object has a string representation that can be inserted directly into R code to be evaluated.
Simple example:
>>> letters = robjects.r['letters']
>>> rcode = 'paste(%s, collapse="-")' %(letters.r_repr())
>>> robjects.r(rcode)
"a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"
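The same idea can be sketched without R: the R source text is built by interpolating a string representation of the data. The helper to_r_character_repr below is hypothetical; it only approximates what r_repr() produces for a character vector (each element quoted and escaped inside c(...)), and the exact text rpy2 and R generate may differ in spacing or escaping.

```python
import string


def to_r_character_repr(values):
    """Hypothetical helper: R source text for a character vector."""
    def quote(s):
        # Escape backslashes first, then double quotes, as R string literals require.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "c(" + ", ".join(quote(v) for v in values) + ")"


letters = list(string.ascii_lowercase)
# Same interpolation pattern as above, with the hypothetical helper instead of r_repr().
rcode = 'paste(%s, collapse="-")' % to_r_character_repr(letters)

print(to_r_character_repr(["a", "b", 'say "hi"']))
# c("a", "b", "say \"hi\"")
```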
In R, data are mostly represented by vectors, even when they look like scalars.
When looking closely at the R object pi used previously, we can observe that this is in fact a vector of length 1.
>>> len(robjects.r['pi'])
1
As such, the Python method __add__() (the + operator) results in a concatenation (function c() in R), as is the case for regular Python lists.
Accessing the one value in that vector has to be stated explicitly:
>>> robjects.r['pi'][0]
3.1415926535897931
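Regular Python lists, which the text above uses as a point of comparison, behave the same way. The snippet below uses a one-element list as a stand-in for the length-1 R vector pi; it is plain Python, with no rpy2 involved.

```python
import math

pi_vector = [math.pi]        # stand-in for the length-1 R vector pi
print(len(pi_vector))        # 1
print(pi_vector + [2])       # + concatenates, giving a list of length 2
print(pi_vector[0] + 2)      # explicit indexing first, then arithmetic
```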
There is much that can be achieved with vectors, such as having them behave more like Python lists or more like R vectors. A comprehensive description of the behavior of vectors is found in Section Vectors.
R vectors can be created simply:
>>> robjects.StrVector(['abc', 'def'])
c("abc", "def")
>>> robjects.IntVector([1, 2, 3])
1:3
>>> robjects.FloatVector([1.1, 2.2, 3.3])
c(1.1, 2.2, 3.3)
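Each constructor produces an R vector of a single atomic type: StrVector gives a character vector, IntVector an integer vector and FloatVector a double (numeric) vector; rpy2.robjects also provides BoolVector for logical vectors. The hypothetical helper below is not part of rpy2; it only suggests which of these classes suits a given Python list, following R's rule that mixing integers and doubles yields doubles, and it does not model any other coercion.

```python
def suggest_vector_class(values):
    """Hypothetical helper: name of the rpy2.robjects vector class for a list."""
    if not values:
        raise ValueError("empty list: the type is ambiguous")
    # bool is checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in values):
        return "BoolVector"
    if all(isinstance(v, str) for v in values):
        return "StrVector"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "IntVector"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        # Mixed ints and floats: R would coerce to double.
        return "FloatVector"
    raise TypeError("mixed types: this sketch picks no single class")


print(suggest_vector_class(['abc', 'def']))   # StrVector
print(suggest_vector_class([1, 2, 3]))        # IntVector
print(suggest_vector_class([1.1, 2, 3.3]))    # FloatVector
```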
R matrices and arrays are just vectors with a dim attribute.
The easiest way to create such objects is to do it through R functions:
>>> v = robjects.FloatVector([1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
>>> m = robjects.r['matrix'](v, nrow = 2)
>>> print(m)
     [,1] [,2] [,3]
[1,]  1.1  3.3  5.5
[2,]  2.2  4.4  6.6
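Because a matrix is only a vector plus a dim attribute, and R's matrix() fills column by column by default, element [i, j] (1-based) of a matrix with nrow rows is element (j - 1) * nrow + (i - 1) (0-based) of the underlying vector. The plain-Python sketch below, with a hypothetical helper name, reproduces the matrix printed above from the flat list of values.

```python
def column_major_rows(values, nrow):
    """Hypothetical helper: rows of an R-style matrix filled column by column."""
    if len(values) % nrow != 0:
        raise ValueError("length must be a multiple of nrow in this sketch")
    ncol = len(values) // nrow
    # Element [i, j] (1-based) lives at flat index (j - 1) * nrow + (i - 1).
    return [[values[j * nrow + i] for j in range(ncol)] for i in range(nrow)]


v = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6]
for row in column_major_rows(v, nrow=2):
    print(row)
# [1.1, 3.3, 5.5]
# [2.2, 4.4, 6.6]
```

Unlike R's matrix(), this sketch refuses lengths that are not a multiple of nrow instead of recycling values.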
Calling R functions is disappointingly similar to calling Python functions:
>>> rsum = robjects.r['sum']
>>> rsum(robjects.IntVector([1,2,3]))
6L
Keywords can be used with the same ease:
>>> rsort = robjects.r['sort']
>>> rsort(robjects.IntVector([1,2,3]), decreasing=True)
c(3L, 2L, 1L)
Note
By default, calling R functions will return R objects.
More information on functions is in Section Functions.
This section demonstrates some of the features of rpy2 by example.
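import rpy2.robjects as robjects
r = robjects.r
# Data: the integers 0..9, and 10 draws from a standard normal distribution
x = robjects.IntVector(range(10))
y = r.rnorm(10)
# Open a new X11 graphics window (requires X11 support)
r.X11()
# Split the graphics device into panels described by a 2x2 matrix
r.layout(r.matrix(robjects.IntVector([1,2,3,2]), nrow=2, ncol=2))
# Scatter plot of 10 uniform draws against y, drawn in the first panel
r.plot(r.runif(10), y, xlab="runif", ylab="foo/bar", col="red")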
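The call to layout() decides where each subsequent plot goes. Since matrix() fills column by column, matrix(c(1,2,3,2), nrow=2, ncol=2) is the grid computed below: the first plot drawn after layout() lands in the top-left cell, the second one spans the whole bottom row, and a third one would land in the top-right cell. This plain-Python sketch, with a hypothetical helper, only computes that grid; it does not draw anything.

```python
def layout_grid(panels, nrow, ncol):
    """Hypothetical helper: the panel grid that R's layout(matrix(...)) describes."""
    # matrix() fills column by column, so cell [i, j] is panels[j * nrow + i].
    return [[panels[j * nrow + i] for j in range(ncol)] for i in range(nrow)]


for row in layout_grid([1, 2, 3, 2], nrow=2, ncol=2):
    print(row)
# [1, 3]
# [2, 2]
```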
The number of arguments in a function call can be set dynamically the usual way in Python:
args = [x, y]
kwargs = {'ylab':"foo/bar", 'type':"b", 'col':"blue", 'log':"x"}
r.plot(*args, **kwargs)
Note
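Since the named parameters are collected into a Python dict, their order is lost for **kwargs arguments in Python versions older than 3.6; from Python 3.6 on (PEP 468), the order of keyword arguments is preserved.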
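To see how *args and **kwargs unpack into a call, the hypothetical helper below (not part of rpy2) turns a function name and Python arguments into the text of the corresponding R call. It converts only a few value types, following R's literal syntax (TRUE/FALSE, an L suffix for integers, quoted strings, c(...) for lists), and it relies on keyword arguments keeping their order, as they do in recent Python versions.

```python
def to_r_literal(value):
    """Hypothetical converter from a few Python values to R literal syntax."""
    if isinstance(value, bool):          # checked before int: bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return "%dL" % value
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, list):
        return "c(" + ", ".join(to_r_literal(v) for v in value) + ")"
    raise TypeError("unsupported type: %s" % type(value).__name__)


def r_call_string(fname, *args, **kwargs):
    """Hypothetical: text of the R call fname(args..., name = value, ...)."""
    parts = [to_r_literal(a) for a in args]
    # kwargs keeps the caller's keyword order (Python 3.6+).
    parts += ["%s = %s" % (k, to_r_literal(v)) for k, v in kwargs.items()]
    return "%s(%s)" % (fname, ", ".join(parts))


args = [[1, 2, 3]]
kwargs = {'decreasing': True}
print(r_call_string("sort", *args, **kwargs))
# sort(c(1L, 2L, 3L), decreasing = TRUE)
```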
The R code is:
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
anova(lm.D9 <- lm(weight ~ group))
summary(lm.D90 <- lm(weight ~ group - 1)) # omitting intercept
One way to achieve the same with rpy2.robjects is:
import rpy2.robjects as robjects
r = robjects.r
ctl = robjects.FloatVector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14])
trt = robjects.FloatVector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69])
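group = r.gl(2, 10, 20, labels = robjects.StrVector(["Ctl","Trt"]))
# + on two vectors concatenates them, like c(ctl, trt) in R
weight = ctl + trt
# the model formulas below refer to variables in the R Global Environment
robjects.globalEnv["weight"] = weight
robjects.globalEnv["group"] = group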
lm_D9 = r.lm("weight ~ group")
print(r.anova(lm_D9))
lm_D90 = r.lm("weight ~ group - 1")
print(r.summary(lm_D90))
Taking the results from the code above, one could proceed as follows:
>>> print(lm_D9.rclass)
[1] "lm"
Here the resulting object is a list structure, as either inspecting the data structure or reading the R man page for lm would tell us. Checking its element names is then trivial:
>>> print(lm_D9.names)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "contrasts" "xlevels" "call" "terms"
[13] "model"
And so is extracting a particular element:
>>> print(lm_D9.r['coefficients'])
$coefficients
(Intercept) groupTrt
5.032 -0.371
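These two coefficients can be checked by hand. gl(2, 10, 20, ...) repeats each of its 2 labels 10 times and recycles that pattern to length 20, so group lines up with weight (ten Ctl values followed by ten Trt values). With a single two-level factor and R's default treatment contrasts, the least-squares intercept is the mean of the reference group Ctl, and groupTrt is the difference between the Trt and Ctl means; for lm.D90, fitted without an intercept, the coefficients are the two group means themselves. The plain-Python check below uses a hypothetical gl_labels helper that only mimics the labels produced by gl, not an R factor.

```python
from statistics import mean


def gl_labels(n, k, length, labels):
    """Hypothetical stand-in for the labels of R's gl(n, k, length, labels)."""
    if len(labels) != n:
        raise ValueError("need exactly n labels")
    pattern = [label for label in labels for _ in range(k)]  # each label k times
    # Recycle the pattern until `length` values are produced.
    return [pattern[i % len(pattern)] for i in range(length)]


ctl = [4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14]
trt = [4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69]
weight = ctl + trt                            # concatenation, like c(ctl, trt)
group = gl_labels(2, 10, 20, ["Ctl", "Trt"])

# Mean weight per group level, pairing each value with its label.
group_means = {
    level: mean(w for w, g in zip(weight, group) if g == level)
    for level in ("Ctl", "Trt")
}
intercept = group_means["Ctl"]                       # "(Intercept)"
group_trt = group_means["Trt"] - group_means["Ctl"]  # "groupTrt"
print(round(intercept, 3), round(group_trt, 3))
# 5.032 -0.371
```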
More about extracting elements from vectors is available in Section Indexing.
The R code is:
m <- matrix(rnorm(100), ncol=5)
pca <- princomp(m)
plot(pca, main="Eigen values")
biplot(pca, main="biplot")
The rpy2.robjects code is:
import rpy2.robjects as robjects
r = robjects.r
m = r.matrix(r.rnorm(100), ncol=5)
pca = r.princomp(m)
r.plot(pca, main="Eigen values")
r.biplot(pca, main="biplot")
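The "Eigen values" plot shows the variances of the principal components, which are the eigenvalues of the covariance matrix of the data; R's documentation for princomp states that it uses divisor N (not N - 1) for that covariance matrix. For two variables the eigenvalues have a closed form, so the idea can be sketched in plain Python. The helper name and the toy data are hypothetical, and this is not how princomp computes them (it calls eigen() on the full covariance or correlation matrix).

```python
import math


def pc_variances_2d(xs, ys):
    """Hypothetical helper: principal component variances for two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance matrix [[a, b], [b, d]] with divisor n, as princomp uses.
    a = sum((x - mx) ** 2 for x in xs) / n
    d = sum((y - my) ** 2 for y in ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return half_trace + radius, half_trace - radius


# Perfectly correlated toy data: all the variance lies on the first component.
print(pc_variances_2d([1, 2, 3, 4], [2, 4, 6, 8]))
# (6.25, 0.0)
```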