interval_a <- list(
  start = 1,
  end = 2,
  label = "a"
)
interval_b <- list(
  start = 3,
  end = 4,
  label = "b"
)
interval_a$fol <- interval_b
Josef Fruehwald
September 2, 2024
I’ve been doing a lot of Python development recently, and really leaning into object-oriented programming for my projects. For example, in the aligned-textgrid package, I started off by defining a class to represent intervals, which at their core have start and end times, and a label. Each interval also points to the interval objects preceding and following it, as well as any intervals it contains or is contained in.
I don’t exactly want to replicate the entire package inside R, and for a while I wasn’t even sure how I could, because of R’s copy-on-modify behavior.
Let’s say we had the following sequence of intervals:
We could represent the basic information (start, end, and label) in lists. I’ll also set up a reference to interval_b as following interval_a.
We can double check that the fol reference is working with identical().
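As a self-contained sketch of that check (the two lists are rebuilt here so the snippet runs on its own):

```r
interval_a <- list(start = 1, end = 2, label = "a")
interval_b <- list(start = 3, end = 4, label = "b")

# Store interval_b inside interval_a as its following interval
interval_a$fol <- interval_b

# TRUE: at this point the stored value and interval_b match exactly
identical(interval_a$fol, interval_b)
```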
identical()
This took me a while to find. Python has a commonly used is operator for checking whether two variables refer to the same object, rather than just being equal. But I’ve never used R’s identical() before, and it doesn’t usually show up in intros to the language like Python’s is.
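One caveat that may be worth sketching: for ordinary values like lists, identical() reports exact equality of contents rather than Python-style object identity. It only behaves like an identity check for reference objects such as environments. The variable names below are made up for illustration.

```r
# Two lists built separately, with the same contents
x <- list(label = "a")
y <- list(label = "a")
identical(x, y)    # TRUE: same contents, even though they were built separately

# Environments are reference objects, so identical() compares identity
e1 <- new.env()
e2 <- new.env()
assign("label", "a", envir = e1)
assign("label", "a", envir = e2)
identical(e1, e2)  # FALSE: same contents, but two different environments

e3 <- e1
identical(e1, e3)  # TRUE: both names point at the same environment
```

So for plain lists, a TRUE from identical() says the two values currently match, not that they are the same object in memory.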
Things become a problem if we want to make changes to interval_b, though. One of the core tasks I wanted aligned-textgrid to make easy is the modification of interval labels. But in R, if we change the value of interval_b$label, that change won’t be reflected in the values in interval_a$fol, and the reference between the two objects will be broken.
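A minimal sketch of that breakage, with the lists rebuilt so the snippet is self-contained:

```r
interval_a <- list(start = 1, end = 2, label = "a")
interval_b <- list(start = 3, end = 4, label = "b")
interval_a$fol <- interval_b

# Modify interval_b after it was stored in interval_a
interval_b$label <- "c"

interval_b$label                       # "c"
interval_a$fol$label                   # "b": the stored copy did not change
identical(interval_a$fol, interval_b)  # FALSE: the two have diverged
```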
To understand what’s going on here, I’d recommend checking out Hadley Wickham’s Advanced R chapter on Names and Values, especially the section on Copy-on-modify.
One of these days, there’s going to be a new R package called yaoop for Yet Another Object Oriented Paradigm. The class systems that ship with R are called S3 and S4, and there are two new-ish class packages, R6 and S7. These two packages both have pretty interesting properties, but as pointed out in Advanced R:
"R6 objects are mutable, which means that they are modified in place, and hence have reference semantics."
S7 is the newer package, but I’ve double checked, and it also follows copy-on-modify, which means R6 is the way to go for this kind of use case.
Following the intro in the R6 documentation, the most basic SequenceInterval class would be something like this.
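A few notes on the code below:
The classname is what shows up when you call class() on an object.
Public fields and methods go in a list passed to the public argument.
initialize is a special method that’s called when you use SequenceInterval$new().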
library(R6)

SequenceInterval <- R6Class(
  classname = "SequenceInterval",
  public = list(
    # Fields, with empty defaults
    start = numeric(0),
    end = numeric(0),
    label = character(0),
    # References to the preceding and following intervals
    prev = NULL,
    fol = NULL,
    # Called by SequenceInterval$new()
    initialize = function(
      start = numeric(0),
      end = numeric(0),
      label = character(0)
    ){
      self$start <- start
      self$end <- end
      self$label <- label
    }
  )
)
My two sequence objects would then be:
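As a sketch, the two intervals could be created and linked like this. The class definition is repeated so the block runs on its own, and the start, end, and label values are taken from the list version earlier in the post.

```r
library(R6)

# Same class as defined above, repeated so this block is self-contained
SequenceInterval <- R6Class(
  classname = "SequenceInterval",
  public = list(
    start = numeric(0),
    end = numeric(0),
    label = character(0),
    prev = NULL,
    fol = NULL,
    initialize = function(start = numeric(0), end = numeric(0), label = character(0)) {
      self$start <- start
      self$end <- end
      self$label <- label
    }
  )
)

interval_a <- SequenceInterval$new(start = 1, end = 2, label = "a")
interval_b <- SequenceInterval$new(start = 3, end = 4, label = "b")

# Point interval_a at interval_b as its following interval
interval_a$fol <- interval_b
```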
Let’s double check that interval_b is appropriately following interval_a.
Now, for the moment of truth, we’ll change the label on interval_b and see if it breaks things.
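A sketch of that test, again with the class repeated so the block is self-contained:

```r
library(R6)

SequenceInterval <- R6Class(
  classname = "SequenceInterval",
  public = list(
    start = numeric(0),
    end = numeric(0),
    label = character(0),
    prev = NULL,
    fol = NULL,
    initialize = function(start = numeric(0), end = numeric(0), label = character(0)) {
      self$start <- start
      self$end <- end
      self$label <- label
    }
  )
)

interval_a <- SequenceInterval$new(start = 1, end = 2, label = "a")
interval_b <- SequenceInterval$new(start = 3, end = 4, label = "b")
interval_a$fol <- interval_b

identical(interval_a$fol, interval_b)  # TRUE before the change

# R6 objects are modified in place, so no copy of interval_b is made here
interval_b$label <- "c"

interval_a$fol$label                   # "c": the change is visible through interval_a
identical(interval_a$fol, interval_b)  # TRUE: the reference is intact
```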
Success!
I’m going to try to make things a little more complicated with respect to the fol and prev properties. I want:
When fol is set, prev is automatically set.
When prev is set, fol is automatically set.
These aren’t just nice quality of life features, but also capture the necessary logical properties of following and preceding.
I think the way to go about this will be to:
Lock off fol and prev from being directly settable. I think the best way to do this is to move them to active bindings, which seem a lot like using the Python @property decorator on a method.
Add private .fol and .prev properties.
Define setter functions that will update the .fol and .prev properties, being careful to avoid infinite recursion!
I’ve described each new piece in the code annotations.
Notes on the code below:
The private list holds the .fol and .prev objects.
The fol and prev active bindings only read the private values.
The set_fol() method is calling set_prev(), and the set_prev() method is calling set_fol(). To avoid infinite recursion, the function should stop here if it’s already the preceding interval to its following interval.
We have to call the set_prev() method here, because the method won’t be able to dig into the private .prev property of interval.
set_prev() works the same way as set_fol().
SequenceInterval <- R6Class(
  classname = "SequenceInterval",
  # Private storage for the neighboring intervals
  private = list(
    .fol = NULL,
    .prev = NULL
  ),
  # Read-only access to the private values
  active = list(
    fol = function(){
      return(private$.fol)
    },
    prev = function(){
      return(private$.prev)
    }
  ),
  public = list(
    start = numeric(0),
    end = numeric(0),
    label = character(0),
    initialize = function(
      start = numeric(0),
      end = numeric(0),
      label = character(0)
    ){
      self$start <- start
      self$end <- end
      self$label <- label
    },
    set_fol = function(interval){
      private$.fol <- interval
      # Stop here if the reciprocal link already exists,
      # otherwise set_fol() and set_prev() would call each other forever
      if(identical(self$fol$prev, self)){
        return(invisible(self))
      }
      # Use the public setter: we can't reach into interval's private .prev
      self$fol$set_prev(self)
      invisible(self)
    },
    set_prev = function(interval){
      private$.prev <- interval
      # Same recursion guard as in set_fol()
      if(identical(self$prev$fol, self)){
        return(invisible(self))
      }
      self$prev$set_fol(self)
      invisible(self)
    }
  )
)
We can still create intervals the same way as above.
But to set up interval_b as the interval following interval_a, you’ve got to use the set_fol() setter function.
But now, interval_a has been automatically set as preceding interval_b!
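A self-contained sketch of that usage. The class is repeated in a slightly condensed form (the recursion guard is written as a negated condition), so treat it as an illustration rather than the exact definition above.

```r
library(R6)

SequenceInterval <- R6Class(
  classname = "SequenceInterval",
  private = list(.fol = NULL, .prev = NULL),
  active = list(
    fol = function() private$.fol,
    prev = function() private$.prev
  ),
  public = list(
    start = numeric(0),
    end = numeric(0),
    label = character(0),
    initialize = function(start = numeric(0), end = numeric(0), label = character(0)) {
      self$start <- start
      self$end <- end
      self$label <- label
    },
    set_fol = function(interval) {
      private$.fol <- interval
      # Only call back if the reciprocal link is not already in place
      if (!identical(interval$prev, self)) interval$set_prev(self)
      invisible(self)
    },
    set_prev = function(interval) {
      private$.prev <- interval
      if (!identical(interval$fol, self)) interval$set_fol(self)
      invisible(self)
    }
  )
)

interval_a <- SequenceInterval$new(start = 1, end = 2, label = "a")
interval_b <- SequenceInterval$new(start = 3, end = 4, label = "b")

# Only fol is set explicitly
interval_a$set_fol(interval_b)

identical(interval_a$fol, interval_b)   # TRUE
identical(interval_b$prev, interval_a)  # TRUE: set automatically by set_fol()
```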
The major defect here is that if I, or a user, don’t know about the setter functions, we’ll get a very inscrutable error message when trying to directly set fol or prev.
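To see where the error comes from, here is a stripped-down sketch. ReadOnly is a hypothetical stand-in class with a single active binding whose function takes no arguments, like fol and prev above. Assigning to the binding makes R call that function with the new value as an argument, which fails.

```r
library(R6)

# Hypothetical minimal class: one read-only active binding
ReadOnly <- R6Class(
  classname = "ReadOnly",
  private = list(.fol = NULL),
  active = list(
    fol = function() {
      return(private$.fol)
    }
  )
)

obj <- ReadOnly$new()

# Capture the error message instead of letting the script stop
result <- tryCatch(
  {
    obj$fol <- "something"
    "no error"
  },
  error = function(e) conditionMessage(e)
)

result  # the error message produced by the failed assignment
```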
I might want to figure out if it’s possible to get a better error here, or even better, some way to short circuit this assignment attempt to set_fol(), but I think that’s a bit beyond my patience and time for right now.
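One possible approach, offered as an untested-in-the-post sketch rather than a recommendation: R6 active bindings can take a single argument (conventionally named value), which is missing when the binding is read and supplied when it is assigned to. That would let an assignment be forwarded to the setter. The condensed class below is an assumption about how this could look, not the definition used above.

```r
library(R6)

SequenceInterval <- R6Class(
  classname = "SequenceInterval",
  private = list(.fol = NULL, .prev = NULL),
  active = list(
    # No argument: the binding is being read. One argument: it is being assigned.
    fol = function(value) {
      if (missing(value)) return(private$.fol)
      self$set_fol(value)
    },
    prev = function(value) {
      if (missing(value)) return(private$.prev)
      self$set_prev(value)
    }
  ),
  public = list(
    start = numeric(0),
    end = numeric(0),
    label = character(0),
    initialize = function(start = numeric(0), end = numeric(0), label = character(0)) {
      self$start <- start
      self$end <- end
      self$label <- label
    },
    set_fol = function(interval) {
      private$.fol <- interval
      # Only call back if the reciprocal link is not already in place
      if (!identical(interval$prev, self)) interval$set_prev(self)
      invisible(self)
    },
    set_prev = function(interval) {
      private$.prev <- interval
      if (!identical(interval$fol, self)) interval$set_fol(self)
      invisible(self)
    }
  )
)

interval_a <- SequenceInterval$new(start = 1, end = 2, label = "a")
interval_b <- SequenceInterval$new(start = 3, end = 4, label = "b")

# Direct assignment is now routed through set_fol()
interval_a$fol <- interval_b

identical(interval_a$fol, interval_b)   # TRUE
identical(interval_b$prev, interval_a)  # TRUE
```

If forwarding feels too implicit, the same value branch could instead call stop() with a message pointing the user to set_fol(), which would at least replace the inscrutable error with a readable one.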
@online{fruehwald2024,
author = {Fruehwald, Josef},
title = {Experimenting with {Object} {Oriented} {Programming} in {R}},
series = {Væl Space},
date = {2024-09-02},
url = {https://jofrhwld.github.io/blog/posts/2024/09/2024-09-02_OOP-r/},
langid = {en}
}