Reactive Streams and Variables

Within reactive systems, several different terms are often used for the basic entity of a time-varying value within a data flow. These terms can imply or carry slightly different meanings:

Stream – A stream suggests a flow of values, where each value is significant. The term “stream” has primarily been used for IO operations, where we want a model for data that lets us operate on it as it is received. However, applying this term to the state of an object that may change is usually inappropriate, as it implies the wrong set of concerns about stateful data. A stream is a great model for IO operations, but within reactive programming it suggests “push” reactivity, which tends to suffer from inefficiency when applied to modeling state.

“Event bus” and “signal” are other terms that may suggest a concept similar to a stream. A signal may be a little vaguer and more open to interpretation.

Observable – This term is used to indicate that an object’s changes can be observed. It accurately reflects what we can do with an object, but that is also why the term falls short. Rather than describing what the object is and its inherent nature (that it can vary), we are only describing one particular way that we can interact with it. This is a short-sighted description, because the nature of being time-varying opens up multiple ways of interacting with an object. If it is a time-varying object, not only can we observe it, we might be able to map it, modify it, or discover metadata about how it may vary.

Variable – I have been increasingly using this term to describe the basic entity of a time-varying value, because “variable” accurately conveys that it represents something that may change or vary. One of the key aspects of an efficient reactive system is being able to lazily respond to a group of changes, rather than to each intermediate step (for example, if you are incrementally changing properties on an object). A variable conveys that a value may change, but that we are not necessarily interested in every intermediate value, as with a stream. The term “variable” fits well with more efficient “pull” reactivity.
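To make the distinction concrete, here is a minimal sketch (my own illustration, not any particular library’s API) of a pull-oriented variable: notifications carry no data, and consumers read the current value when they are ready:

```javascript
// A minimal pull-oriented variable: notifications carry no data;
// consumers pull the current value when they are ready.
class Variable {
  constructor(value) {
    this.value = value;
    this.listeners = [];
  }
  valueOf() {
    return this.value; // consumers pull the latest value on demand
  }
  put(value) {
    this.value = value;
    // push only an invalidation signal; intermediate values are not queued
    this.listeners.forEach(listener => listener());
  }
  observe(listener) {
    this.listeners.push(listener);
  }
}

const count = new Variable(0);
count.observe(() => console.log('current value:', count.valueOf()));
count.put(1);
count.put(2); // each listener reads only the latest value; nothing is buffered
```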


Granular and Coarse Reactivity

One way to categorize reactive JavaScript programming paradigms is by granularity: fine-grained (granular) and coarse reactivity. There are pros and cons to each approach, and it is worth weighing them to decide which may work better for your application, or even how you might integrate the two.

Granular reactivity is an approach where each piece of time-varying state is stored in its own stream or variable. We can create many different reactive variables to contain the various parts of application state that may change. These variables are then used to explicitly control the different parts of the UI that depend on that state. This approach is facilitated by libraries like BaconJS, RxJS, and Alkali.
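For instance, with RxJS, each piece of state can live in its own subject, wired directly to the one DOM node that depends on it (the element IDs here are purely illustrative):

```javascript
import { BehaviorSubject } from 'rxjs';

// One stream/variable per piece of time-varying state
const firstName = new BehaviorSubject('Ada');
const lastName = new BehaviorSubject('Lovelace');

// Each subscription explicitly controls the one DOM node that depends on it
firstName.subscribe(value => {
  document.querySelector('#first-name').textContent = value;
});
lastName.subscribe(value => {
  document.querySelector('#last-name').textContent = value;
});

// Updating one piece of state touches exactly one text node
firstName.next('Grace');
```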

Coarse reactivity takes a more bundled approach. All of the state that affects a given component is bundled into a single state object or container. When state changes, rather than discerning the separate aspects of each change, the entire component is re-rendered. By itself, this would be tremendously inefficient, as it would mean a large-scale DOM rewrite on every state change. Consequently, this approach is almost always combined with a diffing mechanism, involving a virtual DOM (an in-memory copy of the DOM), that compares the rendered output before and after a change to compute exactly which DOM elements need to be changed, minimizing DOM modifications. This approach is most famously implemented in the React library.
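A minimal sketch of this bundled approach, here using the small virtual-dom package (React’s model is analogous): the whole component is re-rendered from a single state object, and a diff computes the actual DOM mutations:

```javascript
var h = require('virtual-dom/h');
var diff = require('virtual-dom/diff');
var patch = require('virtual-dom/patch');
var createElement = require('virtual-dom/create-element');

// All of the component's state is bundled into one object
var state = { count: 0 };

// The entire component is rendered from the state object
function render(state) {
  return h('div', [h('span', 'Count: ' + state.count)]);
}

var tree = render(state);
var rootNode = createElement(tree);
document.body.appendChild(rootNode);

// On any state change, re-render everything and let the diff
// compute the minimal set of real DOM mutations
function setState(newState) {
  state = newState;
  var newTree = render(state);
  rootNode = patch(rootNode, diff(tree, newTree));
  tree = newTree;
}

setState({ count: 1 });
```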

Again, there are certainly advantages and disadvantages to each approach. Coarse reactivity has gained tremendous popularity because it facilitates the creation of reactive UIs while still allowing developers to use a familiar imperative style of coding. This style, sometimes combined with syntactic additions (JSX), provides a terse syntax for writing components, with familiar style and elements.

There are distinct disadvantages as well, though. By avoiding any explicit connection between individual pieces of state and their resulting output, reactive changes result in large-scale rewrites of component output. To cope with this, coarse reactivity employs diffing algorithms to precisely determine what has changed and what hasn’t. However, this comes at a price. In order for the diffing to work properly, it must maintain a virtual copy of the DOM so it can compare the output before and after reactive changes to determine which DOM elements to actually update. This virtual copy consumes a substantial amount of memory, often several times as much memory as the corresponding DOM alone.

Memory consumption is often trivialized (“memory is cheap”) and easily ignored, since it doesn’t usually affect isolated performance tests. However, high memory usage is a negative externality: it slows down the many layers of caching it passes through, incrementally dragging down not only its own code but other code in the system. This is similar to real-world externalities like pollution; in isolation it may not seem worth dealing with, but collectively, when everything is consuming large amounts of memory, the effects become much more visible.

The imperative nature of coarse reactivity also complicates efforts to create two-way bindings. Because the relationship between data sources and their output is obscured by the imperative rendering process, there is no natural connection that relates input changes back to their sources. Of course, there are ways to set up bindings back to the data, but this doesn’t come for free.

On the other hand, with a purer, granular approach to reactivity, we can overcome these issues. This has its own challenges. The most obvious drawback of granular reactivity is that it requires a more functional approach, rather than the familiar imperative one. However, this can also be considered a benefit. Granular reactivity pushes developers towards more functional paradigms, and functional programming has substantial benefits. While these benefits can be explored in far more depth, briefly: it leads to a more direct way of describing “what” is happening in an application, rather than working in terms of “how” a sequence of state changes will accomplish a task. This is particularly important as we read and maintain code. While we often think about coding in terms of writing new code, most of us spend much more time reading and updating existing code, and being able to comprehend code without having to replay state sequences is a huge boost for comprehension.

And granular reactivity certainly provides a more memory-efficient approach. Only those elements that are bound to data or variables incur extra memory, and only the memory necessary to track the relationship. This means that static portions of an application, without any data binding or backing variable, do not require any extra memory (in a virtual DOM, or otherwise). In my work on a highly memory-intensive application, this has been a tremendous benefit.

Granular reactivity also creates direct relationships between state and UI component values. This greatly simplifies two-way data bindings, because UI inputs can be directly mapped back to their source. A good granular reactive system can provide this two-way connection automatically.
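As a sketch (the bindInput helper is hypothetical, built on the Variable from the first section), a two-way binding reduces to wiring the same variable in both directions:

```javascript
// A hypothetical helper wiring the Variable from the earlier sketch
// to an input element in both directions.
function bindInput(variable, input) {
  // source -> UI: re-read the variable whenever it is invalidated
  const render = () => { input.value = variable.valueOf(); };
  variable.observe(render);
  render();
  // UI -> source: user edits map directly back to the same variable
  input.addEventListener('input', () => variable.put(input.value));
}

const name = new Variable('Ada');
bindInput(name, document.querySelector('#name-input'));
```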

Finally, granular reactivity can more easily be incrementally (and minimally) added to existing components and UI applications. Rather than having to convert whole components to using virtual DOM output, individual reactive elements can be incrementally added to components with very minimal changes.

While these are contrasting approaches, it is certainly feasible to combine them in a single application. One can use coarse-state components (with a virtual DOM) for portions of the application that may not incur as much memory overhead (perhaps due to low frequency of usage), while areas with heavier usage, or that will benefit from more automated two-way bindings, can use granular reactivity. Either way, it is important to recognize the pros and cons of these approaches. While coarse-grained reactivity may be quick and easy, deeper concerns may suggest the more efficient and functional approach of granular reactivity.

Rendering Efficiently with Two Phase UI Updates

A responsive user interface is a critical aspect of a good user experience, and maintaining responsiveness in increasingly complex applications with growing data can be very challenging. In this post, I want to look at the practice of using two distinct UI update phases to achieve optimal performance.

Several months ago I started a new position with Doctor Evidence, and one of our primary challenges is dealing with very large sets of clinical data while providing a responsive UI. This has given me some great opportunities and experience working on performance with complex UI requirements. I have also been able to build out a library, Alkali, that facilitates this approach.

Two-phase UI updates basically means that we have one phase that is event-triggered and focused on updating the canonical state of our application, marking any dependent UI components as invalid and in need of rerendering. We then have a second phase that is focused on rerendering any UI elements that have been marked as invalid, computing any derived values in the process.
[Figure: two-phase rendering]

State Update Phase

The state update phase is a response to application events. It may be triggered by user interaction like clicks, incoming data from a server, or any other type of event. In response, the primary concern, as the name suggests, is to update the canonical state of the application. This state may exist at various levels: it could be persisted data models that are synchronized with a server, or higher-level “view-models”, which are often more explicitly called out in MVVM design and may reflect more transient state. The important concept, however, is that we focus only on updating “canonical” state, that is, state information that is not derived from other sources. Derived data should generally not be directly updated at this stage.

The other action that should take place during the state update phase is to invalidate any components that are dependent on the updated state. Naturally, the state or data model components should not be responsible for knowing which downstream components to update, or how; rather, they should allow dependent components to register to be notified when a state update has occurred, so they can self-invalidate. The key goal of the invalidation part of this phase is not to actually re-render anything immediately, but simply to mark what needs to be rendered in the next phase.
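A minimal sketch of this registration, with names of my own choosing; the model knows only that dependents want an invalidation signal:

```javascript
// The model knows nothing about rendering; it only notifies dependents
// that they are now invalid.
class Model {
  constructor(data) {
    this.data = data;
    this.dependents = new Set();
  }
  dependOn(component) {
    this.dependents.add(component);
  }
  set(property, value) {
    this.data[property] = value;
    // state update phase: mark dependents invalid, render nothing yet
    this.dependents.forEach(component => component.invalidate());
  }
}
```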

The purpose of this approach is that complicated or repetitive state changes will never trigger excessive rendering updates. During state updates, downstream dependent components can be marked as invalid over and over, and they will still only need a single rendering in the next phase.

The invalidation should also trigger or queue the rendering actions for the next phase. Fortunately, browsers provide the perfect API for exactly this: all rendering should be queued using requestAnimationFrame. This API is already optimized to perform rendering at the right time, right before the browser lays out and paints, and to defer it when necessary to avoid excessive render frequency in the case of rapid events. Much of the complexity of timing the rendering correctly is handled by the browser with this API. We just need to make sure our components only queue a single rendering in response to invalidation (multiple invalidations in one phase shouldn’t queue up multiple renderings).
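Continuing the sketch, a component can guard its invalidation so that repeated invalidations queue only one requestAnimationFrame callback:

```javascript
class Component {
  constructor(element) {
    this.element = element;
    this.queued = false;
  }
  invalidate() {
    // repeated invalidations in one phase queue only a single rendering
    if (this.queued) return;
    this.queued = true;
    requestAnimationFrame(() => {
      this.queued = false;
      this.render();
    });
  }
  render() {
    // pull the latest state and update this.element here
  }
}
```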

Rendering Phase

The next phase is the rendering phase. At this point, there should be a set of rendering actions that have been queued. If you have used requestAnimationFrame, this phase will be initiated by the browser calling the registered callbacks. As visual components render themselves, they retrieve data from data models, or from computed derivatives of those models. It is at this point that any computed or derived data that was invalidated in the previous phase should be calculated. This can be performed lazily, in response to requests from the UI components; on-demand computation avoids unnecessary calculations when computed data is no longer needed.
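A computed value in this scheme might look like the following sketch (building on the Variable from earlier): it does no work when invalidated, and recomputes only when the rendering phase actually asks for it:

```javascript
// A derived value: recomputed only when a renderer asks for it after an
// invalidation, and cached otherwise.
class Computed {
  constructor(source, compute) {
    this.source = source;
    this.compute = compute;
    this.valid = false;
    source.observe(() => { this.valid = false; }); // invalidate only, no work yet
  }
  valueOf() {
    if (!this.valid) {
      this.cached = this.compute(this.source.valueOf());
      this.valid = true;
    }
    return this.cached; // repeated reads during one rendering are free
  }
}
```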

The key focus of this phase is that we render everything that has been invalidated, computing any derived data along the way. We must ensure that this phase avoids any updates to canonical data sources. By separating the phases and avoiding state updates during rendering, we eliminate any echoing or bouncing of state updates that can slow a UI down or create difficult-to-debug infinite loops. Batching DOM updates into a single event turn is also an important optimization for avoiding unnecessary browser-level layout and repaint.

In fact, this two-phase system is very similar to the way browsers themselves work. Browsers process events, allowing JavaScript to update the DOM, which is effectively the browser’s data model; then, once the JS phase (state updating) is complete, the browser enters its own rendering phase where, based on the changed DOM nodes, any relayout or repainting is performed. This allows a browser to function very efficiently for a wide variety of web applications.

An additional optimization is to check that invalidated UI components are actually visible before rerendering them. If a UI component with changed dependent values is not visible, we can simply mark it as needing to be rerendered, and wait until it becomes visible (if ever) to actually render it. This also avoids unnecessary rendering updates.
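Extending the Component sketch above, visibility can gate the queueing (checking offsetParent, which is null for elements hidden via display: none):

```javascript
// Hidden components just remember they are stale instead of queueing a rendering.
class LazyComponent extends Component {
  invalidate() {
    if (this.element.offsetParent === null) { // hidden via display: none
      this.needsRender = true;
      return;
    }
    super.invalidate();
  }
  show() {
    this.element.hidden = false;
    if (this.needsRender) {
      this.needsRender = false;
      super.invalidate(); // one rendering, however many changes accumulated
    }
  }
}
```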

Alkali

This approach is generally applicable, and can be used with various libraries and applications. However, I have been building, and using, a reactive package, alkali, which is specifically designed to facilitate this efficient two-phase model. In alkali, there is an API for a Variable that can hold state information, and dependent variables can be registered with it. One can easily map one or multiple variables to another, and alkali creates the dependency links. When a source variable is updated, the downstream dependent variables and components are automatically invalidated. Alkali handles the invalidation aspect of state updates as part of its reactive capabilities.

Alkali also includes an Updater class, which can be used by UI endpoints or components that are dependent on these variables (whether canonical or computed). An Updater is simply given a variable, an element, and code to perform a rendering when necessary. The Updater registers itself as a dependency of the given variable, is invalidated when any upstream changes are made, and queues its re-rendering via requestAnimationFrame. The rendering code is then called, as needed, in the proper rendering phase, at the right time.
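In rough outline (a sketch of the idea built from the earlier Variable and Component sketches, not alkali’s exact signatures), an updater might look like this:

```javascript
// Sketch only: an updater is a component that depends on one variable
// and renders its value into an element during the rendering phase.
class Updater extends Component {
  constructor(variable, element, renderUpdate) {
    super(element);
    this.variable = variable;
    this.renderUpdate = renderUpdate;
    variable.observe(() => this.invalidate()); // register as a dependency
  }
  render() {
    // called via requestAnimationFrame, in the rendering phase
    this.renderUpdate(this.variable.valueOf(), this.element);
  }
}

const fullName = new Variable('Ada Lovelace');
new Updater(fullName, document.querySelector('#name'),
  (value, element) => { element.textContent = value; });

fullName.put('Grace Hopper'); // invalidated now, rendered next animation frame
```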

With alkali, computed variables are also lazy. They are only computed as needed, on demand by downstream UI Updaters. They are also cached, so they are only computed once, avoiding any unnecessary repetition of computations.

As we have tackled large-scale data models in a complex UI, the two-phase approach has helped us manage complex state changes and provide fast, responsive, efficient UI rendering of our data model as state rapidly changes.

Reactivity and Caching

Functional reactive programming techniques have seen increasing usage lately as a means of simplifying code, particularly in situations where views can react to model data. Reactive programming can replace a lot of error-prone imperative code dedicated to responding to data changes, allowing synchronization of views and data to be written more precisely, without redundancy of intent.

As I have been working on reactive programming in xstyle, I have come to believe that reactivity is actually a natural extension of what I like to call resource-oriented programming. Resource-oriented programming is the act of applying RESTful concepts to programming, so that resources can be treated with a uniform interface. Reactivity fits well within this concept because it is intended to provide a uniform interface for responding to changes in resources, using the observer pattern. With this uniform approach, bindings can easily and naturally be written.

Viewed from this perspective, REST suggests a surprisingly insightful mechanism for reactivity. A key mechanism in REST is caching; the semantics of resource interfaces are driven by exposing how resources can be cached. This resource-oriented approach can guide reactivity, but first, let’s review the mechanics of reactivity.

Generally, reactivity is built on an interface for adding a listener to be notified when a particular value changes, often called the observer pattern. This is sometimes compared to a continuous promise. A promise is something that can be resolved, with any listeners notified. A reactive, observable value is similar, except that rather than having a single resolution, it can continue to resolve to different values indefinitely. Alternately, reactivity is sometimes compared to a stream, with new values being streamed to a listener as they come in (there is some impedance mismatch with streams, though, since reactive listeners may not care about the entire sequence, only the last value).

Perhaps the most obvious strategy for these change notifications in the observer pattern is to include the changed value in the notification. With this approach, when a change occurs, the new values are propagated out to any listeners as part of the notification. A consumer retrieves an initial value, and then listens for changes and responds to them.
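In code, the value-push strategy looks roughly like this:

```javascript
// Value-push style: the new value travels inside the notification.
const listeners = [];
let temperature = 20;

function setTemperature(value) {
  temperature = value;
  listeners.forEach(listener => listener(value)); // data flows with the event
}

listeners.push(value => {
  document.querySelector('#temperature').textContent = value;
});
setTemperature(25);
```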

The REST-inspired, resource-oriented approach still uses the observer pattern to alert consumers of changes. However, with the REST approach, the notifications and the flow of data are delineated. Change events function simply as invalidation notifications, and the flow of data always takes place as a separate request from the consumer to the source.
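The invalidate-and-pull style, sketched below, separates the notification from a cacheable data request:

```javascript
// Invalidate-and-pull style: the notification carries no data; the consumer
// requests the current value through the same path it used initially.
const listeners = [];
let temperature = 20;

function getTemperature() {
  return temperature; // a separate, cacheable data request
}

function setTemperature(value) {
  temperature = value;
  listeners.forEach(listener => listener()); // "something changed", no payload
}

listeners.push(() => {
  document.querySelector('#temperature').textContent = getTemperature();
});
setTemperature(25);
```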

Invalidation notifications complement caching. Caching of values (particularly computed values) can be employed to ensure that no unnecessary computations take place, and any entity can cache safely, with the knowledge that it will be notified if any of the source data is invalidated.

The REST approach of keeping notifications (responsible only for indicating a change) separate from a cacheable data-request mechanism, rather than using notifications as the data delivery mechanism, has several important advantages:

  • Consumers always follow the same code path to get data and act on it. Regardless of whether consumption (like rendering data) happens on the initial retrieval or is triggered by a change notification, the consumer always requests and retrieves data in the same, consistent way.
  • Consumers have control over whether or not to actually request data in response to a notification. There can be plenty of situations where a consumer may temporarily or permanently not need to access new data when notified of its presence. For example, a component that is currently hidden (think of an object in the pane of an unselected tab) may not need to force all the computations necessary to retrieve the data, if it doesn’t actually need to render it. The notification of a change may be important so the component knows it will need to rerender if and when it is visible again, but it can potentially ignore multiple data changes while hidden, simply by not retrieving data until necessary. This approach affords maximum laziness in applications.
  • When consumers retrieve data, it produces more coherent call stacks. Call stacks that consist of data sources calling intermediaries calling consumer components can be incredibly confusing to track. While data sources can still be at the bottom of the stack with their notifications, the actual data flow will always be ordered by consumers calling intermediaries calling data sources, for consistent, intuitive debugging.

Interestingly, ECMAScript’s new Object.observe functionality, which provides native object notifications, actually follows this same approach. The notification events that are triggered by property changes are only invalidation notifications; they do not include the changed value. This aligns with the resource-oriented approach I have described, and encourages this strategy.

There are certainly situations where this approach is not appropriate. In particular, it isn’t appropriate when there are large latency costs associated with separating the notification and retrieval operations (for example, if they necessitate extra network requests). However, within a single process, I believe this resource-oriented cache-and-invalidate approach can yield cleaner and more efficient code, and is worth considering as you approach reactive programming.