Developing a complex R Shiny app – the good, the bad and the ugly

Together with Clara Bicalho (UC Berkeley) and Sisi Huang (WZB), I recently developed a web application that acts as a convenient interface to the DeclareDesign R package and its repository of research designs, DesignLibrary. This web application, which we called DeclareDesign Wizard, allows users to investigate and customize research designs in their web browser. We used R Shiny to implement it, and since this was my first large Shiny project, I wanted to reflect a bit on the development process and explain where Shiny shone, and where it didn’t.

Project background – DeclareDesign and the “Wizard” app

First, I’d like to give a little background on the project. Our app is built for (and with) the DeclareDesign (DD) framework. The purpose of DD is to allow researchers to describe their research design in R code, run simulations through this design definition and assess properties of design estimators, e.g. their power, bias or other diagnosands. By using R and a set of “design building steps”, DD is a pretty versatile tool – however, it also requires the researcher to be familiar with R. This is where our DeclareDesign Wizard app steps in. There are several well-established research designs, and the DD folks implemented them in the DesignLibrary as parametrized designs. This means you can select a design, e.g. a Two-Arm Experiment, set its parameters (e.g. sample size N or standard deviation in the treatment group treatment_sd) and investigate properties like RMSE or power by simulating the design.
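To make this concrete, loading and diagnosing such a parametrized design takes only a few lines of R (a minimal sketch assuming the current DesignLibrary API; the parameter values are arbitrary):

```r
library(DesignLibrary)  # also loads DeclareDesign

# Instantiate a parametrized Two-Arm design with custom parameters
design <- two_arm_designer(N = 100, treatment_sd = 1.5)

# Simulate the design and compute diagnosands such as power, bias and RMSE
diagnosis <- diagnose_design(design, sims = 500)
diagnosis
```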

The DeclareDesign Wizard allows you to select and customize the designs from the DesignLibrary using a convenient web interface. In a second step you can interrogate (or diagnose) the design by running simulations for a range of different parameter values and plot the results. For example, you can vary N and treatment_sd and directly see how they affect a diagnosand like RMSE for a particular estimator (fig. 1).

Fig. 1: Diagnosing a Two-Arm research design for different parameter sets.

For more on the Wizard app, see this blog post. You can try out the Wizard yourself and send us feedback if you like, or report an issue on GitHub if you encounter any.

For the rest of this article, I’ll focus on my experiences with Shiny during the development of the app.

The good

The project started out with quick prototyping for a single design. Shiny is just great for that, as you can create an interactive frontend for existing R code very easily. However, things quickly become complicated when you create dynamic user interfaces (UIs). In our case, most of the UI is created on-the-fly. When a research design is loaded from the library, the properties of that design (e.g. the parameters it accepts, the diagnosands that are available for it, etc.) determine the input fields that are created. Predefining UI elements for each design and selectively displaying them was not an option for us, because of the large number of designs with very specific parameter sets. For example, a Two-Arm design accepts parameters for sample size, probability of assignment to treatment and more, while in a Block Cluster Two-Arm design you may specify the number of blocks, the number of clusters in each block and the number of units in each cluster. Some parameters accept only integers, others any real numbers and yet others accept vectors of numbers (e.g. the Factorial design). Creating the UI elements on-the-fly also makes the app easier to extend with new designs.

While constructing such a dynamic UI is considerably more complicated, it can still be implemented efficiently with Shiny, because all input elements like numericInput, textInput, etc. can be created dynamically from code. However, this approach comes with several challenges: First, the asynchronous nature of the Shiny frontend makes the “timeline” of when certain actions are executed unpredictable, especially with a dynamic UI. For example, we had to implement a JavaScript hack to indicate that the app is currently switching from one design to another and that the new design’s UI is not fully loaded yet. This was the only way to prevent parts of the code from prematurely loading values from the “old” UI (with values from the former design) and trying to apply them to the new design, which would of course fail. Furthermore, the additional complexity introduces problems with testing and debugging, which I will address later on. The upcoming book “Mastering Shiny” (H. Wickham, announced for late 2020) has a section on creating UI elements dynamically with code, and I think this is an important topic for many Shiny developers.
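As an illustration, the on-the-fly approach boils down to something like the following (a simplified sketch; the params list and its structure are invented for illustration and do not reflect our actual code):

```r
library(shiny)

# Hypothetical parameter metadata, as it might be read from a loaded design
params <- list(
  N            = list(label = "Sample size (N)", value = 100, step = 1),
  treatment_sd = list(label = "Treatment SD",    value = 1,   step = 0.1)
)

ui <- fluidPage(uiOutput("design_params"))

server <- function(input, output, session) {
  output$design_params <- renderUI({
    # Build one input element per design parameter at runtime
    lapply(names(params), function(p) {
      numericInput(inputId = p,
                   label   = params[[p]]$label,
                   value   = params[[p]]$value,
                   step    = params[[p]]$step)
    })
  })
}

shinyApp(ui, server)
```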

We started development as a single-file Shiny app with growingly large ui and server code. It was quickly clear that we somehow needed to restructure the code to keep it maintainable. Luckily, Shiny provides modules and namespaces, which I think are obligatory for complex apps. We created two modules – the “design” and the “diagnose” (internally: “inspect”) module, each representing a workflow phase and a UI tab in the Wizard app and each with its separate source code file.

In general, the Shiny modularization concept works very well, although implementing communication between modules is initially challenging and there are a few quirks you just have to keep in mind while programming. For example, you have to remember to always declare UI elements with namespaces in the UI code, but most of the time you don’t use namespaces in the server code (i.e. when you access an input value via input$...). Notable exceptions are updating values (e.g. using updateTextInput()), dynamically creating new UI elements and addressing UI elements with shinyjs.
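A minimal module sketch illustrating this asymmetry (the names are invented; this uses the callModule() interface current at the time of writing):

```r
library(shiny)

designUI <- function(id) {
  ns <- NS(id)  # namespace function for this module instance
  tagList(
    textInput(ns("design_name"), "Design name"),  # UI side: wrap every ID in ns()
    actionButton(ns("reset"), "Reset")
  )
}

designServer <- function(input, output, session) {
  observeEvent(input$reset, {  # server side: no namespace prefix needed
    # update functions take the module's session, which resolves the namespace
    updateTextInput(session, "design_name", value = "")
  })
}

ui <- fluidPage(designUI("design"))
server <- function(input, output, session) {
  callModule(designServer, "design")
}
shinyApp(ui, server)
```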

Another very useful feature that saved us a lot of time, because we didn’t have to implement it ourselves, is the bookmarking feature. It allowed us to quickly implement a “share” button, with which users can load and adjust a research design and quickly share it with colleagues. The bookmarking feature makes sure that the state of the app, as it is currently displayed in the browser, is restored whenever you visit the shared link. Our heavy use of dynamic UI elements and custom state objects made implementing this feature less straightforward than the documentation suggests (because of difficulties related to the already mentioned asynchronous nature of Shiny apps), but overall it worked quite well.
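In its simplest form, enabling bookmarking requires little more than wrapping the UI in a function and adding a button (a minimal sketch, not our actual setup, which additionally has to restore dynamic UI state):

```r
library(shiny)

ui <- function(request) {  # for bookmarking, ui must be a function of `request`
  fluidPage(
    numericInput("N", "Sample size", 100),
    bookmarkButton(label = "Share")  # produces a URL that restores this state
  )
}

server <- function(input, output, session) {}

# "url" encodes the state in the link itself; "server" would store it on disk
shinyApp(ui, server, enableBookmarking = "url")
```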

Another big positive aspect of the Shiny ecosystem is the extensive, well-written documentation and the large community around it that provides support and extension libraries. We use several community-developed libraries such as shinyjs, shinymaterial and shinyBS.

The bad

“Debugging Shiny applications can be challenging.” These are the first words of the Shiny documentation on debugging, which is an absolute must-read for anyone creating more than prototypes or toy examples with Shiny. These words are very true indeed, as I experienced myself.

First of all, I was used to developing web applications with frameworks like Django (a Python web framework) and an IDE like PyCharm, and expected the debugger to work as usual: You set a breakpoint somewhere in the code via your IDE’s interface. You run the code in debug mode. The debugger stops at the breakpoint just before that line is executed, and you can investigate the situation.

Not so with Shiny, though. As the mentioned documentation says: “For technical reasons, breakpoints can only be used inside the shinyServer function. You can’t use them in code in other .R files.” Since we placed our code in several .R files, this way of debugging is impossible for us unless we want to debug something in the central app.R file. Fortunately, there is a way out: You can place browser() wherever you would set a breakpoint, and the debugger will stop at that spot, no matter which file it is in. However, you should always remember to remove the browser() line before you commit your code to a repository.
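In practice this looks like the following (the handler and helper names are made up for illustration):

```r
server <- function(input, output, session) {
  observeEvent(input$load_design, {
    # parse_design_params() is a hypothetical helper defined in another .R file
    params <- parse_design_params(input)
    browser()  # execution pauses here; inspect `params` and `input` in the console
    render_design_ui(params)  # also hypothetical
  })
}
```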

Before I knew about browser(), I used to write print() statements instead, which felt like an anachronism and reminded me of the days when I first learned programming. I still have to resort to that anachronism from time to time though, because you don’t always want to pause program execution as browser() does.

There are also more debugging tools available for Shiny: Showcase mode highlights the parts of your code that are executed while you’re interacting with your app. Unfortunately, the highlighting feature crashed RStudio when I tried it out with the Wizard app on my computer. In any case, I believe it’s not that useful, since you don’t actually see the values of your variables at runtime.

Finally, there is reactlog, which records and visualizes the reactive states of a Shiny app as a dynamic dependency graph. I have to admit that I only tried it out briefly, as I was already overwhelmed by the complexity of the graph generated by a single click on the “Load design” button of our app (fig. 2). I also couldn’t find a way to zoom into the graph other than randomly selecting a node (randomly because the labels are unreadable), which then shows a partial graph. It is probably a very useful tool, as it can help to understand the very complex reactivity structure of a Shiny app that often feels like a “black box”, but the tool itself requires a considerable amount of training.
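For completeness, enabling reactlog takes only a few calls (assuming the current reactlog API; the app path is a placeholder):

```r
library(reactlog)

reactlog_enable()        # must be called before the app starts
shiny::runApp("app")     # interact with the app, then stop it
shiny::reactlogShow()    # opens the recorded dependency graph in the browser
```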

Fig. 2: Output of reactlog after clicking “Load design.” No, I couldn’t decipher the labels either.

So there are several approaches for debugging Shiny apps, but I’m not fully satisfied with any of them. No matter which debugging technique I use, the most frustrating part for me is having to manually “replay” several steps in the app, e.g. clicking certain buttons or entering certain values, in order to reach the state that I want to debug. You can’t simply run a script as you would with ordinary R code. I haven’t found a tool to assist with this problem so far.

The ugly

Automated testing is essential when developing anything but prototype software. It makes sure you don’t re-introduce bugs you already fixed and that your app doesn’t break when you upgrade a dependency. It also speeds up development, because you don’t have to click through the same things over and over again to check whether your app still works as expected after you changed some code.

So for a complex Shiny app like ours, we needed automated tests too. For testing R code, you can usually use testthat or RUnit. When developing a Shiny app, however, this won’t get you very far: you can test certain (helper) functions that are mostly independent of Shiny and interactive inputs, but not the actual interactive app.
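Such Shiny-independent helpers are easy to cover, e.g. with testthat (the helper below is a made-up example, not from our code base):

```r
library(testthat)

# Hypothetical helper that turns the text "1, 2, 3" into the vector c(1, 2, 3),
# as needed for designs that accept vector-valued parameters
parse_vector_input <- function(x) {
  as.numeric(strsplit(x, ",")[[1]])
}

test_that("comma-separated input is parsed into a numeric vector", {
  expect_equal(parse_vector_input("1, 2, 3"), c(1, 2, 3))
})
```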

This is where the shinytest package comes in handy (at least in theory). It works quite differently from traditional testing approaches: With shinytest you record a certain state of the app (which you know works as expected) while interacting with it. This state is saved to disk (by means of screenshots and recorded input/output data) and can be “replayed” at any time. When you replay it (e.g. after changing some code), the app receives the same inputs (i.e. clicks on the same buttons, the same input values, etc.) and shinytest checks whether the output (i.e. the screenshots and output data) has changed. Of course it also notices when an exception occurs during replay.
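The basic shinytest workflow is just two calls (assuming the shinytest API at the time of writing; the path is a placeholder):

```r
library(shinytest)

# Launch the app together with a recorder UI and record your interactions;
# the recorded script and expected outputs are saved next to the app
recordTest("path/to/app")

# Later, replay the recorded interactions and compare screenshots and outputs
testApp("path/to/app")
```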

Apart from obvious limitations (e.g. you have to record all tests again once you make changes to the UI, because the screenshots change), this could be a really helpful tool. Unfortunately, we just couldn’t get it running because of a known bug in shinytest. So, at least for us, automated testing is so far only partially available (we implemented some RUnit tests for helper functions), which slows down development and is quite frustrating, because you have to perform highly repetitive testing tasks by hand.


All in all, Shiny is an extremely helpful software package that is not only useful for communicating research results or teaching facts and concepts in science. As our application shows, it can also bridge the gap between an R and a non-R audience for your existing R package. People not familiar with R can now use parts of DeclareDesign to describe, customize and interrogate research designs. Since our app also provides generated R code for creating the specified design and plotting its diagnosis results, the non-R audience gets a first impression of the R code and perhaps the motivation to adjust it, and maybe even to learn R.

The big advantage of using Shiny in our case was that we could directly use the DeclareDesign R package to build a web interface for it, without any additional layer or even a different programming language in between. There may be more advanced interactive web frameworks that are faster, have fewer problems with automated testing and debugging, and feel less like a “black box”. However, if we had, for example, used a React.js frontend with a Django REST backend, we would have needed to call R from Python and juggle at least three different programming languages, and I doubt that debugging would be any easier.

The things that make developing Shiny applications frustrating at times are the lack of a good automated testing tool, the overwhelmingly complex and dynamic relationships between the many input and output components of large applications, and the challenging debugging procedure that comes with this complexity. I think it’s in the nature of any interactive system that it’s hard to debug, so it’s no surprise that Shiny struggles with this, too. However, a first step would be to communicate more clearly that debugging with breakpoints doesn’t work in Shiny apps (unless you have all code in a single file) and that you need browser() instead. Next, there should be an easy way to record and play back the interactions you perform with your app, so that you don’t have to repeat clicking here and there to get the app into the state you need for debugging. The basis for this already exists, since shinytest does something similar, but it isn’t aimed at debugging and doesn’t always seem to work, as we experienced with our app. A dream would be something like Reverb, which lets you record, replay and even change the current state while debugging (“speculative debugging”), but this is still research.

For automated testing, it would be nice to have an option to record only what is displayed, and not how it is displayed, when constructing test cases with shinytest. At the moment, shinytest compares the current and expected test outcomes via screenshots, but I think a test shouldn’t fail when you deliberately change the color of a headline. It should instead fail when the headline says something different from what was recorded in the test case. Otherwise you will have to re-create all test cases after every tiny visual change to your app.

That being said, I don’t regret that we used Shiny for developing the DeclareDesign Wizard app and hope that in the future Shiny becomes an even more productive tool when building complex apps.
