5  Web solutions to bring the user closer to simplified XAI explanations

5.1 XAI explorer: The proof of concept

These geometries facilitate the graphical representation of high-dimensional data, while the interactivity requirements ensure that users can dynamically explore and interact with these visualisations. This foundational work is critical to enabling a comprehensive understanding of the facets of black box models revealed by various XAI methods.

To integrate these components into a cohesive system, we have selected a web application as the common platform. This choice is driven by several key advantages, including accessibility, user-friendly interfaces, and the capability to support a wide range of interactive features.

When implementing an interactive solution for multiple XAI methods in high dimensions, we considered several key factors.

5.1.1 Performance

We decided to precompute results to reduce the time spent on calculating explanations on the fly. Computing the explanations of each observation on request would be time-consuming and would hinder the user experience. Therefore, through techniques such as the parallel computing implemented in the kultarr package for anchors, explanations can be calculated efficiently up front and used in the application.

5.1.2 State management

A significant portion of the workload is managed by sharing state between Shiny modules using a single source of truth. This state is kept reactive, meaning that any changes made by user events are automatically reflected across the application. User click events are sent to the Shiny server, which updates the internal state and communicates these updates back to the client side.

After the server processes user inputs, control is handed back to the client side through R packages and commands that send instructions to update the client-side view. In the user’s browser, callback handlers are used to respond to messages from the server. These callback functions build up the bidirectional communication between the server and the client.

5.1.3 Client-server interactions

The development of this application necessitated extensive communication between the server-side and client-side environments. To ensure a smooth user experience and reduce server load, it was crucial to offload interactivity to the browser. This approach allows for smoother user interactions while maintaining efficient server performance.

Shiny is an excellent choice for rapid prototyping and development due to its robust support for a multitude of visualisation packages. Additionally, Shiny’s seamless integration with JavaScript allows for advanced interactivity and communication with the user’s browser.

5.1.4 User interface and user experience

The primary interface of our application is designed to present the data space through the detourr package. This main view enables users to navigate the data space and adjust projections to explore different perspectives of the dataset.

When users click on observations within the main view, the application displays the geometric representation of the currently selected XAI method within the data space.

Adjacent to the main view is a table where users can save selected observations. This functionality enables users to adjust the boundaries between classes as delineated by the data and retain any significant or outlying variables for further analysis.

Beneath the table of saved observations, users have the option to switch between various XAI methods. Upon selecting an instance in either the data space or the saved observations table, users can toggle between XAI methods to observe the corresponding geometric representations.

As an additional feature, users can view a global perspective of each XAI method. By examining the global behavior of these methods, users can validate the consistency and reliability of the explanations across the entire dataset.

5.1.5 Modularization

The views and logic of the web application were structured using the Rhino framework by Appsilon. This framework was chosen for its ability to separate and modularize the logic used in creating plots and generating explanations, which facilitates testing and maintenance. The modular architecture ensures that each component can be developed, tested, and debugged independently.

Each view of the application is defined using Shiny modules. Shiny modules encapsulate both the user interface and the associated logic, enabling a clean and organized structure. The initial view of the application is defined in the main module. For each XAI method, a new view was created, consisting of the global perspective and the associated logic for handling user interactions specific to that method.

5.2 Seed explorer app

5.2.1 Generating simulated data with the squiggler tool

In any two-dimensional area, there are numerous ways to define a boundary for a classification task. While complex, disjointed boundaries can exist, a single, continuous line is often preferred when teaching machine learning concepts. As a pedagogical tool, it’s far more intuitive for students and end users to visually separate two regions with a continuous line, making it easier to conceptualize how a model is attempting to learn and generalize.

The initial design of the tool

However, to ensure that the generated boundary provides a suitable challenge for predictive models, the default design is intentionally complex. Instead of a simple line parallel to an axis, the design consists of a mix of oblique and axis-parallel segments. This approach ensures that the downstream classification task is not trivial and compels the model to learn non-linear relationships. Furthermore, this mixed design helps the model developer discern which of the two features is more influential in the model’s final decision-making process.

5.2.1.1 Key points of consideration

When developing this tool, several technical considerations had to be addressed to translate the user input into a usable decision boundary.

A dual coordinate system was implemented to translate between screen coordinates and data coordinates. The display coordinates range from 0 to 640 pixels, while the data coordinates are normalized to a -10 to 10 range. This separation is crucial because it makes the visualization resolution-independent and the data more meaningful for mathematical operations or external processing. The transformation also includes a y-axis flip (subtracting from the maximum value) because SVG coordinates have their origin at the top-left, while most mathematical coordinate systems place the origin at the bottom-left. When implementing this pattern, care was taken with the order of operations during the coordinate conversion, especially when dealing with the flipped y-axis.
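The transformation pair can be sketched as follows. This is an illustrative Python sketch only (the actual tool is implemented in Svelte), and the function and constant names are hypothetical.

```python
# Hypothetical sketch of the dual coordinate system.
# Display: 0-640 px (SVG, origin top-left); data: -10..10 (origin bottom-left).

DISPLAY_SIZE = 640            # pixels
DATA_MIN, DATA_MAX = -10.0, 10.0
DATA_RANGE = DATA_MAX - DATA_MIN

def screen_to_data(px: float, py: float) -> tuple[float, float]:
    """Convert screen pixels to data coordinates, flipping the y-axis."""
    x = DATA_MIN + (px / DISPLAY_SIZE) * DATA_RANGE
    # Flip y: SVG y grows downward, data y grows upward.
    y = DATA_MIN + ((DISPLAY_SIZE - py) / DISPLAY_SIZE) * DATA_RANGE
    return x, y

def data_to_screen(x: float, y: float) -> tuple[float, float]:
    """Inverse transform: data coordinates back to screen pixels."""
    px = (x - DATA_MIN) / DATA_RANGE * DISPLAY_SIZE
    py = DISPLAY_SIZE - (y - DATA_MIN) / DATA_RANGE * DISPLAY_SIZE
    return px, py
```

Note that the y-flip must be applied after (or before) the scaling consistently in both directions, which is exactly the order-of-operations pitfall mentioned above.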

When dragging begins on an existing point, it selects that point for manipulation. However, when dragging starts on empty space, the system dynamically creates a new point at that location and immediately begins dragging it. This creates an intuitive user experience where clicking anywhere adds a point.

The visualization creates two complementary polygons: one above the draggable line and one below it. The upper polygon is constructed by connecting the top corners of the canvas with all the user-defined points, while the lower polygon connects the bottom corners with the same points. Both polygons require coordinate sorting to ensure proper edge connections, but this is done at render time (i.e., when drawing the polygons on the interface) rather than by modifying the original data structure. A key assumption is that points are meant to be connected in x-coordinate order, which works well for function-like curves.
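The render-time polygon construction can be sketched as below. Again, this is a hypothetical Python illustration of the logic (the real component is written in Svelte), working in screen coordinates where y grows downward.

```python
# Hypothetical sketch of building the two complementary polygons at render time.

def boundary_polygons(points, size=640):
    """Given user-defined knots in screen coordinates, return the upper
    and lower polygons as lists of (x, y) vertices."""
    # Sort a copy by x at render time; the original list stays untouched.
    knots = sorted(points, key=lambda p: p[0])
    # Upper polygon: top-left corner, top-right corner, then knots right-to-left.
    upper = [(0, 0), (size, 0)] + knots[::-1]
    # Lower polygon: the bottom corners joined with the same knots.
    lower = [(0, size), (size, size)] + knots[::-1]
    return upper, lower
```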

5.2.2 Parallel modeling pipeline

We began by generating a dataset of 10,000 samples, distributed uniformly across a two-dimensional space ranging from -10 to 10 in both dimensions. A consistent seed was used throughout this process, regardless of the seed selected for model training, ensuring replicability. A function was designed to determine whether a given point in this two-dimensional space lay above or below a predetermined decision boundary. This function was then used to generate both the training and testing datasets by sampling points uniformly within the specified range and assigning each point a class label based on its position relative to the boundary. The data were split 50/50 into training and testing sets, resulting in 5,000 samples in the training set.
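The data-generating step can be sketched as follows, assuming a boundary function f(x) such that points with y > f(x) are labelled 1; the function names and the seed value here are hypothetical.

```python
# Minimal sketch of the data-generating process described above.
import numpy as np

def make_dataset(boundary, n=10_000, low=-10.0, high=10.0, seed=2024):
    rng = np.random.default_rng(seed)           # consistent seed for replicability
    X = rng.uniform(low, high, size=(n, 2))     # uniform samples in 2-D
    y = (X[:, 1] > boundary(X[:, 0])).astype(int)  # label from position vs boundary
    return X, y

X, y = make_dataset(lambda x: np.sin(x))        # example boundary
# 50/50 split: first half for training, second half for testing
X_train, X_test = X[:5_000], X[5_000:]
y_train, y_test = y[:5_000], y[5_000:]
```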

Afterwards, based on the number of replicates to test per neuron size, a random selection of seeds spanning the range [1, 99999] was employed for each neuron size being tested.

The training process was executed in parallel, using the generated training and testing datasets, a random seed, the number of neurons in the network, and the number of models to train for each neuron setting. For each unique seed-neuron combination, a single-layer neural network was initialized with weights determined by the seed and trained on the training dataset for 500 epochs. The batch size was set to the square root of the number of rows in the training dataset (5,000), rounded down to 70, yielding 71 full batches per epoch. To maintain consistency, the data order was shuffled before each epoch based on the seed used for model training.
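The batching arithmetic and the seeded per-epoch shuffle can be sketched as below. Details not stated in the text are assumptions: samples beyond the last full batch are dropped, and the epoch index is folded into the seed so each epoch gets a distinct but reproducible order.

```python
# Sketch of the per-epoch batching (assumed: remainder samples are dropped).
import math
import numpy as np

n_train = 5_000
batch_size = math.isqrt(n_train)      # floor(sqrt(5000)) = 70
n_batches = n_train // batch_size     # 71 full batches per epoch

def epoch_batches(X, y, seed, epoch):
    """Yield seeded, shuffled mini-batches for one epoch."""
    rng = np.random.default_rng(seed + epoch)  # reproducible shuffle
    order = rng.permutation(len(X))
    for b in range(n_batches):
        idx = order[b * batch_size:(b + 1) * batch_size]
        yield X[idx], y[idx]
```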

The neural network architecture comprised two inputs corresponding to the two features, a single hidden layer followed by a ReLU activation function, and a final layer with a single output node, as required for the binary classification task. The final layer used a sigmoid activation function to produce a probability score between 0 and 1. The training process used the Adam optimizer (kingma_adam:_2017?), configured with a learning rate of 0.01, the default weight decay of 0, and the default beta values of 0.9 and 0.999. The loss function was the binary cross-entropy loss.
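While the actual training uses PyTorch, the architecture and loss can be illustrated with a dependency-light NumPy sketch of the forward pass; the weight shapes and function names here are assumptions for illustration, not the project's code.

```python
# NumPy sketch of the single-hidden-layer network's forward pass and loss.
import numpy as np

def forward(X, W1, b1, W2, b2):
    """X: (n, 2) inputs -> (n,) predicted probabilities."""
    h = np.maximum(0.0, X @ W1 + b1)         # hidden layer with ReLU
    logits = h @ W2 + b2                     # single output node
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid -> probability in (0, 1)

def bce(p, y, eps=1e-12):
    """Binary cross-entropy loss, clipped for numerical stability."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```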

Following the training phase, the trained model was evaluated on the held-out testing dataset, with both the F1-score and accuracy recorded (hastie_elements_2009?).

In a binary classification setting, we can summarize the performance of a model using the confusion matrix, which consists of the following quantities:

  • True Positives (TP): correctly predicted positive samples
  • True Negatives (TN): correctly predicted negative samples
  • False Positives (FP): negative samples incorrectly predicted as positive
  • False Negatives (FN): positive samples incorrectly predicted as negative

Accuracy measures the proportion of correctly classified instances among all predictions: \[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

The F1 score is the harmonic mean of precision and recall, providing a balance between the two: \[ \text{Precision} = \frac{TP}{TP + FP}, \]

\[ \text{Recall} = \frac{TP}{TP + FN}, \]

\[ \text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

While accuracy reflects the overall correctness of predictions, the F1 score is more informative in cases of class imbalance, as it emphasizes both precision and recall.
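The metrics above can be computed directly from the confusion-matrix counts, as in this minimal sketch:

```python
# Accuracy and F1 score from confusion-matrix counts.

def accuracy(tp, tn, fp, fn):
    """Proportion of correctly classified instances."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```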

To further understand the model’s decision boundary, we created a grid of 100×100 points spanning the same data range of -10 to 10, applied the trained model to this grid, and stored the resulting predictions to accurately map the model’s learned boundary.
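The grid-evaluation step can be sketched as follows; `model_predict` is a placeholder standing in for the trained network.

```python
# Sketch of the 100x100 evaluation grid used to map the learned boundary.
import numpy as np

xs = np.linspace(-10, 10, 100)
ys = np.linspace(-10, 10, 100)
gx, gy = np.meshgrid(xs, ys)
grid = np.column_stack([gx.ravel(), gy.ravel()])   # (10000, 2) points

def model_predict(points):
    """Placeholder for the trained model; here, a trivial boundary at y = 0."""
    return (points[:, 1] > 0).astype(int)

predictions = model_predict(grid).reshape(100, 100)  # boundary map over the grid
```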

5.2.3 Pedagogical tool

To illustrate the spread of model fits that occur for a given neural network architecture, we developed a Shiny application that can be used in a teaching setting to encourage students to compare and evaluate different neural model architectures. The application consists of two layers: first, an introductory story to ease the student into the environment, and then the primary user interface where the student can train and evaluate neural networks.

5.2.3.1 Introducing the scenario through a story

Students often relate to a topic more deeply when a compelling story is associated with it, especially one that reflects a real-world possibility they might encounter (ivala_enhancing_2013?). To harness this potential, the application eases the student into the concept through a narrative centered on a fresh graduate named Joshua. We follow his story as he takes on his first task at a new data science company, providing a relatable context for the challenges of model development.

First page

Second page
Figure 5.1: The first and second pages of the tutorial section which sets the background and the main problem that the protagonist of our story has encountered.

Third page

Fourth page
Figure 5.2: The third and fourth pages of the tutorial section where the user gets to know the climax and the resolution of the story. The last panel invites the user to explore the app with a question in mind.

In the story, Joshua is given a dataset with a relatively simple, non-linear boundary. His initial attempts to fit a neural network fail, as the model repeatedly produces a simple linear boundary when visualized in the data space. His manager then intervenes and, using the exact same neural architecture, achieves a much better fit. The manager makes a casual, offhand comment that a better fit could always be achieved by simply using more neurons or training for longer. Disheartened but determined, Joshua tries again with his original network size. To his surprise, without changing the training process from his previous failed attempt, the model now converges to a much better solution.

This narrative illustrates a crucial concept that students will face when developing neural networks: the temptation to solve problems by simply making the model more complex. When faced with an underperforming model, the general consensus and industry expectation is to make the architecture more complex so it can hold more information. The story highlights the reality that, even with an identical dataset and model architecture, there is a wide variety of possible outcomes due to the stochastic nature of training. Users can navigate between the pages of this introductory story and can end it at any time, which then reveals the main tool underneath. By this point, the user should be curious about what would cause a model architecture that failed earlier to work now.

5.2.3.2 Aspects of the user interface

The user interface is logically structured into three main sections: “Design and Fit,” “Model Spread,” and “Individual Models.” This layout is designed to guide the user through the process of generating data, training models, and analyzing the results in a linear fashion while also providing the option to move back and forth between steps.

The editor tool

The “Design and Fit” section is primarily for generating data and fitting models through three distinct components. The first is the “squiggler” tool (as discussed in Section 5.2.1), which is used to define the decision boundary for the data-generating process. It initializes with a predefined decision boundary marked by a set of movable circles (similar to knots in a linear spline). Users can customize this boundary by adding new knots with a simple click on empty space or by removing existing ones with a right-click.

Hyperparameter section

The second component is where the user configures the neural network training, deciding on the number of neurons and the number of replicates to fit. To ensure reasonable performance, we provide a preset list of neuron counts, preventing users from selecting an architecture that would be unnecessarily slow to train. For each selected neuron count, the application will fit the specified number of replicate models. For example, if the user selects 4 and 8 neurons with 5 replicates, the application will train five separate single-layer networks with 4 neurons and another five networks with 8 neurons.

Model fitting monitoring

The third component in this section is a progress card that displays the status of the model fitting queue and the time taken for each model, offering transparency into the process. With current performance optimizations, a typical neural network can be fully trained within 15 seconds on a MacBook Pro M2 CPU.

Visualisation to show the spread of model fits in an interactive display

Controls to select the animation

Resulting animation

The “Model Spread” section is dedicated to visualizing the range of performance across all fitted models. This is primarily achieved through an interactive beeswarm plot, which displays the F1 score and accuracy of each model on the test dataset. The plot immediately reveals the variability in outcomes even for models with the same architecture. Users can hover over points for details and select individual model variants from the plot, which then populates the “Individual Models” section with that model’s specific decision boundary and misclassified points. As a second approach to visualizing variance, the application can generate an animation that cycles through all the fitted models, showing how the decision boundary shifts as the performance metric changes.

Visualisation of individual model fits in the data space

To further enhance the user experience, the interface is designed with a colorblind-friendly scheme of orange and purple, ensuring accessibility for vision-impaired users. Additionally, each card includes a dedicated help button that provides a concise summary of its functionality, allowing users to quickly refresh their understanding of what each section does.

5.2.3.3 The architecture

Figure 5.3: The architecture of the pedagogical tool

The web application is built on a foundation of R and Shiny, using the {rhino} package (rhino-rpkg?) as a robust framework for managing the codebase. The user interface components are implemented with Fomantic UI (the community fork of Semantic UI) through the {shiny.semantic} package (shiny.semantic-rpkg?). The {shiny} package (shiny-rpkg?) within R was chosen primarily to develop a proof-of-concept rapidly and to leverage powerful visualization libraries with minimal development overhead.

While the main application is in R, the architecture incorporates other languages for specialized tasks. The interactive “squiggler” tool is a bespoke component built using Svelte, which compiles to lightweight, pure JavaScript and CSS. This standalone component is then seamlessly embedded into the Shiny application. The model fitting itself is handled by Python (python-book?), which is executed as a background task so that the single-threaded R process does not block the user interface during intensive computations. We use PyTorch (pytorch-pypkg?) as the deep learning framework due to its flexibility in defining custom architectures and its support for inspecting model internals. To manage data transfer between R and Python, intermediate data files are saved in the Parquet format, a columnar storage format that offers better compression and speed than traditional CSV files.

The application’s visualizations are powered by a combination of R packages. The interactive beeswarm plot is created by combining the {ggbeeswarm} package (ggbeeswarm-rpkg?) for the plot geometry with the {ggiraph} package for the layer of interactivity. Animations are generated using the {gifski} package (gifski-rpkg?), which leverages Rust internally to create high-quality GIFs in minimal time. General static plots are constructed using the {ggplot2} package (ggplot2_R_pkg?), which allows for the creation of highly customized, publication-quality visualizations with an efficient, declarative syntax.

5.2.4 Usage of the pedagogical tool

The design of the tool is primarily aimed at students. To use it in a teaching setting, the teacher first introduces the fundamental concepts of building neural networks along with their training dynamics. Afterwards, the students can be given a simple dataset generated by the squiggler tool as practice for fitting neural networks. At first, the students can be told to freely choose their neural architectures, with a hint pointing towards the heuristic of picking the number of neurons based on the number of bends in the decision boundary. Subsequently, the teacher provides a sample fitting template, deliberately using a seed that produces a sub-optimal result. After students have tried different architectures, several will inevitably face difficulty getting perfect fits. From this point, the teacher can bring out the tool and follow the first few pages of the tutorial to illustrate the story that some students may have just experienced. The tool can then be used to draw the decision boundary the students faced, along with several single-layer neural architectures and their replicates. Once model fitting is done, the teacher can highlight the significant impact of random numbers on the model’s decision boundary.