Matthew and Jack interned with Novetta’s Machine Learning Center of Excellence during the summer of 2019. This blog series discusses ADSynth, an app that creates a digital architecture diagram from a photo of a whiteboard sketch.
ADSynth leverages numerous image processing techniques to properly translate a whiteboard sketch into an editable, digital architecture diagram. These techniques are used to ensure that all detected components are connected and formatted correctly.
This post describes how we utilized our YOLOv3 object detection model predictions in conjunction with the image of the whiteboard sketch to obtain a clean, presentable graphic ready for insertion into a design document.
The process of going from model predictions to an editable graphic is best separated into two steps. First, the architecture diagram must be defined given our model’s predictions. Then, an editable representation of the architecture diagram must be created and formatted properly.
Defining the Architecture Diagram
An architecture diagram can be represented as a directed vertex-edge graph with AWS components as vertices and connections between those components as edges.
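This vertex-edge representation can be sketched with plain Python containers; the component names below are hypothetical examples, not ADSynth's actual data model:

```python
# AWS components are vertices; each arrowed connection is a directed
# edge stored as an (origin, destination) pair.
vertices = {"EC2", "S3", "Lambda"}
edges = [("EC2", "S3"), ("S3", "Lambda")]

# Adjacency-list form, convenient for traversal
adjacency = {v: [] for v in vertices}
for origin, dest in edges:
    adjacency[origin].append(dest)
```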
Recall that our model makes predictions on the existence and relative positioning of arrowheads and AWS components in the initial whiteboard drawing. Substantial information about the architecture diagram can be obtained directly from these predictions.
Since all connections between components must be drawn with a single arrowhead to indicate directionality, the number of connections in the sketch must equal the number of arrowheads. Also, by determining the closest component to each arrowhead using our model’s locality predictions, we know which component each connection is directed to. In other words, we can identify the endpoints of all edges and (from the model’s predictions of the existence of components) all vertices in the graph.
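Finding the component nearest each arrowhead can be done with a simple center-distance comparison over the predicted bounding boxes; the boxes and names below are hypothetical, and ADSynth's actual locality logic may differ:

```python
import math

def center(box):
    """Center of a bounding box given as (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def closest_component(arrowhead_box, component_boxes):
    """Return the component whose center is nearest the arrowhead's center."""
    a = center(arrowhead_box)
    return min(
        component_boxes,
        key=lambda name: math.dist(a, center(component_boxes[name])),
    )

# Hypothetical model predictions
components = {"EC2": (0, 0, 50, 50), "S3": (200, 200, 260, 260)}
arrow = (60, 40, 80, 60)
closest_component(arrow, components)  # → "EC2"
```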
The last piece of information needed to define the architecture diagram is the origin of each edge, which cannot be determined directly from the model’s predictions. For this, we needed to use the initial image of the whiteboard sketch.
The initial image of the whiteboard is represented programmatically as a three-dimensional array where each pixel in the image has three associated values corresponding to a color channel in RGB colorspace (red, green, blue). Our initial goal was to turn this complicated representation into a more intuitive one, a binary image, where each pixel is either black or white. In this representation, black represents marker and white represents the absence of marker. This is a simple form of image segmentation.

Methods typically used for image segmentation, such as hard thresholding or clustering, yielded inconsistent results, primarily in environments with heavy shadow or glare. To solve this problem, we first converted the initial image from RGB colorspace into HSV colorspace (hue, saturation, value), a colorspace more closely aligned with how humans perceive color. By separating the image into its HSV channels, thresholding each channel, and combining the results, we achieved much more consistent segmentation despite any shadows or glare.
We thresholded each HSV channel separately because each channel distinguishes certain marker colors from the whiteboard better than others. The value channel reliably picks up black marker, which the hue channel typically misses entirely, while the hue channel is better at distinguishing more vibrant colors like blue and red.
Our next challenge was to find the origin of each connection between components in the architecture diagram using the binary image we had just created. To do this, we implemented a novel, parallelizable graph-based image processing technique.
We first converted the binary image into a directed graph where pixels are vertices and adjacent pixels are connected to each other with edges. We assigned weights to the edges of the graph such that edges directed toward pixels with marker were all assigned a small weight (0.1) and edges directed toward pixels without marker were all assigned a relatively high weight (10).
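The weighting scheme can be sketched as follows; the binary image below is a hypothetical toy input, and we assume 4-connected adjacency for simplicity:

```python
import numpy as np

# Toy binary image: True marks a marker pixel (hypothetical input)
binary = np.array([
    [True,  True,  False],
    [False, True,  False],
    [False, True,  True ],
])

MARKER_WEIGHT, BLANK_WEIGHT = 0.1, 10.0

def edge_weight(dest, binary):
    """Weight of any edge directed INTO pixel `dest`: cheap if the
    destination pixel contains marker, expensive otherwise."""
    r, c = dest
    return MARKER_WEIGHT if binary[r, c] else BLANK_WEIGHT

def neighbors(pixel, shape):
    """4-connected neighbors of a pixel, clipped to the image bounds."""
    r, c = pixel
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < shape[0] and 0 <= nc < shape[1]:
            yield nr, nc
```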
We used Dijkstra’s algorithm, a well-known shortest-path algorithm which, given a source vertex in a graph, finds the shortest path to every other vertex. Using the pixel at each arrowhead location (as predicted by our object detection model) as a source, we computed the “distance” from each arrowhead in the architecture diagram to every component. Since edges into marker pixels were weighted so low in relative terms, the shortest path would always be the one that traversed the most marker.
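A self-contained sketch of Dijkstra's algorithm over such a weighted pixel grid follows, using a heap-based implementation; the weights match the 0.1/10 scheme above, but the image is a hypothetical toy example:

```python
import heapq
import numpy as np

MARKER_WEIGHT, BLANK_WEIGHT = 0.1, 10.0

def dijkstra_from(source, binary):
    """Shortest-path cost from `source` to every pixel. Stepping onto a
    marker pixel costs 0.1; stepping onto a blank pixel costs 10."""
    rows, cols = binary.shape
    dist = np.full((rows, cols), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                step = MARKER_WEIGHT if binary[nr, nc] else BLANK_WEIGHT
                if d + step < dist[nr, nc]:
                    dist[nr, nc] = d + step
                    heapq.heappush(heap, (d + step, (nr, nc)))
    return dist

# Toy example: a marker stroke runs along the top row
binary = np.zeros((3, 4), dtype=bool)
binary[0, :] = True
dist = dijkstra_from((0, 0), binary)
```

Because marker steps are two orders of magnitude cheaper than blank steps, following the stroke to the far end of the row costs far less than cutting across blank whiteboard.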
Disregarding the component that an arrowhead points to, the component with the shortest path from that arrowhead must be the connection’s originating component.
Having detected all AWS components and computed all connections between them, we have now fully defined the architecture diagram.
The next step is to use the definition of the architecture diagram to create a properly formatted, editable graphic. We decided to represent the graphic in SVG (Scalable Vector Graphics) format: XML-based text files that describe how a two-dimensional graphic should be rendered and that support interaction. We chose this format to allow drag-and-drop edits to the ADSynth output.
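Since SVG is plain text, each component can be emitted as a small XML fragment; the element layout and icon styling below are hypothetical, not ADSynth's actual output format:

```python
def component_svg(name, x, y, size=64):
    """Emit a hypothetical SVG fragment for one component: a box with a
    label beneath it, grouped so the pair can be dragged as one unit."""
    return (
        f'<g id="{name}">'
        f'<rect x="{x}" y="{y}" width="{size}" height="{size}" '
        f'fill="none" stroke="black"/>'
        f'<text x="{x}" y="{y + size + 14}">{name}</text>'
        f'</g>'
    )
```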
Using the location predictions from the object detection model, we know the relative positions of all components. Initially, we associate each component with a cell in a grid of 15 columns and 15 rows. We eliminate all rows and columns without components. Finally, connections between components are added to the graphic, adding rows or columns to the grid as needed. Connections will have no more than one turn, as per AWS architecture diagram guidelines.
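The grid-compaction step can be sketched as below; the component positions are hypothetical, and we assume the model's location predictions are normalized to [0, 1):

```python
GRID = 15  # initial 15x15 grid

# Hypothetical normalized (x, y) positions from the object detection model
components = {"EC2": (0.1, 0.2), "S3": (0.8, 0.2), "Lambda": (0.8, 0.9)}

# Map each normalized position to a grid cell
cells = {name: (int(x * GRID), int(y * GRID)) for name, (x, y) in components.items()}

# Eliminate rows and columns that contain no component
used_cols = sorted({c for c, _ in cells.values()})
used_rows = sorted({r for _, r in cells.values()})
compact = {
    name: (used_cols.index(c), used_rows.index(r))
    for name, (c, r) in cells.items()
}
```

After compaction, the three components occupy a 2x2 grid while preserving their relative left/right and top/bottom ordering from the sketch.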
This method produces a clean, presentable graphic that appears similar to the initial whiteboard sketch in terms of where each component is located.
At this point, we have used the results of our object detection model and the image of the whiteboard AWS architecture diagram to create an editable graphic of that diagram. In our final post, we will discuss how ADSynth is brought together in an end-to-end user application using AWS services.