Seeing the Shape of Knowledge: Network Graphs, Text Embeddings, and Cosmograph

This is a write-up of a talk delivered at Stanford University in April 2026 by Nikita Rokotyan. Nikita is an award-winning data visualization engineer and designer, known for creating tools and frameworks for data exploration and analytics including Cosmograph and Unovis. As the founder of Interacta, a data visualization studio, Nikita leads a multidisciplinary team of developers, scientists, and designers to craft elegant, functional tools for exploring and understanding data.


What does knowledge look like? Not the content of it — the arguments, the evidence, the conclusions — but the shape of it? The pattern of connections between ideas, authors, papers, concepts? These are not new questions for humanists, but they have historically been hard to answer at scale. You can read deeply and trace the threads of intellectual influence across a handful of texts. You cannot, by reading alone, hold in your head the structure of 350,000 scientific articles simultaneously.

Nikita Rokotyan came to Stanford's Center for Spatial and Textual Analysis (CESTA) to talk about graphs, and about Cosmograph, a tool he and his team are building because the existing tools couldn't do what they needed.


I. Network Graphs

The basics of network graphs are conceptually simple, which is part of their appeal. You have nodes (also called points or vertices), which represent entities — people, papers, cities, words, whatever the unit of your analysis is. You have edges, which represent the relationships between those entities. From those two ingredients, you can build representations of extraordinary complexity.
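As a minimal sketch of those two ingredients, the snippet below stores a graph as a list of nodes and a list of edges, then derives a neighbor lookup from them. The names and the correspondence data are hypothetical, invented purely for illustration.

```python
# Hypothetical toy data: nodes are people, edges are letters exchanged.
nodes = ["Ada", "Ben", "Cleo", "Dev"]
edges = [("Ada", "Ben"), ("Ada", "Cleo"), ("Ben", "Cleo"), ("Cleo", "Dev")]


def build_adjacency(nodes, edges):
    """Return an undirected adjacency map: node -> set of neighbors."""
    adjacency = {node: set() for node in nodes}
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


adjacency = build_adjacency(nodes, edges)
print(sorted(adjacency["Cleo"]))  # ['Ada', 'Ben', 'Dev']
```

Most graph tools accept some variation of this pair of tables: one row per node, one row per edge.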

Why graphs? Because many of the things humanities scholars and social scientists care about are fundamentally relational. A text isn't just a container of words; it exists in a network of influences, citations, contemporaries, and responses. A historical figure isn't just an individual; they're embedded in webs of correspondence, collaboration, and conflict. Tabular data — rows and columns — flattens these relationships. Network graphs preserve them.

Getting from the intuition to the visualization involves a few decisions. The first is what counts as a relationship. That sounds obvious, but it's a meaningful choice: you might connect two authors because they co-authored a paper, or because they cite each other, or because they publish in the same journal, or because a word embedding places them in the same semantic neighborhood. The choice shapes everything downstream.
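To make that choice concrete, here is a small sketch that builds two different edge lists from the same records: one linking co-authors, one linking authors who published in the same journal. The paper records and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: (paper title, journal, authors).
papers = [
    ("Paper A", "Journal X", ["Ada", "Ben"]),
    ("Paper B", "Journal X", ["Cleo"]),
    ("Paper C", "Journal Y", ["Ada", "Ben", "Dev"]),
]


def coauthor_edges(papers):
    """Edge weight = number of papers two authors wrote together."""
    weights = Counter()
    for _title, _journal, authors in papers:
        for pair in combinations(sorted(set(authors)), 2):
            weights[pair] += 1
    return weights


def same_journal_edges(papers):
    """Edge weight = number of journals in which both authors published."""
    authors_by_journal = {}
    for _title, journal, authors in papers:
        authors_by_journal.setdefault(journal, set()).update(authors)
    weights = Counter()
    for authors in authors_by_journal.values():
        for pair in combinations(sorted(authors), 2):
            weights[pair] += 1
    return weights


coauthor = coauthor_edges(papers)
journal = same_journal_edges(papers)
print(("Ada", "Cleo") in coauthor)  # False: they never co-authored
print(journal[("Ada", "Cleo")])     # 1: both published in Journal X
```

Under the first definition Cleo is an isolated node; under the second she sits inside a cluster. Same data, different graph.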

The second decision is layout — how to arrange nodes in space. There is no single correct answer. A force-directed layout simulates physical forces (attraction between connected nodes, repulsion between all nodes) until the system reaches equilibrium, producing organic clusters that often reveal community structure. A geographic layout places nodes according to their real-world coordinates. A hierarchical layout encodes rank or ancestry. Circular and column layouts impose explicit structure. Each reveals different things about the same data; choosing well requires knowing what question you're trying to answer.
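The force-directed idea can be sketched in a few dozen lines. This is a toy version with made-up constants (repulsion, attraction, max_step), not the algorithm of any particular tool: every pair of nodes repels, every edge pulls its endpoints together, and nodes move a little each step until things settle.

```python
import math
import random


def force_layout(nodes, edges, steps=200, repulsion=0.01, attraction=0.05,
                 max_step=0.05, seed=0):
    """Toy force-directed layout; returns node -> (x, y)."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes (inverse-square falloff).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / (dist * dist)
                fx, fy = push * dx / dist, push * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Spring-like attraction along each edge.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Move each node along its net force, capping the step length.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            scale = min(1.0, max_step / mag) if mag > 0 else 0.0
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return {n: tuple(p) for n, p in pos.items()}


# Hypothetical graph: two triangles joined by a single bridge edge.
nodes = ["a1", "a2", "a3", "b1", "b2", "b3"]
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("a1", "b1")]
for name, (x, y) in force_layout(nodes, edges).items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Note the nested loop over all node pairs: its cost grows with the square of the node count, which is where the performance trouble discussed later comes from.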

Once you have a graph, a family of algorithms can tell you things that are difficult or impossible to see by eye. Community detection algorithms (e.g. Leiden and Louvain) identify groups of nodes that are more densely connected to each other than to the rest of the network — the intellectual clusters, the schools of thought, the regional networks. Centrality measures identify which nodes are most important by different definitions of importance: degree centrality counts connections, betweenness centrality identifies nodes that bridge otherwise disconnected communities, PageRank weights connections by the importance of their sources. Shortest path algorithms find the minimum number of steps between any two nodes — useful for tracing influence or mapping how information travels across a network.
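Three of these measures are simple enough to sketch directly. The code below is a plain-Python illustration on a hypothetical five-person network; in practice you would likely reach for a graph library. The PageRank version is a basic power iteration that assumes every node has at least one neighbor.

```python
from collections import deque

# Hypothetical undirected network: node -> set of neighbors.
adjacency = {
    "Ada": {"Ben", "Cleo"},
    "Ben": {"Ada", "Cleo"},
    "Cleo": {"Ada", "Ben", "Dev"},
    "Dev": {"Cleo", "Eve"},
    "Eve": {"Dev"},
}


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in sorted(adjacency[node]):
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None


def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes no node has zero neighbors."""
    n = len(adjacency)
    rank = {node: 1 / n for node in adjacency}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / n for node in adjacency}
        for node, nbrs in adjacency.items():
            share = damping * rank[node] / len(nbrs)
            for nbr in nbrs:
                new_rank[nbr] += share
        rank = new_rank
    return rank


print(degree_centrality(adjacency)["Cleo"])    # 0.75
print(shortest_path(adjacency, "Ada", "Eve"))  # ['Ada', 'Cleo', 'Dev', 'Eve']
ranks = pagerank(adjacency)
print(max(ranks, key=ranks.get))               # Cleo
```

Cleo comes out on top by both degree and PageRank here, and every shortest path from the Ada/Ben side to Eve runs through her, which is the kind of bridging role betweenness centrality is designed to surface.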

For researchers who want to build and explore graphs without writing code, tools like Gephi, VOSviewer, and Cytoscape have been the standard options. Cosmograph is a newer entry in this space, and Rokotyan was direct about why he built it: the existing browser-based tools were slow. Not slow in an inconvenient way — slow in a way that made certain kinds of work effectively impossible.


II. The Performance Problem, and How Cosmograph Solves It

The demo was clarifying. A network with 7,000 nodes and 166,000 edges brought sigma.js, one of the more established JavaScript graph rendering libraries, to its knees. The browser slowed, interaction became sluggish, and the graph stopped being a tool for exploration and became an object you could only stare at.

The problem has two components. Rendering: putting hundreds of thousands of nodes and edges on screen and updating them in real time as the user pans, zooms, and interacts. Layout: running the force simulation that determines where each node should be placed. Both are computationally expensive, and browser-based tools have traditionally tried to solve them on the CPU, where JavaScript runs.
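A rough count shows why the layout step hurts. The sketch below assumes a naive simulation that compares every pair of nodes once per frame and a target of 60 frames per second; both are illustrative assumptions, since the talk write-up does not say which algorithm sigma.js or Cosmograph uses, and real implementations often approximate the all-pairs step.

```python
def pairwise_interactions(node_count):
    """Number of distinct node pairs: n * (n - 1) / 2."""
    return node_count * (node_count - 1) // 2


node_count, edge_count = 7_000, 166_000  # the network from the demo
frames_per_second = 60                   # assumed target frame rate

pairs = pairwise_interactions(node_count)
print(pairs)                      # 24496500 repulsion pairs per frame
print(pairs * frames_per_second)  # 1469790000 pair evaluations per second
print(edge_count)                 # plus 166000 edge springs per frame
```

Each of those pair evaluations is independent of the others, which is what makes the problem a natural fit for parallel hardware.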

Rokotyan's solution was to move both operations to the GPU. Graphics processing units are designed for exactly this kind of massively parallel computation — running the same operation on many data points simultaneously. The approach came, in part, from experience building particle simulations for data visualization and for less utilitarian purposes. The physics of a force-directed graph is not so different from the physics of a particle system: many elements, each subject to forces, each needing to be updated and drawn at every frame.

The result is Cosmograph, which can handle millions of nodes and edges in a browser without sacrificing interactivity. The simulation forces — many-body repulsion, link attraction, gravity, clustering, centering — run on the GPU, as does the rendering. Cosmograph is also available as a Python widget and a JavaScript library, which means it can be integrated into existing data pipelines and notebooks without requiring users to leave their working environment.


III. Visualizing Text

The second half of the talk shifted from graphs to text — specifically, to the challenge of making large bodies of text visually interpretable.

The history of computational approaches to text is long, from Zipf's law in the 1930s through TF-IDF and vector space models in the 1970s, latent semantic analysis in the 1980s, probabilistic topic models in the 2000s, and the transformer-based language models that now dominate the field. Each generation of methods has produced richer representations of what texts mean and how they relate to one another. The question Rokotyan was interested in is: once you have those representations, how do you see them?

Text embeddings are the key artifact here. A language model, when applied to a document, produces a vector — a list of hundreds or thousands of numbers — that encodes the semantic content of that document in a high-dimensional space. Documents that are semantically similar end up close to each other in this space; documents that are about different things end up far apart. The mathematical relationships encode meaning.
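"Close" needs a definition, and cosine similarity is a common one: it compares the direction of two vectors and ignores their length. The sketch below uses tiny hand-written four-dimensional vectors as stand-ins; real embeddings come from a language model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical "embeddings", invented for illustration only.
embeddings = {
    "graph layout paper": [0.9, 0.1, 0.0, 0.2],
    "network centrality paper": [0.8, 0.2, 0.1, 0.3],
    "medieval poetry essay": [0.0, 0.9, 0.8, 0.1],
}

related = cosine_similarity(embeddings["graph layout paper"],
                            embeddings["network centrality paper"])
unrelated = cosine_similarity(embeddings["graph layout paper"],
                              embeddings["medieval poetry essay"])
print(round(related, 2), round(unrelated, 2))
```

The two network-science documents score close to 1, while the poetry essay scores close to 0 against either of them.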

The problem is that humans cannot perceive hundreds of dimensions. We need two or three. Dimensionality reduction is the process of projecting high-dimensional embeddings down to a space we can visualize, while preserving as much of the original structure as possible. The main techniques — PCA, t-SNE, and UMAP — make different tradeoffs between computational cost, preservation of local vs. global structure, and interpretability of the result. UMAP has become particularly popular for this use case because it tends to preserve both local clustering and broader topological structure reasonably well.
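Of the three techniques, PCA is the easiest to sketch from scratch: find the two directions along which the data varies most and project onto them. The version below uses plain power iteration with deflation and is meant as an illustration, not a production implementation; t-SNE and UMAP are considerably more involved and are best used through their libraries. The sample vectors are hypothetical.

```python
import math


def pca_2d(vectors, iterations=200):
    """Project vectors onto their top two principal components."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - means[j] for j in range(dim)] for v in vectors]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]

    components = []
    for _component in range(2):
        vec = [1.0 / (j + 1) for j in range(dim)]  # fixed, non-uniform start
        for _step in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(dim))
                   for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left in any direction
                vec = [0.0] * dim
                break
            vec = [x / norm for x in nxt]
        eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                         for i in range(dim) for j in range(dim))
        components.append(vec)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]

    return [tuple(sum(r[j] * c[j] for j in range(dim)) for c in components)
            for r in centered]


# Hypothetical 4-dimensional vectors standing in for document embeddings.
vectors = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.3],
    [0.0, 0.9, 0.8, 0.1],
    [0.1, 0.8, 0.9, 0.2],
]
for x, y in pca_2d(vectors):
    print(f"({x:.3f}, {y:.3f})")
```

Each input vector becomes a pair of coordinates that can be drawn directly as a point on a scatter plot.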

The output is a semantic map: a two-dimensional scatter plot where each point is a document, and proximity reflects semantic similarity. At small scales this is already useful. At the scale of 350,000 scientific paper abstracts — one of the datasets Rokotyan demonstrated — it becomes something else: a map of a field, where clusters correspond to disciplines, sub-disciplines, and research communities, and the distances between clusters reflect the actual intellectual distances between areas of inquiry.

Cosmograph handles this use case as well as the graph use case, because the core requirement is the same: render a very large number of points, update them in real time as the user interacts, and do it without the browser dying. The same GPU-based rendering engine that makes large graphs tractable also makes large embedding visualizations tractable.

Additional algorithms can be layered on top of the embedding visualization: HDBSCAN or k-means for clustering (producing labels you can use to color points), vector search (finding the documents most similar to a query), and topic modeling (identifying the themes that characterize different regions of the map). The same dataset can also be rendered as a semantic network graph, where edges connect documents above some similarity threshold. This is a different visual representation of the same underlying relationships, useful for different questions.
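Two of these ideas, vector search and the similarity-threshold graph, fit in a short sketch. The document names, vectors, and the 0.8 threshold are all hypothetical, and the brute-force comparison of every pair is only workable for small collections; at the scale of hundreds of thousands of documents an approximate nearest-neighbor index would normally take its place.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def most_similar(query, embeddings, k=2):
    """Vector search: names of the k documents closest to the query."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in embeddings.items()]
    return [name for _score, name in sorted(scored, reverse=True)[:k]]


def similarity_edges(embeddings, threshold=0.8):
    """Semantic network: connect documents whose similarity >= threshold."""
    return [(a, b) for a, b in combinations(sorted(embeddings), 2)
            if cosine_similarity(embeddings[a], embeddings[b]) >= threshold]


# Hypothetical 3-dimensional "embeddings", invented for illustration.
embeddings = {
    "doc_graphs": [0.9, 0.1, 0.0],
    "doc_networks": [0.8, 0.3, 0.1],
    "doc_poetry": [0.0, 0.9, 0.7],
    "doc_verse": [0.1, 0.8, 0.9],
}

print(most_similar([1.0, 0.2, 0.0], embeddings))
# ['doc_graphs', 'doc_networks']
print(similarity_edges(embeddings))
# [('doc_graphs', 'doc_networks'), ('doc_poetry', 'doc_verse')]
```

Lowering the threshold adds edges and merges clusters; raising it fragments the graph. Like the choice of what counts as a relationship, it is an interpretive decision rather than a purely technical one.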


Tools and Getting Started

Cosmograph is available at cosmograph.app as a web tool, and can be installed with pip install cosmograph for use as a Python widget. The JavaScript library is available on npm via npm install @cosmograph/cosmograph. Rokotyan's broader data visualization work, including the Unovis library and the interacta.io projects, is documented at rokotyan.com. The full slide deck from the Stanford talk is available here.