I'm an interdisciplinary data scientist working at the intersection of quantitative social science, technology and the built environment.

Research Discovery Interface

A search space to discover related research using NLP

The goal is to help scholars discover thematically related research in a multidisciplinary setting, such as a university library.

An interactive network of master's and PhD theses that enables discovery of thematically related, cross-disciplinary research.

It is challenging for scholars to discover thematically related research in a multidisciplinary setting, such as that of a university library. In this work, I use spatialization techniques to convey the relatedness of research themes without requiring scholars to have specific knowledge of disciplinary search terminology.
I approach this task conceptually by revisiting existing spatialization techniques and reframing them using a network, one of the core concepts of spatial information. To apply this design, I spatialize master's and doctoral theses (two kinds of research objects available through a university library repository), using topic modeling to assign a relatively small number of research topics to each object.

Process 1: Background

In recent decades, the curation of scholarship and its access mechanisms have shifted from physical to virtual spaces. This shift has increased the potential for exchange of scholarly information on the Web through semantically rich research objects.

Different conceptualizations of similarity, such as geographic and topical, lend themselves to analysis through different spatializations. Spatializations support exploration, browsing, and navigation, and can be exploited in future search and discovery services, complementing standard known-item searches. By spatializing information as an interactive network, researchers can ask questions about the connections between research objects or about their centrality.

Process 2: Motivation

Discovering thematically related research in a multidisciplinary setting is both important and challenging. This is a consequence of the siloing of scientific perspectives on the world into different disciplines and the heterogeneous terminologies used within them. Specifically, scholars may find it challenging to identify collaborators and methods outside of their discipline. This is problematic, given that scientific studies and applications of geographic information are increasingly transdisciplinary; they may, for example, combine knowledge from sociology and psychology, or borrow methods from computer science and engineering.

To address this challenge, I propose to complement a terminological approach to research discovery with an innovative spatial approach affording similarity judgments on research themes. To do this, I recast spatialization as a conceptual choice of a lens through which to view data (i.e. viewing research objects as a network). Just as designs for successful everyday spaces, like neighborhoods and street networks, follow spatial patterns and support important cognitive strategies, so can the designs for visual spaces that enable serendipitous discovery.

Process 3: Networks

Networks provide views of objects that are not supported by a field view, such as questions about direct connections between objects and their centrality in the network. Graphs formalize network models and give them inferential power and versatility.

My network spatialization of theses rests on the following choices:
- The theses (research objects) are conceptualized as nodes.
- The edges are defined based on a binary topical relation between theses; if two research objects have at least one of five “top topics” in common, they share an edge.
- The edges are weighted by the value of the topic attribute (0–1).
- The edges are non-directed, as topic sharing is symmetrical.
- The nodes are embedded in a planar space, also based on the value of the topic attribute.
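
The choices above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the thesis identifiers, topic names and weights are made up, and `top_topics`, `build_network` and `degree_centrality` are hypothetical helpers. The edge weight rule used here (the strongest shared topic, scored by the smaller of the two theses' weights) is only one possible reading of "the value of the topic attribute". The example uses two top topics instead of five to keep the data small, and it leaves out the planar embedding.

```python
from itertools import combinations

TOP_K = 5  # the text uses five "top topics" per thesis


def top_topics(weights, k=TOP_K):
    """Return the set of the k highest-weighted topic names."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return set(ranked[:k])


def build_network(theses, k=TOP_K):
    """Build an undirected, weighted edge map {frozenset({a, b}): weight}."""
    tops = {tid: top_topics(w, k) for tid, w in theses.items()}
    edges = {}
    for a, b in combinations(sorted(theses), 2):
        shared = tops[a] & tops[b]
        if shared:
            # Assumption: score each shared topic by the smaller of the two
            # thesis weights, then keep the strongest shared topic.
            edges[frozenset((a, b))] = max(
                min(theses[a][t], theses[b][t]) for t in shared
            )
    return edges


def degree_centrality(theses, edges):
    """Share of the other theses that each thesis is directly connected to."""
    n = len(theses)
    degree = {tid: 0 for tid in theses}
    for pair in edges:
        for tid in pair:
            degree[tid] += 1
    return {tid: (d / (n - 1) if n > 1 else 0.0) for tid, d in degree.items()}


# Made-up topic weights (0-1) per thesis, as a topic model might assign them.
theses = {
    "thesis_a": {"urban": 0.6, "networks": 0.3, "health": 0.1},
    "thesis_b": {"networks": 0.5, "language": 0.4, "urban": 0.1},
    "thesis_c": {"health": 0.7, "policy": 0.3},
}

edges = build_network(theses, k=2)
centrality = degree_centrality(theses, edges)

for pair, weight in edges.items():
    print(sorted(pair), weight)
print(centrality)
```

Using a `frozenset` as the edge key makes the edge non-directed by construction: the pair (a, b) and the pair (b, a) map to the same key.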

Process 4: Conclusions

To enable discovery in a multidisciplinary setting, I develop systematic spatializations that allow users to identify thematically similar research objects. These spatializations provide a helpful alternative to known-item search by facilitating exploration; they do not require users to have prior disciplinary knowledge.

The long-term goal of this work is to increase awareness of relevant previous or ongoing research by applying spatial thinking to the discovery of thematically related work. Integrating research by spatialized topic, rather than siloing it by discipline, is likely to enable increased collaboration across academic disciplines. Much like browsing stacks of books in a physical library, exploring a spatialized library repository can transform a common research task into a learning opportunity or a serendipitous discovery.

The code to build the network locally is available on GitHub.

The project was presented at the 14th International Conference on Spatial Information Theory (COSIT) in September 2019.
