Code
::p_load(tidyverse, jsonlite, SmartEDA, tidygraph, ggraph) pacman
Mini Challenge 1
May 17, 2025
May 25, 2025
In the code below, ‘fromJSON()’ of jsonlite package is used to import MC1_graph.json file into R and save the output object.
This ensures each id from your node list is mapped to the correct row number.
Warning: ggrepel: 17411 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
Warning: ggrepel: 790 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
---
title: "In Class Exercise 3"
subtitle: "Mini Challenge 1"
format: html
date: 05/17/2025
date-format: long
date-modified: last-modified
editor: visual
---
# 3.1 Installing and Loading R Packages
```{r}
pacman::p_load(tidyverse, jsonlite, SmartEDA, tidygraph, ggraph)
```
# 3.2 Importing Knowledge Graph Data
In the code below, 'fromJSON()' of jsonlite package is used to import MC1_graph.json file into R and save the output object.
```{r}
kg <- fromJSON("data/MC1_graph.json")
```
# 3.3 Inspect Structure
```{r}
str(kg, max.level = 1)
```
# 3.4 Extract and Inspect
```{r}
nodes_tbl <- as_tibble(kg$nodes)
edges_tbl <- as_tibble(kg$links)
```
# 3.5 Initial EDA
```{r}
ggplot(data = edges_tbl,
aes(y = `Edge Type`)) +
geom_bar()
```
# 3.6 Creating Knowledge Graph
## 3.6.1 Mapping from node id to row index
```{r}
id_map <- tibble(id = nodes_tbl$id,
index = seq_len(
nrow(nodes_tbl)))
```
This ensures each id from your node list is mapped to the correct row number.
## 3.6.2 Map source and target IDs to row indices
```{r}
edges_tbl <- edges_tbl %>%
left_join(id_map, by = c("source" = "id")) %>%
rename(from = index) %>%
left_join(id_map, by = c("target" = "id")) %>%
rename(to = index)
```
## 3.6.3 Invalid Edges
```{r}
edges_tbl <- edges_tbl %>%
filter(!is.na(from),!is.na(to))
```
## 3.6.4 Creating the Graph
```{r}
graph <- tbl_graph(nodes = nodes_tbl,
edges = edges_tbl,
directed = kg$directed)
```
## 3.6.5 Visualising the Knowledge Graph
```{r}
set.seed(1234)
```
```{r}
#| eval: false
ggraph(graph,layout = "fr") +
geom_edge_link(alpha = 0.3,
colour = "gray") +
geom_node_point(aes(color = `Node Type`),
size = 4) +
geom_node_text(aes(label = name),
repel = TRUE,
size = 2.5) +
theme_void()
```
# 3.7 Cleaning Up
## 3.7.1 Filter Edges to only "MemberOf"
```{r}
graph_memberof <- graph %>%
activate(edges) %>%
filter(`Edge Type` == "MemberOf")
```
## 3.7.2 Extract only connected nodes (i.e., used in these edges)
```{r}
used_node_indices <- graph_memberof %>%
activate(edges) %>%
as_tibble() %>%
select(from, to) %>%
unlist() %>%
unique
```
## 3.7.3 Keep only those nodes
```{r}
graph_memberof <- graph_memberof %>%
activate(nodes) %>%
mutate(row_id = row_number()) %>%
filter(row_id %in% used_node_indices) %>%
select(-row_id) #optional cleanup
```
## 3.7.4 Plot the subgraph
```{r}
ggraph(graph_memberof,
layout = "fr") +
geom_edge_link(alpha = 0.5,
colour = "gray") +
geom_node_point(aes(color = `Node Type`),
size = 1) +
geom_node_text(aes(label = name),
repel = TRUE,
size = 2.5) +
theme_void()
```