
A day in the life of a blockchain

…through the eyes of a Data Scientist



Sergey Mastitsky

3 years ago | 23 min read

As discussed in one of my previous articles, blockchains generate a wealth of high-quality data that Data Scientists can use to answer a wide range of questions of both academic and practical importance.

Of particular interest are high-throughput smart contract-enabled blockchains, which enable the development and deployment of scalable decentralised applications (dApps).

One such blockchain platform is TRON. It has been particularly popular among developers of gaming and finance-related dApps. Here I will use transactional data collected from TRON to illustrate some of the patterns such data can reveal. This article is a technical tutorial for Data Scientists who would like to learn more about blockchain data and methods for their analysis. All examples are implemented in R.

Readers are assumed to have a high-level understanding of blockchain technology and be comfortable with R and its tidyverse tools. The code for this article can be found on Github. To save space, I omit the code used to produce most of the graphs.

Disclaimer: No part of this article constitutes financial advice. Any use or reliance on the material presented herein is solely at your own risk and discretion. I am neither affiliated with nor do I endorse any of the companies mentioned herein. These companies and their products are mentioned for illustrative purposes only.

Data used in this study

As with other phenomena, one can study blockchain transactions at different temporal scales. Granular data that cover longer periods often paint a more nuanced picture. However, collecting large amounts of on-chain data creates a significant technical overhead (query time, storage and compute resources, etc.).

For simplicity, we will use a dataset that covers one day's worth of data, namely 2 July 2021. There is no particular reason for choosing this date; it is just an example.

The data have been collected with the get_block_info() function from the R package tronr. This function takes the number of the block of interest and returns data in the form of a nested tibble. There were 28,730 blocks generated on the TRON blockchain on 2 July 2021, whose numbers ranged from 31570498 to 31599227.

These blocks contained a total of 7,819,547 transactions. Transaction ID (tx_id), type of the system smart contract call that executed a given transaction (contract_type), and account addresses of the initiator (from_address) and receiver of that transaction (to_address) are stored in the list-column tx of the tibble returned by get_block_info(). Here is what the data for the very first block in this dataset look like:

require(dplyr)
require(tidyr)
require(tronr)
#> R toolbox to explore the TRON blockchain
#> Developed by Next Game Solutions (http://nextgamesolutions.com)

block_data <- get_block_info(
  latest = FALSE, block_number = "31570498"
)

glimpse(block_data)
#> Rows: 1
#> Columns: 11
#> $ request_time <dttm> 2021-07-26 20:25:24
#> $ block_number <chr> "31570498"
#> $ timestamp <dttm> 2021-07-02
#> $ hash <chr> "0000000001e1ba4267746fce622a90ba6a51f~
#> $ parent_hash <chr> "0000000001e1ba4135f66566c15b648563d4b6~
#> $ tx_trie_root <chr> "254pKSrGpmLT3y7xdFAaZfXR93ZrsKgRJoxZH~
#> $ confirmed <lgl> TRUE
#> $ size <int> 61119
#> $ witness_address <chr> "TLyqzVGLV1srkB7dToTAEqgDSfPtXRJZYH"
#> $ tx_count <int> 200
#> $ tx <list> [<tbl_df[200 x 4]>]

block_data %>% select(tx) %>% unnest(cols = tx)
#> # A tibble: 200 x 4
#> tx_id contract_type from_address to_address
#> <chr> <chr> <chr> <chr>
#> 1 59ebe232ea5032~ VoteWitnessCon~ TXi4QTAWYGhF4Z~ TTcYhypP8m4ph~
#> 2 aff3e7fb7f277b~ TransferContra~ TRYavwpJnwhr9T~ TQMVcC2adh61t~
#> 3 8661b623b377a2~ TriggerSmartCo~ TRpzBAQHCVKHBW~ TBRs8xwajQVbD~
#> 4 ed437306b65880~ TransferAssetC~ TXAVuHVM1pBnZA~ TYBtHbJiQ2bDa~
#> 5 5a94ce1c7b0390~ TransferAssetC~ TAdqErGeD2CgTp~ TSnjNCyK3r58w~
#> 6 360999180f84eb~ TransferAssetC~ TAFpxz9pKyGDwr~ TPRiphADhwe1g~
#> 7 42a97206891117~ TransferAssetC~ TJABpJLWNXJ2xj~ TX8BumaUQ1bpR~
#> 8 baa45f5a4a18aa~ AccountCreateC~ TEoDHGSTu2Kh97~ TFLnMpKhGBUUo~
#> 9 614032934ce86c~ TransferContra~ TSWHGYtBwzpFXL~ TBy4dMk6nWhWV~
#> 10 01de787975ff49~ TransferContra~ TVU2khdDkUu67F~ TSEnu5xcNMyNi~
#> # ... with 190 more rows

The entire dataset can be downloaded into R as follows (the size of this RDS file is over 600 Mb, so be patient — it may take some time, depending on the bandwidth of your Internet connection):

dat <- readRDS(url("https://chilp.it/eee138c"))

dat %>% dim()
#> [1] 28730 11

names(dat)
#> [1] "request_time" "block_number" "timestamp"
#> [4] "hash" "parent_hash" "tx_trie_root"
#> [7] "confirmed" "size" "witness_address"
#> [10] "tx_count" "tx"

There was a new block every 3 seconds

Blocks are akin to pages in a ledger: these logical units are used to group and record transactions that take place on the blockchain within a certain period of time. Thanks to the Delegated Proof of Stake consensus mechanism, blocks on the TRON blockchain are generated quickly, one every 3 seconds. However, a maintenance period takes place every 6 hours (i.e. 4 times a day), taking 6 seconds each time. As a result, the average time between blocks in our sample was just above 3 seconds:

dat$timestamp %>% diff() %>% mean()
#> Time difference of 3.00731 secs
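
As a rough sanity check on this figure, the arithmetic can be sketched in a few lines of base R. The sketch assumes 3-second block slots, a first block at 00:00:00 and a last block at 23:59:57 UTC. The last timestamp is an assumption made for illustration, not something shown in the outputs above.

```r
# Back-of-the-envelope check (assumptions: 3-second slots,
# first block at 00:00:00 and last block at 23:59:57 UTC)
slot_sec        <- 3
slots_per_day   <- 24 * 60 * 60 / slot_sec   # 28800 possible slots
blocks_observed <- 28730                     # number of rows in `dat`

# Slots in which no block was recorded
missed_slots <- slots_per_day - blocks_observed

# Mean gap = time span between first and last block / number of gaps
span_sec <- (slots_per_day - 1) * slot_sec
mean_gap <- span_sec / (blocks_observed - 1)

missed_slots
round(mean_gap, 5)
```

Under these assumptions, 70 slots (210 seconds in total) carried no block, which reproduces a mean gap of about 3.00731 seconds.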

In TRON, there are 27 “witnesses”, a.k.a. “super representatives” (SR), i.e. network nodes that can create and confirm new blocks:

dat$witness_address %>% unique() %>% length()
#> [1] 27

dat$witness_address %>% unique() %>% head()
#> [1] "TLyqzVGLV1srkB7dToTAEqgDSfPtXRJZYH"
#> [2] "TJBtdYunmQkeK5KninwgcjuK1RPDhyUWBZ"
#> [3] "TTjacDH5PL8hpWirqU7HQQNZDyF723PuCg"
#> [4] "TWkpg1ZQ4fTv7sj41zBUTMo1kuJEUWTere"
#> [5] "TCEo1hMAdaJrQmvnGTCcGT2LqrGU4N7Jqf"
#> [6] "TAAdjpNYfeJ2edcETNpad1QpQWJfyBdB9V"

In most cases, witness identities or their pseudonyms can be found on the TRONScan website. For instance, the six addresses listed above belong to the following entities:

  • TLyqzVGLV1srkB7dToTAEqgDSfPtXRJZYH: Binance Staking
  • TJBtdYunmQkeK5KninwgcjuK1RPDhyUWBZ: JD Investment
  • TTjacDH5PL8hpWirqU7HQQNZDyF723PuCg: NEOPLY-Staking
  • TWkpg1ZQ4fTv7sj41zBUTMo1kuJEUWTere: TRONLink
  • TCEo1hMAdaJrQmvnGTCcGT2LqrGU4N7Jqf: TRONScan
  • TAAdjpNYfeJ2edcETNpad1QpQWJfyBdB9V: Ant Investment Group

The total number of blocks generated by witnesses per day varied within a tight range — from 1050 to 1067, with an average of 1064:

dat$witness_address %>% table() %>% range()
#> [1] 1050 1067

dat$witness_address %>% table() %>% mean()
#> [1] 1064.074

Each block contained 274 transactions on average

The number of transactions per block had a somewhat asymmetric distribution, with a long right tail (Figure 1). The median block contained 274 transactions, and the mean was 272.2. However, some blocks had as many as 1126 transactions, while others had no records at all:

dat$tx_count %>% summary()
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0 229.0 274.0 272.2 316.0 1126.0
Figure 1. Distribution of the number of transactions per block on the TRON blockchain on 2 July 2021.

While rare, “empty” blocks are perfectly normal for the TRON blockchain. They are generated when there are too few transactions to package into a block.

Asset transfers and smart contract triggers were the most prevalent transaction types

There are over 30 types of transactions that can take place on the TRON blockchain. These types are implemented via the respective “system smart contracts”. In our dataset, ca. 65% of all transactions were associated with transfers of various tokens (TransferAssetContract) and ca. 13% with transfers of Tronix (TRX), the native currency of TRON (TransferContract).

Over 19% of all transactions involved triggering other smart contracts (TriggerSmartContract), such as those that implement the logic in decentralised applications. Transactions of other types (e.g., creation of new accounts, tokens, etc.) were considerably less frequent:

dat %>%
  select(tx) %>%
  unnest(cols = c(tx)) %>%
  group_by(contract_type) %>%
  summarise(n = n()) %>%
  mutate(percent = round(n / sum(n) * 100, 3)) %>%
  arrange(-n)
#> # A tibble: 13 x 3
#> contract_type n percent
#> <chr> <int> <dbl>
#> 1 TransferAssetContract 5071888 64.9
#> 2 TriggerSmartContract 1508562 19.3
#> 3 TransferContract 1041006 13.3
#> 4 AccountCreateContract 166234 2.13
#> 5 FreezeBalanceContract 10321 0.132
#> 6 VoteWitnessContract 8643 0.111
#> 7 WithdrawBalanceContract 6857 0.088
#> 8 UnfreezeBalanceContract 5119 0.065
#> 9 CreateSmartContract 829 0.011
#> 10 AccountPermissionUpdateContract 77 0.001
#> 11 AccountUpdateContract 8 0
#> 12 AssetIssueContract 2 0
#> 13 ParticipateAssetIssueContract 1 0
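
The same unnest-and-tally pattern can be tried on a toy nested tibble without downloading the full dataset. The sketch below is an illustration only: the object `toy` and its contents are made up, and `count(..., sort = TRUE)` is used as a shorter equivalent of the `group_by()`, `summarise()` and `arrange()` steps.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `dat` (hypothetical data): two blocks,
# each with a nested tibble of transactions
toy <- tibble(
  block_number = c("1", "2"),
  tx = list(
    tibble(contract_type = c("TransferContract", "TransferAssetContract")),
    tibble(contract_type = c("TransferAssetContract", "TransferAssetContract"))
  )
)

# Flatten the list-column, count transactions per type, add percentages
shares <- toy %>%
  select(tx) %>%
  unnest(cols = tx) %>%
  count(contract_type, sort = TRUE) %>%
  mutate(percent = round(n / sum(n) * 100, 3))

shares
```

Here three of the four toy transactions are of type TransferAssetContract, so its share is 75%.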

The more transactions per block, the larger the block size

Given that the blocks are used to “bundle up” a group of consecutive transactions, it would be natural to expect the block size (in Mb) to positively correlate with the number of such transactions. Figure 2 confirms that this indeed was the case.

Figure 2. Relationship between the number of transactions per block and the block size.

However, the strength of this relationship was not consistent. At numbers below 90 transactions per block, the relationship was linear and very tight. After that, the correlation was still strong and positive, but the block size variance increased. Moreover, two separate groups of points emerged on the graph above ca. 450 transactions per block. Overall, Figure 2 suggests that on 2 July 2021 the blockchain operated under several distinct regimes.

Activity on the blockchain varied considerably throughout the day

The number of transactions per block demonstrated a moderate daily seasonality, i.e. it slowly grew from midnight to ca. 09:00 (UTC time zone, here and elsewhere) and then similarly slowly declined again. However, there were numerous local peaks and troughs throughout the day (Figure 3).

Figure 3. Temporal changes in the number of transactions per block. Here and elsewhere in this article, the red line is a GAM-based smoother added to highlight the trend.

Some of the spikes lasted for several consecutive blocks, reflecting short-term regime shifts in the blockchain activity. Let us have a closer look at the most pronounced of such shifts, which occurred between ca. 03:00 and 03:06. The number of transactions per block increased during that time both in terms of the mean level and variance (Figure 4).

Figure 4. A shift in the activity on the blockchain took place between 03:00 and 03:06. Five-minute intervals before and after the shift are shown for comparison.

As shown below, this clear regime shift was caused by one particular address (THtbMw6byXuiFhsRv1o1BQRtzvube9X1jx, a TRON account whose identity is unknown) that started firing off a large number of transactions to another address (TSQyuZowokRp3TMRCcrbsDnhnnDSg7gtMT, a smart contract associated with a large gambling website). To simplify notation, I will refer to the first of these entities as “address A”, and the second one as “address B”.

# Addresses that interacted between 03:00 and 03:06,
# ordered by the number of transactions:
dat %>%
  filter(
    timestamp >= as.POSIXct("2021-07-02 03:00:00", tz = "UTC"),
    timestamp < as.POSIXct("2021-07-02 03:06:00", tz = "UTC")
  ) %>%
  select(tx) %>%
  unnest(cols = tx) %>%
  group_by(from_address, to_address) %>%
  count() %>% arrange(-n) %>% head()
#> # A tibble: 6 x 3
#> # Groups: from_address, to_address [6]
#> from_address to_address n
#> <chr> <chr> <int>
#> 1 THtbMw6byXuiFhsRv1o1BQRtzvu~ TSQyuZowokRp3TMRCcrbsDnhnn~ 26897
#> 2 TCd4rituYSmbeEDxXpDVF7ordH3~ TFJDc5RmS5HuLRujYSFwHadHQb~ 1198
#> 3 TCd4rituYSmbeEDxXpDVF7ordH3~ TSvin3om2vWuw5stbxTeDQrpqH~ 647
#> 4 TCd4rituYSmbeEDxXpDVF7ordH3~ TTagN2hiUpt7HBbLTgLWi8qQoS~ 374
#> 5 TAUN6FwrnwwmaEqYcckffC7wYmb~ TR7NHqjeKQxGTCi8q8ZY4pL8ot~ 184
#> 6 TH4RjfiSXxz71fNP3U3p6XwKYZ8~ TVj7RNVHy6thbM7BWdSe9G6gXw~ 149

Soon after 03:06, the number of transactions per block became less volatile again, although its mean level stayed somewhat elevated (Figure 4). The pair of addresses A and B remained the most active one, although the number of transactions between these addresses dropped considerably:

# Addresses that interacted between 03:06 and 03:11,
# ordered by the number of transactions:
dat %>%
  filter(
    timestamp >= as.POSIXct("2021-07-02 03:06:00", tz = "UTC"),
    timestamp <= as.POSIXct("2021-07-02 03:11:00", tz = "UTC")
  ) %>%
  select(tx) %>%
  unnest(cols = tx) %>%
  group_by(from_address, to_address) %>%
  count() %>% arrange(-n) %>% head()
#> # A tibble: 6 x 3
#> # Groups: from_address, to_address [6]
#> from_address to_address n
#> <chr> <chr> <int>
#> 1 THtbMw6byXuiFhsRv1o1BQRtzvu~ TSQyuZowokRp3TMRCcrbsDnhnn~ 6713
#> 2 TCd4rituYSmbeEDxXpDVF7ordH3~ TFJDc5RmS5HuLRujYSFwHadHQb~ 2223
#> 3 TCd4rituYSmbeEDxXpDVF7ordH3~ TSvin3om2vWuw5stbxTeDQrpqH~ 396
#> 4 TCd4rituYSmbeEDxXpDVF7ordH3~ TTagN2hiUpt7HBbLTgLWi8qQoS~ 367
#> 5 TH4RjfiSXxz71fNP3U3p6XwKYZ8~ TVj7RNVHy6thbM7BWdSe9G6gXw~ 199
#> 6 TAUN6FwrnwwmaEqYcckffC7wYmb~ TR7NHqjeKQxGTCi8q8ZY4pL8ot~ 171

For comparison, a few minutes before the regime shift the list of the top 6 interacting address pairs looked as follows:

# Addresses that interacted between 02:55 and 03:00,
# ordered by the number of transactions:
dat %>%
  filter(
    timestamp >= as.POSIXct("2021-07-02 02:55:00", tz = "UTC"),
    timestamp < as.POSIXct("2021-07-02 03:00:00", tz = "UTC")
  ) %>%
  select(tx) %>%
  unnest(cols = tx) %>%
  group_by(from_address, to_address) %>%
  count() %>% arrange(-n) %>% head()
#> # A tibble: 6 x 3
#> # Groups: from_address, to_address [6]
#> from_address to_address n
#> <chr> <chr> <int>
#> 1 TCd4rituYSmbeEDxXpDVF7ordH3~ TSvin3om2vWuw5stbxTeDQrpqH~ 517
#> 2 TCd4rituYSmbeEDxXpDVF7ordH3~ TTagN2hiUpt7HBbLTgLWi8qQoS~ 253
#> 3 TH4RjfiSXxz71fNP3U3p6XwKYZ8~ TVj7RNVHy6thbM7BWdSe9G6gXw~ 200
#> 4 TAUN6FwrnwwmaEqYcckffC7wYmb~ TR7NHqjeKQxGTCi8q8ZY4pL8ot~ 176
#> 5 TNMQ6BJyycCSMuVmVWH2Re4FySm~ TAQuJmiy83mcnyAtB6wMST6bSY~ 136
#> 6 TNaRAoLUyYEV2uF7GUrzSjRQTU8~ TR7NHqjeKQxGTCi8q8ZY4pL8ot~ 114
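
The three queries above differ only in their time window, so they could be wrapped in a small helper. The following is a minimal sketch rather than code from the original analysis: the function name `top_pairs()`, its arguments and the toy data are hypothetical, and the window is treated as half-open (start included, end excluded).

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: most active address pairs within [t_start, t_end)
top_pairs <- function(blocks, t_start, t_end, n_top = 6) {
  blocks %>%
    filter(
      timestamp >= as.POSIXct(t_start, tz = "UTC"),
      timestamp < as.POSIXct(t_end, tz = "UTC")
    ) %>%
    select(tx) %>%
    unnest(cols = tx) %>%
    count(from_address, to_address, sort = TRUE) %>%
    head(n_top)
}

# Toy stand-in for `dat` (made-up addresses): three blocks, 3 seconds apart
toy <- tibble(
  timestamp = as.POSIXct(
    c("2021-07-02 02:59:57", "2021-07-02 03:00:00", "2021-07-02 03:00:03"),
    tz = "UTC"
  ),
  tx = list(
    tibble(from_address = "C", to_address = "D"),
    tibble(from_address = c("A", "A"), to_address = c("B", "B")),
    tibble(from_address = c("A", "C"), to_address = c("B", "D"))
  )
)

res <- top_pairs(toy, "2021-07-02 03:00:00", "2021-07-02 03:06:00")
res
```

In the toy data the block at 02:59:57 falls outside the window, so the pair A and B comes first with 3 transactions, followed by C and D with 1.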

What kind of transactions did address A send to address B during the regime shift? To answer this question, a more detailed dataset has been collected using the get_tx_info_by_id() function from the tronr package (see Github for details). This dataset can be loaded into R as follows:

shift_tx <- readRDS(url("https://chilp.it/969333f"))

shift_tx %>% dim()
#> [1] 26897 19

names(shift_tx)
#> [1] "request_time" "tx_id"
#> [3] "block_number" "timestamp"
#> [5] "contract_result" "confirmed"
#> [7] "confirmations_count" "sr_confirm_list"
#> [9] "contract_type" "from_address"
#> [11] "to_address" "is_contract_from_address"
#> [13] "is_contract_to_address" "costs"
#> [15] "trx_transfer" "trc10_transfer"
#> [17] "trc20_transfer" "internal_tx"
#> [19] "info"

As address B belongs to a smart contract, it is not surprising that all 26,897 transactions received by it from address A during the regime shift were of type "TriggerSmartContract":

shift_tx$contract_type %>% unique()
#> [1] "TriggerSmartContract"

None of these transactions was directly associated with transfers of any assets, be it TRX, TRC-10, or TRC-20 tokens:

shift_tx$trx_transfer %>% sum()
#> [1] 0

shift_tx$trc10_transfer %>% sum(na.rm = TRUE)
#> [1] 0

shift_tx$trc20_transfer %>% sum(na.rm = TRUE)
#> [1] 0

However, smart contract calls on the TRON blockchain often trigger so-called “internal transactions” that implement various actions “behind the scenes”, including asset transfers. Information on such transactions can be found in the list-column internal_tx of the shift_tx tibble. For example, here are the data on internal transactions associated with one of the “normal” transactions that took place at 03:03:

shift_tx$internal_tx[[1]] %>% glimpse()
#> Rows: 9
#> Columns: 12
#> $ internal_tx_id <chr> "3c4c8d286b643d1a7e0b23e2dd~
#> $ from_address <chr> "TBPrJYARpfAe9kmnHvMAWcqimn~
#> $ to_address <chr> "TCcrsGF9PdLxJF869dQsK4V5QE~
#> $ is_contract_from_address <lgl> TRUE, TRUE, TRUE, TRUE, TRU~
#> $ is_contract_to_address <lgl> TRUE, TRUE, TRUE, TRUE, TRU~
#> $ confirmed <lgl> TRUE, TRUE, TRUE, TRUE, TRU~
#> $ rejected <lgl> FALSE, FALSE, FALSE, FALSE,~
#> $ token_id <chr> "TRX", "TRX", "TRX", "TRX",~
#> $ token_name <chr> "Tronix", "Tronix", "Tronix~
#> $ token_abbr <chr> "TRX", "TRX", "TRX", "TRX",~
#> $ vip <lgl> FALSE, FALSE, FALSE, FALSE,~
#> $ amount <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.~

Aggregating across all internal transactions, we can see that in fact a total of 185,113 TRX (the equivalent of ca. $12,032 on that day) changed hands as a result of smart contract calls from A to B during the regime shift:

shift_tx %>%
  select(internal_tx) %>%
  unnest(internal_tx) %>%
  filter(amount > 0) %>%
  group_by(token_id) %>%
  summarise(total_amount = sum(amount))
#> # A tibble: 1 x 2
#> token_id total_amount
#> <chr> <dbl>
#> 1 TRX 185113.

The next logical step in this analysis of internal transactions would involve getting a better understanding of the TRX movements between different accounts, identification of the entities those accounts belong to, etc. I will leave this as an exercise for the reader.

Interacting blockchain addresses form a network

It is common to represent blockchain transactional data as networks (graphs), in which addresses serve as nodes and transactions as edges connecting the nodes (see examples here and here).

For instance, let us have a look at transactions from the 10th block generated on the day under study:

dat$tx[[10]]
#> # A tibble: 278 x 4
#> tx_id contract_type from_address to_address
#> <chr> <chr> <chr> <chr>
#> 1 85e2dc35e095d5~ TransferContra~ THXPAPiRd62EWk~ TV8oyJ4VEQgE~
#> 2 0733b9aea36aa5~ TransferAssetC~ TCjeJWKkiEgih2~ TA9uJrWuTueK~
#> 3 0c841ce6d571da~ TransferAssetC~ TP8BWU6dYD1fHc~ TSpnfuhALfLJ~
#> 4 51802e2f59cb7b~ TransferAssetC~ TAdTQUqsumn1NG~ TX6HKSS5VnUr~
#> 5 9e3b34b7f99a8f~ TransferContra~ TWq5u64ic279j8~ TM9H9cS6wZox~
#> 6 0820513585ac7b~ TriggerSmartCo~ TJZziktTWeCXgR~ TBRs8xwajQVb~
#> 7 5cc4dd58b3c25c~ TransferAssetC~ TLmqHdqkD5shgj~ TUrCM9mV41FN~
#> 8 a526124805cb6e~ TransferAssetC~ TEpyCakhKtwvtY~ TGmgoGskK3pX~
#> 9 4449d115da5c13~ TransferAssetC~ TWoJEZhqpJxzu7~ TNLAoWMw6WTk~
#> 10 03df1b68f2d7fa~ TransferAssetC~ TJFhAVZjBaCMvt~ TQ8M5KzRJRZi~
#> # ... with 268 more rows

First, we will summarise the above data by (i) finding all unique pairs of sending (from_address) and receiving addresses (to_address) and (ii) counting the number of transactions for each pair (edge weights):

nb <- dat$tx[[10]] %>%
  group_by(from_address, to_address) %>%
  summarise(weight = n()) %>%
  ungroup() %>%
  arrange(-weight)

nb
#> # A tibble: 238 x 3
#> from_address to_address weight
#> <chr> <chr> <int>
#> 1 TCd4rituYSmbeEDxXpDVF7ordH~ TSvin3om2vWuw5stbxTeDQrpq~ 33
#> 2 TEoDHGSTu2Kh97HjCbbcEUaCTm~ TCcjjJ4v7zu6W4F1wwBhuc7EB~ 2
#> 3 TJP3A9agR5CP31UE29mhFXK1wk~ TCXeYmM1g44qtRVEHbf9ehwgB~ 2
#> 4 TNXscxfqxNtpaeUZKoEUkvEY5e~ TGsyWcVkFBxTknYzXGWvrZPaX~ 2
#> 5 TP7NeoT2sM1hkpTuqNrvHorrRz~ TNmiCAm7Q38jKvWsrSRReREuy~ 2
#> 6 TPaGwxVw7q26DCxRSGG9b2BNRB~ TTmdBHmoPqXE7EC22QtzPi2Ag~ 2
#> 7 TTAct3jgaabAhxSvoZW9SAGVu4~ TWMiYQvrXuyQw5pk6Yt2L6oQF~ 2
#> 8 TTfGseRet2nHfvi5KhPx9vCf6Q~ TA7oywPknyAewrq9bss7AzGGs~ 2
#> 9 TTfGseRet2nHfvi5KhPx9vCf6Q~ TYYoV3GVuwjHkCYKfYRYwLjBb~ 2
#> 10 TA1EHWb1PymZ1qpBNfNj9uTaxd~ TDyoqH6N91PAYrihFzE1VER7H~ 1
#> # ... with 228 more rows

The resultant tibble can now be converted into an object of class "igraph", as implemented in the popular package igraph:

require(igraph) # popular R package for network analysis

nb_graph <- nb %>%
  graph_from_data_frame(., directed = TRUE)

summary(nb_graph)
#> IGRAPH c55f992 DNW- 350 238 --
#> + attr: name (v/c), weight (e/n)
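
To make the conversion step concrete, here is a minimal sketch on a made-up edge list (the addresses "A" to "D" are placeholders, not real TRON accounts). The first two columns of the data frame are read as the source and target of each edge, and any further columns, such as weight, become edge attributes.

```r
library(igraph)

# Hypothetical edge list: one row per unique sender-receiver pair
edges <- data.frame(
  from_address = c("A", "A", "C"),
  to_address   = c("B", "D", "B"),
  weight       = c(3, 1, 1)
)

g <- graph_from_data_frame(edges, directed = TRUE)

gorder(g)     # number of nodes (unique addresses)
gsize(g)      # number of edges (unique pairs)
E(g)$weight   # transaction counts carried over as edge weights
```

The toy graph has 4 nodes and 3 edges, in the same way that the 238 unique address pairs above yield 238 edges among 350 distinct addresses.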

The resultant object is a graph with 350 nodes and 238 edges. To visualise this graph, we can use ggnetwork, an R package that implements several convenient network geometries for ggplot2 (another great R package for network visualisations is ggraph). The result is shown in Figure 5:

require(ggplot2)
require(ggnetwork)

# Figure 5:
ggplot(fortify(nb_graph, arrow.gap = 0.007),
       aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(
    color = "#57cbcc",
    arrow = arrow(length = unit(3, "pt"), type = "closed")
  ) +
  geom_nodes(color = "#343a40") +
  theme_blank()
Figure 5. Network of TRON addresses (dark dots) and transactions (turquoise arrows) recorded in block 31570507. Some addresses sent out several transactions to the same receiving addresses within this block. However, edge weights are ignored in this figure for clarity.

Figure 5 reveals patterns that are rather typical for blockchain transaction networks:

  • the network is mainly composed of small disconnected components, with 2–4 interacting addresses each;
  • there are a few star-like components, where either one address receives transactions from many other addresses or one address sends out transactions to many other addresses.

The structural properties of a transaction network can be characterised quantitatively using a variety of metrics, such as:

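  • Diameter: the length of the longest of all shortest paths between pairs of nodes. Note that if the graph has an edge attribute named weight, igraph's diameter() uses it as the edge length by default, so the result is a weighted path length rather than a number of hops.
  • In-degree and out-degree: the number of incoming or outgoing edges of a node, respectively. Because transactions are aggregated into weighted edges here, these count distinct sending or receiving counterparties rather than individual transactions. When characterising the entire directed network, the minimal and maximal in- and out-degrees are often of interest.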
  • Average degree: the number of edges divided by the number of nodes.
  • Edge density: the ratio between the number of edges and the maximal possible number of edges.
  • Assortativity: a measure of preferential attachment, i.e. a phenomenon where nodes tend to attach to other nodes that are similar in some way. Assortativity is usually calculated as the Pearson correlation coefficient of degree between pairs of linked nodes.
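
The definitions above can be checked by hand on a tiny graph. The sketch below uses a made-up three-node chain with one heavy edge. It also illustrates a subtlety of igraph: diameter() picks up an edge attribute called weight by default, whereas passing weights = NA yields the plain hop count.

```r
library(igraph)

# Hypothetical chain A -> B -> C, where A sent 5 transactions to B
edges <- data.frame(
  from   = c("A", "B"),
  to     = c("B", "C"),
  weight = c(5, 1)
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Default: edge weights are treated as lengths (5 + 1)
d_weighted <- diameter(g, directed = TRUE)

# weights = NA ignores the weight attribute and counts hops
d_hops <- diameter(g, directed = TRUE, weights = NA)

# Largest number of incoming edges of any node
max_in <- max(degree(g, mode = "in", loops = FALSE))

# 2 edges out of 3 * 2 = 6 possible directed edges
dens <- edge_density(g, loops = FALSE)

c(d_weighted = d_weighted, d_hops = d_hops, max_in = max_in, dens = dens)
```

In this toy graph the default call returns 6, whereas weights = NA returns the 2 hops from A to C. Which of the two is appropriate depends on whether transaction counts should be interpreted as distances.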

All of these metrics can be easily calculated using the respective functions from igraph. For the network depicted in Figure 5 we get:

# Diameter:
diameter(nb_graph, directed = TRUE)
#> [1] 33

# Max in-degree:
degree(nb_graph, mode = "in", loops = FALSE) %>% max()
#> [1] 32

# Max out-degree:
degree(nb_graph, mode = "out", loops = FALSE) %>% max()
#> [1] 9

# Average degree:
gsize(nb_graph) / gorder(nb_graph)
#> [1] 0.68

# Edge density:
edge_density(nb_graph, loops = FALSE)
#> [1] 0.001948424

# Assortativity:
assortativity_degree(nb_graph)
#> [1] -0.228756
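
Two of these numbers follow directly from the node and edge counts reported by summary(nb_graph), which makes for an easy cross-check. The sketch below assumes the usual definition of density for a directed graph without self-loops, i.e. the number of edges divided by n * (n - 1).

```r
# Counts taken from the summary of the graph above
n_nodes <- 350
n_edges <- 238

# Average degree as defined in the text: edges per node
avg_degree <- n_edges / n_nodes

# Directed graph, no self-loops: n * (n - 1) possible edges
edge_dens <- n_edges / (n_nodes * (n_nodes - 1))

avg_degree
signif(edge_dens, 7)
```

This gives 0.68 and 0.001948424, matching the igraph output.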

Overall, these results agree with the patterns we discerned earlier by visually examining Figure 5.

The structure of the transaction networks varied considerably throughout the day

It is natural to expect the structure of block-specific transaction networks to be dynamic in line with what is happening on the blockchain. Let us calculate the aforementioned structural metrics for each block and see how they varied over time.

First, we convert each block’s list of transactions into a graph object of class igraph and store the result in a nested tibble:

require(purrr)

nw_stats <- dat %>%
  select(timestamp, tx) %>%
  group_by(timestamp) %>%
  unnest(tx) %>%
  select(-c(tx_id, contract_type)) %>%
  mutate(to_address = if_else(
    is.na(to_address), from_address, to_address)
  ) %>%
  group_by(timestamp, from_address, to_address) %>%
  summarise(weight = n()) %>%
  group_by(timestamp) %>%
  nest(nw = c(from_address, to_address, weight)) %>%
  mutate(nw = map(nw, graph_from_data_frame))

# The resultant tibble contains a list-column `nw`, where
# each element is an `igraph` object:
nw_stats
#> # A tibble: 28,719 x 2
#> # Groups: timestamp [28,708]
#> timestamp nw
#> <dttm> <list>
#> 1 2021-07-02 00:00:00 <igraph>
#> 2 2021-07-02 00:00:09 <igraph>
#> 3 2021-07-02 00:00:12 <igraph>
#> 4 2021-07-02 00:00:15 <igraph>
#> 5 2021-07-02 00:00:18 <igraph>
#> 6 2021-07-02 00:00:21 <igraph>
#> 7 2021-07-02 00:00:24 <igraph>
#> 8 2021-07-02 00:00:27 <igraph>
#> 9 2021-07-02 00:00:30 <igraph>
#> 10 2021-07-02 00:00:33 <igraph>
#> # ... with 28,698 more rows

summary(nw_stats$nw[[1]])
#> IGRAPH 7f07025 DNW- 270 189 --
#> + attr: name (v/c), weight (e/n)

Now, let us write a utility function that calculates network metrics and then apply it to each igraph object stored in nw_stats using the map() function from purrr:

# Function to calculate network metrics:
get_nw_stats <- function(g) {
  tibble(
    n_nodes = gorder(g),
    n_edges = gsize(g),
    diameter = diameter(g, directed = TRUE),
    max_in_degree = degree(g, mode = "in",
                           loops = FALSE) %>% max(),
    max_out_degree = degree(g, mode = "out",
                            loops = FALSE) %>% max(),
    avg_degree = n_edges / n_nodes,
    edge_density = edge_density(g, loops = FALSE),
    assortativity = assortativity_degree(g)
  )
}

# Calculate network metrics for each block:
nw_stats <- nw_stats %>%
  mutate(stats = map(nw, get_nw_stats))

glimpse(nw_stats$stats[[1]])
#> Rows: 1
#> Columns: 8
#> $ n_nodes <int> 270
#> $ n_edges <dbl> 189
#> $ diameter <dbl> 3
#> $ max_in_degree <dbl> 20
#> $ max_out_degree <dbl> 10
#> $ avg_degree <dbl> 0.7
#> $ edge_density <dbl> 0.00260223
#> $ assortativity <dbl> -0.1411257

# Unnest `nw_stats` for easier plotting:
nw_stats_flat <- nw_stats %>%
  ungroup() %>%
  select(timestamp, stats) %>%
  unnest(cols = stats)

glimpse(nw_stats_flat)
#> Rows: 28,719
#> Columns: 9
#> $ timestamp <dttm> 2021-07-02 00:00:00, 2021-07-02 0~
#> $ n_nodes <int> 270, 409, 525, 555, 28, 680, 658, ~
#> $ n_edges <dbl> 189, 293, 398, 436, 27, 535, 485, ~
#> $ diameter <dbl> 3, 4, 8, 6, 1, 8, 3, 52, 46, 33, 3~
#> $ max_in_degree <dbl> 20, 30, 78, 98, 27, 135, 63, 64, 2~
#> $ max_out_degree <dbl> 10, 10, 8, 9, 1, 12, 8, 10, 5, 9, ~
#> $ avg_degree <dbl> 0.7000000, 0.7163814, 0.7580952, 0~
#> $ edge_density <dbl> 0.002602230, 0.001755837, 0.001446~
#> $ assortativity <dbl> -0.1411257, -0.2274452, -0.3283519~
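
The nest-then-map pattern used above can be tried on a handful of made-up transactions. This is a simplified sketch, not the original pipeline: the column `block` stands in for `timestamp`, the addresses are placeholders, and only two metrics are computed with map_dbl().

```r
library(dplyr)
library(tidyr)
library(purrr)
library(igraph)

# Hypothetical flat transaction list: three transactions in block 1,
# one transaction in block 2
toy <- tibble(
  block        = c(1, 1, 1, 2),
  from_address = c("A", "A", "C", "E"),
  to_address   = c("B", "B", "B", "F")
)

per_block <- toy %>%
  # One row per unique address pair and block, with a transaction count
  count(block, from_address, to_address, name = "weight") %>%
  # One row per block, with the edge list tucked into a list-column
  nest(nw = c(from_address, to_address, weight)) %>%
  mutate(
    nw      = map(nw, graph_from_data_frame),  # edge list -> igraph object
    n_nodes = map_dbl(nw, gorder),
    n_edges = map_dbl(nw, gsize)
  )

per_block %>% select(block, n_nodes, n_edges)
```

Block 1 collapses into 2 edges among 3 addresses (the two transactions from A to B become one edge of weight 2), and block 2 into a single edge between 2 addresses.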

Figure 6 shows how the number of nodes (left panel) and edges (right panel) per block varied over time. Unsurprisingly, the dynamics of both of these metrics were similar to that of the number of transactions per block (see Figure 3). Also, given the highly disconnected nature of block-specific transaction networks, the number of nodes strongly positively correlated with the number of edges (Figure 6, bottom panel).

Figure 6. Temporal changes in the number of nodes (upper left panel) and the number of edges (upper right panel) in the per-block transaction networks. The bottom panel illustrates the correlation between these two metrics.

The maximal in-degree demonstrated a similar intraday variation to that of the number of nodes and edges, whereas the maximal out-degree, despite its multiple local peaks and troughs, showed almost no trend on average (Figure 7). This finding suggests that the structure of transaction networks was mostly driven by the number of senders interacting with a given receiving address simultaneously.

Figure 7. Temporal changes in the maximal in-degree (left panel) and the maximal out-degree (right panel) of the per-block transaction networks. Note the log scale of the Y-axis.

The edge density had a mild U-shaped trend and thus negatively correlated with the number of nodes and edges. Although to a much lesser extent, it also negatively correlated with the maximal in-degree (Figure 8).

Figure 8. Upper panel: temporal changes in the edge density of the per-block transaction networks. Bottom panels: relationships between the edge density and the number of nodes, number of edges, and maximal in-degree, respectively.

The average network degree was relatively stable throughout the day, oscillating at around 0.7 (Figure 9).

Figure 9. Temporal changes in the average degree of the per-block transaction networks.

The diameter of per-block transaction networks showed consistently high variation throughout most of the day (from 1 to ca. 80; Figure 10). However, the variance of this metric decreased dramatically and its mean level simultaneously jumped a couple of times between ca. 03:00 and 05:00, i.e. soon after the pronounced regime shift in the blockchain activity described in detail above.

Figure 10. Temporal changes in the diameter of block-specific transaction networks. Note the log scale of the Y-axis.

The assortativity coefficient was moderately negative most of the time (Figure 11), suggesting that low-degree nodes tended to attach to high-degree nodes. This finding agrees well with the presence of a few large star-like components in otherwise very disconnected transaction networks that can typically be observed on the TRON blockchain (see, for example, Figure 5).

Figure 11. Assortativity of the per-block transaction networks.
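
To see why star-like components push the coefficient below zero, consider a minimal sketch on a synthetic star graph. For simplicity it treats the graph as undirected, which is an assumption that differs from the directed networks analysed in this article.

```r
library(igraph)

# Synthetic star: one hub connected to four leaves (undirected for simplicity)
star <- make_star(5, mode = "undirected")

# Every edge links the degree-4 hub to a degree-1 leaf, so the degrees
# at the two ends of an edge are perfectly anti-correlated
r_star <- assortativity_degree(star, directed = FALSE)
r_star
```

A pure star yields a coefficient of -1, the most disassortative case. Real per-block networks mix a few such stars with many small components, which is consistent with the moderately negative values seen here.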

However, there were multiple spikes in the assortativity throughout the day. The frequency of such spikes increased at the end of the day. Figure 12 compares two transaction networks — one with the lowest (-0.53), and one with the highest (0.79) assortativity coefficient.

Figure 12. Per-block transaction networks with the lowest and the highest assortativity coefficient recorded on the TRON blockchain on 2 July 2021.

Conclusion

This article demonstrated some of the patterns that can be extracted from data generated on a high-throughput smart contract-enabled blockchain. However, it barely scratched the surface.

For example, other interesting aspects worth exploring include deanonymisation of the blockchain entities, asset transfers between accounts of various types, the link between on-chain activities and the price of tokens hosted on the blockchain, etc. I will cover these and other research questions in future articles.

Created by

Sergey Mastitsky

Data Science consultant with multiple years of experience across academic and industrial sectors. Author of several books on data analysis and visualisation.

