In this scenario, we’ll explore
Transfer
s, one of the core events in any blockchain, as it encapsulates the flow of tokens. We’ll be also working through a specific query to extract business value for a given question.
Let’s say you’ve just joined a brand-new EVM chain, one of quite a few now:
Your first task as a “data analyst”, “web3 analyst”, “blockchain analyst” or insert any other cool sounding title for “data analyst” here, is to create a token transfers table to be able to track the flow of erc20
tokens in and out of the network. You need to do this to serve their other request which is to know how many USDC
and USDT
tokens are inflows/outflows (minted
/burned
) on chain in the last month.
You are extremely lucky because the data engineering team has already set up a pipeline for ingesting data directly from a node and has provided you with the tables we discussed earlier, blocks
, transactions
, logs
and traces
.
Assuming the chain you’re working with follows the Ethereum Improvement Proposal standard for ERC20 tokens, then the event (emitted in logs
) you’re looking for is the following:
event Transfer(address indexed _from, address indexed _to, uint256 _value)
From reading the reading material here we know that the Transfer
event
gets hexed and can then be found in the logs
table. More specifically in topics
as the first topic (topics[0]
or topics[1]
, depending on your SQL engines indexing preferences). If you are using BigQuery
as we will in this tutorial, the indices start from 0
onwards, so it would be found in topics[0]
.
If you’ve not read the reading material in the earlier section, then you may ask google or even ChatGPT…
It will give you some implementation using the web3.js
library and the abi
of the contract. The problem is that open-source projects, verify their code. Therefore others usually fork it and modify it, without necessarily verifying the contract, or even modify the emitted events
, usually parameter names and what is indexed. For this reason, we’ll focus on extracting value out of the hex
of an event
directly instead of decoding it using the ABI. This gives us the flexibility of also capturing copy-cats.
Here is how hexing an event is done in Python:
Which in turn will return the following hex:
import sha3
k = sha3.keccak_256() # Hashing algorithm
event_emitted = 'Transfer(address,address,uint256)'
k.update(event_emitted.encode('utf-8'))
hexed_event = '0x' + k.hexdigest()
or if you prefer…
from web3 import Web3
event_emitted = 'Transfer(address,address,uint256)'
hash_object = Web3.keccak(text=event_emitted) # Hashing algorithm
hexed_event = hash_object.hex()
'0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
If you’re observant you may say, wait a minute, we’re only using the types
of arguments, not the names
or whatever this indexed
thing is. You would be correct! This is how the events get hexed in the EVM, to make them more deterministic. In other words, I can also create an event like so: