Accessing Blockchain Data at Scale with a Caching Microservice

To build applications on top of blockchain data, you need fast, reliable access to that data. However, on-chain data can be slow to query and difficult to work with. Coherent solves this problem by decoding raw on-chain data into a relational database format and making it accessible through database connections.
In this post, I'll demonstrate how to build a microservice in Python that caches data queried from Coherent's Snowflake database. This microservice can power applications that need fast API access to Ethereum data with minimal load on the database.

The Importance of Caching

Caching is a crucial technique for building high-performance applications. It works by storing the results of expensive operations, like database queries, and returning the cached result on subsequent requests instead of performing the operation again.
This microservice caches data from Snowflake for 2 minutes. In that window, any requests for the same token ID will receive the cached data instead of querying Snowflake again. This provides a few key benefits:
• Speed: Cached data can be returned almost instantly. This allows our API endpoint to respond quickly even under high load.
• Reliability: By relying on cached data, we minimize the number of direct database queries. This means fewer opportunities for network errors or timeouts.
• Scalability: If the microservice receives a spike of requests for a single token ID, we avoid overloading Snowflake with repeated queries for the same data. The cached result can handle many requests.
• Cost efficiency: Snowflake usage costs are based on the volume of queries. By caching data, we can serve more API requests with fewer Snowflake queries, reducing overall costs.
However, caching also has downsides:
• Stale data: If the underlying data changes, the cached result will be outdated until the cache expires and a new query is executed. We have set the expiry to 2 minutes to balance freshness and performance.
• Cache size: The in-memory cache is bounded by available memory, and performance can degrade if it grows too large. We cache a relatively small amount of data per token, so this is unlikely to become an issue, but it could be optimized, for example with an eviction policy.
Overall, caching is an ideal performance optimization for this type of use case - fast, read-only API access to data that changes at a moderate frequency. Combined with a scalable database like Snowflake, it allows us to handle even demanding application workloads.
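Before diving into the service itself, the pattern is easy to see in isolation. Here is a minimal, standalone sketch of time-based caching; the names ttl_cached and ttl_seconds are illustrative, not part of any library:

import time

def ttl_cached(fn, ttl_seconds=120):
    """Wrap fn so each result is reused until it is ttl_seconds old."""
    cache = {}

    def wrapper(key):
        entry = cache.get(key)
        # Cache hit: an entry exists and has not expired yet
        if entry and time.time() - entry['time'] < ttl_seconds:
            return entry['data']
        # Cache miss: run the expensive operation and store the result
        data = fn(key)
        cache[key] = {'data': data, 'time': time.time()}
        return data

    return wrapper

Wrapping an expensive query function this way gives exactly the hit/miss behavior the microservice below implements inline.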

The Tech Stack

• Snowflake: Coherent's database for storing and querying decoded on-chain data. It's fast, scalable, and supports SQL.
• FastAPI: A modern, fast API framework for Python. We'll use it to build the microservice API.
• Python: For writing the microservice code.

The Microservice

The goal of the microservice is simple:
  1. Expose an API endpoint that takes a token ID and returns information for that token from the blockchain.
  2. Check if we have cached data for that token ID that is less than 2 minutes old. If so, return the cached data.
  3. Otherwise, query the Snowflake database for data on that token ID.
  4. Cache the result and return the data.
This allows us to serve data quickly while minimizing repeated queries to Snowflake. Here is the main code:
import time

from fastapi import FastAPI
from snowflake import connector

app = FastAPI()

# In-memory cache: token_id -> {'data': ..., 'time': ...}
cache = {}

CACHE_TTL_SECONDS = 120

@app.get("/data/{token_id}")
def get_data(token_id: str):
    # Return cached data if it is less than 2 minutes old
    if token_id in cache and time.time() - cache[token_id]['time'] < CACHE_TTL_SECONDS:
        return cache[token_id]['data']

    # Cache miss: query Snowflake (replace the placeholder credentials)
    conn = connector.connect(
        user='user',
        password='password',
        account='account_name'
    )
    try:
        cursor = conn.cursor()
        # Bind token_id as a parameter rather than formatting it into the
        # SQL string, which would be vulnerable to SQL injection
        cursor.execute(
            "SELECT * FROM token_data WHERE token_id = %s", (token_id,)
        )
        data = cursor.fetchone()
    finally:
        conn.close()

    cache[token_id] = {
        'data': data,
        'time': time.time()
    }
    return data
We connect to Snowflake using the snowflake-connector-python library. The @app.get decorator exposes the /data/{token_id} endpoint, where {token_id} is a path parameter that captures the actual token ID. Note that the token ID is passed to the query as a bound parameter, which protects against SQL injection.

Deploying the Microservice

To deploy this microservice, here are the basic steps:
  1. Create a virtual environment:
python3 -m venv env
  2. Activate the environment:
source env/bin/activate  # On Mac/Linux
env\Scripts\activate  # On Windows
  3. Install dependencies:
pip install fastapi snowflake-connector-python uvicorn
  4. Run the uvicorn server (this assumes the code above is saved as main.py):
uvicorn main:app
  5. The microservice will be live at http://localhost:8000! You can call endpoints like http://localhost:8000/data/0x1234... to get data for a token ID (see the example request below).
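To sanity-check the running service from Python, here is a minimal sketch using the requests library (an extra dependency, not installed above); the token ID is a hypothetical placeholder:

import requests

# Hypothetical token ID for illustration; substitute a real one
token_id = "0x1234"

resp = requests.get(f"http://localhost:8000/data/{token_id}")
resp.raise_for_status()

# The first call queries Snowflake; calls within the next 2 minutes hit the cache
print(resp.json())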

Using Redis for a Distributed Cache

Right now, our microservice caches data in a simple dictionary in memory. This works for a single server, but if we wanted to scale the microservice across multiple machines, this cache would be limited to a single instance.
A better option is to use a dedicated distributed cache system like Redis. Redis is an in-memory database that can be run on one or more machines and shared between microservice instances.
To implement Redis caching, we'd:
  1. Provision a Redis server (could be self-hosted or use a cloud service like Redis Enterprise Cloud or Redis Labs).
  2. Install the redis-py library in our microservice.
  3. On microservice startup, establish a Redis connection:
import redis
r = redis.Redis(host="redis_server", port=6379)
  4. Use r wherever we currently access the dictionary cache. Note that Redis stores strings and bytes, so the row must be serialized (e.g. with json.dumps) before caching. For example:
# Set cache with a 120-second TTL
r.set(f"token_id:{token_id}", serialized_data, ex=120)
# Get cache (read once, then check the result)
cached = r.get(f"token_id:{token_id}")
if cached is not None:
    return cached
  5. For reads, check Redis and return data if cached. For misses, query Snowflake, cache, and return as before (a fuller sketch follows this list).
  6. The Redis server will be shared between microservice instances, acting as a single distributed cache.
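Putting those steps together, here is a minimal sketch of the endpoint backed by Redis instead of the in-process dictionary. It reuses the Snowflake lookup from earlier (with the same placeholder credentials) and serializes rows with json, since Redis stores strings and bytes rather than Python tuples:

import json

import redis
from fastapi import FastAPI
from snowflake import connector

app = FastAPI()
r = redis.Redis(host="redis_server", port=6379)

def query_snowflake(token_id: str):
    # Same Snowflake lookup as before (replace the placeholder credentials)
    conn = connector.connect(
        user='user',
        password='password',
        account='account_name'
    )
    try:
        cursor = conn.cursor()
        cursor.execute(
            "SELECT * FROM token_data WHERE token_id = %s", (token_id,)
        )
        return cursor.fetchone()
    finally:
        conn.close()

@app.get("/data/{token_id}")
def get_data(token_id: str):
    key = f"token_id:{token_id}"

    # Check the shared Redis cache first
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: query Snowflake, then cache with a 120-second TTL
    data = query_snowflake(token_id)
    r.set(key, json.dumps(data), ex=120)
    return data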
This allows our microservice to scale horizontally by simply starting more instances, without losing the benefits of caching. The instances will coordinate through the shared Redis cache.
Redis also provides additional benefits like:
  • Persistence - data can be written to disk, so the cache survives server restarts.
  • Support for more complex data types - not just strings.
  • Pattern matching on keys.
  • Pub/sub messaging between instances.
  • And more! Redis is a very powerful tool for building distributed systems.
Using a dedicated cache like this, shared between instances, allows us to build a robust, scalable, and high-performance microservice API.

Conclusion

In this post, we built a simple microservice in Python to access decoded blockchain data from Coherent's Snowflake database. By caching results for 2 minutes, we were able to minimize load on Snowflake and handle requests with low latency.
We explored why caching is so important for application performance, including speed, scalability, reliability, and cost efficiency benefits. We also looked at how a distributed cache like Redis could be used to scale this microservice horizontally across multiple machines while maintaining cache coherence.
By building on a fast, scalable, SQL-based database from Coherent and employing standard API best practices like caching, we created a microservice suited for handling demanding workloads from blockchain applications. The step-by-step details in this post can serve as a template for those looking to build their own microservices to access and deliver blockchain data at scale.
Coherent allows you to focus on building your blockchain applications, handling infrastructure and data processing under the hood. If you have an idea for a new product or service powered by blockchain data, I highly recommend checking them out as a data solution - and using this microservice architecture as a reference point!