The Thlog!

Saturday, October 28, 2023

Materialized Views

Introduction

Common, frequent queries against a database can become expensive. When the same query is run again and again, it makes sense to ‘virtualize’ the query. Materialized views address this need by enabling common queries to be represented by a database object that is continuously updated as data changes.

A View...

is a derived relation defined in terms of stored base relations (generally tables)
defines a SQL transformation from a set of base tables to a derived table; this transformation is typically recomputed / re-compiled every time the view is referenced to in a query
when created, does not compute any results nor does it change how data is stored or indexed
is a saved query on tables of a DB
is referenced to, in queries, as if it were a table

Example:

CREATE VIEW user_purchase_summary AS SELECT
  u.id as user_id,
  COUNT(*) as total_purchases,
  SUM(purchases.amount) as lifetime_value

FROM users u
JOIN purchases p ON p.user_id = u.id;

Every time a query referencing view/s is executed, it first computes the results of the view, and then computes the rest of the query using those results.

A Materialized View...

takes a regular view and materializes it by upfront computing and storing its results in a “virtual” table
is like a cache, i.e. a copy of the data that can be accessed quickly
is a regular view “materialized” by storing tuples of the view in the database
can have index structures and hence database access to materialized views can be much faster than recomputing the view

Example:

CREATE MATERIALIZED VIEW user_purchase_summary AS SELECT
  u.id as user_id,
  COUNT(*) as total_purchases,
  SUM(CASE when p.status = 'cancelled' THEN 1 ELSE 0 END) as cancelled_purchases

FROM users u
JOIN purchases p ON p.user_id = u.id;

A regular view is a saved query, and, a materialized view is a saved query along with its results stored as a table.

Implications of materializing a view

When referenced in a query, a materialized view is not recomputed as the results are pre-stored and hence querying materialized views tends to be faster
Because it’s stored as if it were a table, indexes can be built on the columns of a materialized view
Once a view is materialized, it is only accurate until the underlying base relations are modified. The process of updating a materialized view in response to changes in underlying is called view maintenance.

A “view” is an anchored perspective on changing inputs, results are constantly changing as the underlying data changes. Materialization just implies that the transformation is done proactively. So, "materialized views" should update automatically.

However, in practice, some databases need materialized views to be manually refreshed and others have implemented automatic updates, albeit with limitations.

Note: MySQL does not support materialized view as of now. Oracle, Snowflake, MongoDB, Redshift, PostgreSQL all others do.

Materialized views are used...

when SQL query is known ahead of time and needs to be repeatedly recalculated
primarily for caching the results of extremely heavy and complex queries that cannot be run frequently as regular views
as ability to define (using SQL) any complex transformation of data in DB, and let the DB maintain the results in a “virtual” table. when low end-to-end latency is required between when data originates to when it is reflected in a query
when low-latency query response times with high concurrency or high volume of queries is expected

Use of materialized views in...

Applications: Incrementally updated materialized views can be used to replace the caching and denormalization traditionally done to “guard” OLTP databases from read-side latency and overload. Instead of waiting for a query and doing computation to get the answer, we are now asking for the query upfront and doing the computation to update the results as the writes (creates, updates and deletes) come in. This inverts the constraints of traditional database architectures, allowing developers to build data-intensive applications without complex cache invalidation or denormalization.

Analytics: ELT bulk loads raw data into a warehouse and then transforms it via some complex SQLs. The transformation may use regular views (i.e. no caching - used when it is not overly slow), or cached tables built from the results of a SELECT query (used when regular views slow down the queries due to re-computations), or incrementally updated table/s (but user is responsible for writing the update strategy).

OR, use the fourth option i.e.

Use "materialized views", always remain more up-to-date, more automated and less error-prone to cached tables (the end user burden of deciding when and how to update is minimized).

Monday, July 10, 2017

Ethereum mining on AWS

Ethereum mining works only on g2.2xlarge or g2.8xlarge instances with Ubuntu 14.04 or later
Port 30303 must be opened for both TCP and UDP connections from `anywhere` (in security group settings)
Default Ubuntu available with ec2 is minimal i.e. some drm files required for the OS to see GPU drivers are missing. SSH into your machine and run following steps to fix this:

> sudo apt-get install linux-generic

(Click OK for default option/s when prompted)

> sudo reboot

Download CUDA drivers for ec2 instances (use Nvidia units). Working with .deb package (instead of .run) is easier (local or network makes no difference)

> wget http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/rpmdeb/cuda-repo-ubuntu1404-7-0-local_7.0-28_amd64.deb

(Newer versions are available here)

> sudo dpkg -i <cuda repo package>

> sudo apt-get update

> sudo apt-get install cuda

Run the following command to check driver is installed:

> lshw -c video

A line that starts with "Configuration:" should mention "...driver=nvidia...", if it doesn't search carefully or try reboot.
If you see "...driver=nouveau..." instead of "...driver=nvidia..." then something is wrong - google how to get rid of it and reinstall cuda.
Build geth from source, refer here
run geth to allow it to catch up on the chain:

> ~/go-ethereum/build/bin/geth

install ethminer from cpp-ethereum dev PPAs, refer here
Use following command to check the current hash rate (~6 MH/s) and benchmark ethminer to check that your system is in order:

> ethminer -G -M

When geth catches up on the blockchain, use the following command to generate a new account:

> ~/go-ethereum/build/bin/geth account new

start geth again with RPC enabled by using command-line below

> ~/go-ethereum/build/bin/geth --rpc

Execute following command to start ethminer

> ethminer -G

If using larger g2 instance with 4 GPUs, ethminer needs to be started 4 times. Each time adding a "--opencl-device <0..3>" argument
Check logs carefully, ethminer should be getting work packages from geth and be "mining a block"

Sunday, December 27, 2015

How to...

Make YouTube videos run faster

Open Google Chrome

Ctrl + Shift + J - opens Developer Tools, ensure you are on Console tab

On the prompt copy and paste script below:

document.getElementsByTagName("video") [0].playbackRate = 2.5

Instead of playbackRate = 2.5, you can set any other floating number between 1.00 to 4.00

Create a rss feed for "The Thlog"

https://thethlog.blogspot.com/feeds/posts/default?alt=rss

Replace "thethlog" with name of any blog on blogspot, use it for another blog hosted on blogspt. Now, add it to a feed URL to an RSS reader (e.g. Feedly) OR be more creative with GPT.