Tuesday, November 07, 2023

Making Obsidian Really Work

While documenting with Obsidian and GitHub Markdown, the following tips have been most useful:

Headings...

range from H1 to H6; use the `#` sign as many times as the heading level you want, e.g. `####` gives an H4

Linking... 

  • to a document: type `[[`, and a list of documents you can link to appears
  • to a heading in another document: type `[[`, select the document to link to from the list that appears, then use the `#` sign; the headings that can be linked to appear, just click the heading to link to it
  • to a text block in another document: type `[[`, select the document to link to from the list that appears, then use the `^` sign; the text blocks that can be linked to appear, just click the text block to link to it
  • to a heading within the current document: type `[[`, select the document that is currently being edited, then use the `#` sign; the headings that can be linked to appear, just click the heading to link to it
  • to a text block within the current document: similar to anchor tags; type `[[`, select the document you are currently editing, then use the `^` sign; the text blocks in the document appear, select the text block to link to (combined syntax examples follow at the end of this list)
  • to an image

            ![[AbsolutePathOfImage]]

  • to an image with an alternate name to display

            ![AlternateName](./relative/path/to/imageFileName)

  • to an image and resize

            ![[AbsolutePathOfImage|500]]
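The resulting link syntax looks like the examples below (a minimal sketch; the document name, heading and block ID are hypothetical):

            [[SomeDocument]]                  links to a document
            [[SomeDocument#Some Heading]]     links to a heading in another document
            [[SomeDocument#^block-id]]        links to a text block in another document
            [[#Some Heading]]                 links to a heading within the current document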

Using tags

#FollowedByNoSpace is a tag (no space after the `#`), used in search and in automated knowledge graph generation and management

Arrows

  • Up arrow (↑): ↑ 
  • Down arrow (↓): ↓ 
  • Left arrow (←): ← 
  • Right arrow (→): → 
  • Double headed arrow (↔): ↔

Colouring the text...

  • Open the developer console (on Mac: ⌘ + Option + I)
  • Then edit the CSS elements as necessary

Superscript or Subscript

  • Prefer to use latex formats
    • Superscript ^
    • Subscript _
OR
  • Use HTML tags directly in the note
    • <sup>some text</sup>
    • <sub>some other text</sub>
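For example, a minimal sketch of both approaches (assuming the note renders inline LaTeX):

            $E = mc^{2}$ and $H_{2}O$              (superscript via ^ and subscript via _ in LaTeX)
            E = mc<sup>2</sup> and H<sub>2</sub>O  (the same using HTML tags)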

Friday, November 03, 2023

Managing Technical Debt

Introduction

The gap between the current state of a software system (e.g. inefficient code, design flaws, missing documentation) and its ideal, intended state is usually called technical debt. It represents the future cost that will be incurred when addressing such shortcomings, e.g. fixing bugs, refactoring code, improving documentation, or making the system more secure and scalable.

Technical debt, whether incurred intentionally or unintentionally, is not necessarily a bad thing in all cases. In some situations, technical debt results from strategic decisions made to meet urgent business needs or to gain a short-term competitive advantage. However, it must be managed and paid down over time to prevent it from overwhelmingly hindering future development.

Technical Debt Management Framework

To effectively manage technical debt, it must be recognised, documented, and tracked as part of the software development process. This helps make informed decisions about when and how to address the debt by balancing short-term goals with long-term sustainability and quality.

The industry recognises and categorises a number of forms of technical debt; the following are of most interest for an investment banking enterprise.
The Difficulty column gives an indication of the effort that may need to be invested in managing and repaying a given category of debt. As an example, the commonly identified 'Architecture and Design Debt' is quite difficult to identify and measure in hindsight, because the decisions are soundly supported by business needs. 'Architecture and Design Debt' is usually also closely related to other debts, e.g. 'Integration Debt', and is thus very difficult even to monitor and track. It is also very difficult to repay, since architecture and design are rarely revisited without motivation, and certainly not with the sole intent of repaying debt by remediating and refactoring code.

The framework considers 'Architecture and Design Debt' as an umbrella encompassing 'Code Debt', 'Performance Debt' and others as remediation sub-categories (add a column), with the following considerations listed as identification | tracking | management | challenges:
  • Performance Debt: monitor performance using APM tools, identify performance bottlenecks | set performance benchmarks, track metrics | continuously monitor performance, allocate resources to address issues | variability of workloads, difficulty in isolating performance causes
  • Code Debt: evaluate code quality using metrics such as cyclomatic complexity and code duplication | maintain code documentation | conduct code reviews, implement automated code quality checks | complexity of large codebases, rapidly evolving code
One may want to add 'Skills Debt' to the framework. It is of medium difficulty, can be measured and remediated more scientifically, and helps repay other types of technical debt through a well-managed and self-motivated team.

The framework also identifies challenges to expect and lists some generic steps that can be taken to alleviate them while actively managing and repaying technical debt.

Monday, October 30, 2023

Scratching the OLAP surface

Introduction

Long ago, business transactions were saved to relational databases and a few ad-hoc decision-support queries ran on the same database. As data sizes grew and the time sensitivity of business decision reports became apparent (e.g. fetching monthly sales reports), running complex and expensive SQL queries on the same database impacted saving business transactions to the DB.

Running complex and expensive SQL queries moved to a purpose-built database called a data warehouse (DWH). Data was sent from multiple business operational systems and integrated into central storage - the DWH, a single source of truth for business data.

Data warehouses were built on MPP principles (massively parallel processing), i.e. compute and storage were coupled to benefit from data locality during query execution. To speed up the complex and expensive queries, ETL pipelines pre-joined and normalised data before loading it into the DWH. The storage footprint reduced and queries ran faster with tables laid out in two main strategies, viz. star and snowflake schemas.

Finally, business teams could run complex queries that filter and aggregate billions of records and analyse large volumes of data from different perspectives with the help of BI tools. Thus business transaction processing separated from decision-supporting analytical processing - the former became OLTP and the latter was the advent of OLAP.

The advent of OLAP

A data warehouse remains a large relational database storing data in a collection of tables unless the following additional layers make it into a multidimensional database:
  • Data is modelled along several dimensions to reduce the storage footprint and improve query times, e.g. tables are laid out in two main strategies, viz. star and snowflake schemas
  • ETL pipelines ensure pre-joined and normalised data is loaded into the DWH
  • Complex queries with several joins and higher level aggregates are managed as materialised views
  • Dedicated UI to run complicated queries across several data dimensions
Such a multidimensional database, deployed on an OLAP server and providing a dedicated client interface to run complicated queries, visualise data, and perform analytical operations across several data dimensions for generating business intelligence insights, came to be called an OLAP cube.

Key differences between OLTP and OLAP

Purpose

  • OLTP applications are used to run the business; OLAP systems are used to understand the business
  • OLTP applications handle large volumes of transactional data from multiple users (e.g. online hotel bookings, mobile banking transactions, e-commerce purchases); OLAP systems quickly process large amounts of data for in-depth analysis across multiple dimensions for decision-making

Data sources

  • OLTP: data is created by users as they complete business transactions
  • OLAP: data is pulled from OLTP databases via an ETL pipeline to provide insights, such as analysing ATM activity and performance over time

Response times

  • OLTP: milliseconds
  • OLAP: from a second to several hours

Data storage capacity

  • OLTP: usually modest storage requirements, as historical transaction data is archived
  • OLAP: massive amounts of storage capacity are required; a modern cloud data warehouse can accommodate this easily

Intended users

  • OLTP applications are customer-facing, designed for use by frontline workers such as store clerks and hotel reservation specialists, as well as online shoppers
  • OLAP systems are business-facing, used by data scientists, analysts, and business users such as team leads or executives who access data via analytics dashboards

Types of OLAP Systems

The specific analytical needs of an organisation, the amount and complexity of data to handle, the required query response times, and the kind of analysis and reporting required determine which of the following OLAP systems should be used:

Relational OLAP (ROLAP) systems... 

  1. facilitate multidimensional data analysis with high data efficiency
  2. SQL queries retrieve and analyse data from the relational tables of the data warehouse
  3. are highly scalable and can handle large amounts of data
  4. have slower query response times and do not support complex calculations

Multidimensional OLAP (MOLAP) systems...

  1. are fast for multidimensional analysis and running complex calculations & aggregations
  2. store data in a multidimensional cube format, where each dimension represents a different attribute of the data e.g. time, geography, or product
  3. require extensive data preprocessing (as data is stored in multidimensional cubes)
  4. can handle limited data and are not as scalable as ROLAP systems

Hybrid OLAP (HOLAP) systems...

  1. combine the strengths of MOLAP and ROLAP systems
  2. store summary data in multidimensional cubes while detailed business data is stored in relational database, thus also improving data relevance
  3. provide fast data access for high speed querying and handle high volumes of data

Data storage strategies in OLAP systems

Star Schema...

  • is a multidimensional data model, used in ROLAP systems
  • organises data into a central fact table surrounded by dimension tables
  • fact table contains measures being analysed i.e. quantitative data like sales revenue, quantity sold, profit margin etc.
  • dimension tables contain descriptive data that provide context for the measures e.g. time, geography, product information, etc.
  • each dimension table is joined to the fact table through a primary key-foreign key relationship
  • is popular as it is easy for business analysts and end users to understand and navigate through different levels of data
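A minimal sketch of a star schema in SQL (the table and column names below, such as sales_fact, dim_date and dim_product, are hypothetical):

-- Dimension tables provide descriptive context for the measures
CREATE TABLE dim_date    (date_id INT PRIMARY KEY, calendar_date DATE, month INT, year INT);
CREATE TABLE dim_product (product_id INT PRIMARY KEY, product_name VARCHAR(100), category VARCHAR(50));
CREATE TABLE dim_store   (store_id INT PRIMARY KEY, city VARCHAR(50), country VARCHAR(50));

-- The central fact table holds the quantitative measures plus foreign keys to each dimension
CREATE TABLE sales_fact (
  date_id       INT REFERENCES dim_date(date_id),
  product_id    INT REFERENCES dim_product(product_id),
  store_id      INT REFERENCES dim_store(store_id),
  quantity_sold INT,
  sales_revenue DECIMAL(12,2)
);

-- A typical analytical query joins the fact table to its dimensions
SELECT d.year, p.category, SUM(f.sales_revenue) AS total_revenue
FROM sales_fact f
JOIN dim_date d    ON d.date_id = f.date_id
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY d.year, p.category;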

Snowflake Schema...

  • is a ROLAP data model
  • organises data into a central fact table and normalised dimension tables (i.e. dimensions broken into multiple related tables)
  • normalises dimension tables to reduce data redundancy and improve data consistency
  • has more tables and relationships, resulting in more complex, harder-to-understand and slower queries
  • reduces storage requirements by eliminating redundant data
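Continuing the hypothetical star schema sketch above, a snowflake schema would normalise the product dimension into related tables:

-- Replaces the earlier dim_product definition: category attributes are no longer repeated per product
CREATE TABLE dim_category (category_id INT PRIMARY KEY, category_name VARCHAR(50));
CREATE TABLE dim_product  (product_id INT PRIMARY KEY, product_name VARCHAR(100),
                           category_id INT REFERENCES dim_category(category_id));

-- Queries now need an extra join to reach the category name
SELECT c.category_name, SUM(f.sales_revenue) AS total_revenue
FROM sales_fact f
JOIN dim_product p  ON p.product_id = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.category_name;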

Fact Constellation (aka Galaxy) Schema...

  • contains multiple fact tables, each with its own set of dimension tables containing descriptive data
  • the fact tables share dimensions, which links them together and allows for even more complex queries and analyses
  • each fact table represents a different business process or measure e.g. sales or customer satisfaction
  • provides more flexibility in querying and analysing data, as users can analyse multiple business processes or metrics at the same time
  • is harder to use than the star or snowflake schema

Improving performance of OLAP systems

Pre-aggregating data for faster access

  • pre-calculating and storing summary data in OLAP cubes e.g. totals, averages, etc.
  • combining data at different levels of granularity for readily providing high-level overviews
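For instance, a nightly job might pre-compute a summary table from the hypothetical sales_fact table sketched earlier:

-- Pre-computed monthly summary; dashboards read this instead of scanning the raw fact table
CREATE TABLE sales_monthly_summary AS
SELECT d.year, d.month, p.category,
       SUM(f.sales_revenue) AS total_revenue,
       SUM(f.quantity_sold) AS total_quantity
FROM sales_fact f
JOIN dim_date d    ON d.date_id = f.date_id
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY d.year, d.month, p.category;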

Caching data for quick retrieval

  • repetitive queries and frequently accessed data & query results are stored in memory

Indexing on specific columns and dimensions

  • helps to quickly locate the required data rows without scanning the entire storage system
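A minimal sketch, again using the hypothetical fact table:

-- An index on a frequently filtered dimension key avoids full scans for those lookups
CREATE INDEX idx_sales_fact_product ON sales_fact (product_id);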

Partitioning into smaller units

  • to optimise performance, database engineers divide large tables or cubes into smaller, more manageable parts based on a partitioning key, thus reducing the amount of data that needs to be scanned for each query
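A sketch of declarative range partitioning (PostgreSQL-style syntax; the table and date ranges are hypothetical):

-- Partition the fact table by date so queries touch only the relevant partition
CREATE TABLE sales_fact_partitioned (
  sale_date     DATE,
  product_id    INT,
  sales_revenue DECIMAL(12,2)
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_fact_2023 PARTITION OF sales_fact_partitioned
  FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');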

Parallel processing

  • a query is divided into parts, known as tasks, and distributed across multiple processors or cores. Each processor is assigned tasks simultaneously, allowing the query to be executed much faster than if it were processed sequentially on a single processor.
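In PostgreSQL, for example, the planner can be allowed to use several parallel workers for a single query (the setting name is PostgreSQL-specific; tables are the hypothetical ones from earlier):

-- Let up to 4 workers cooperate on the scan and aggregation of one query
SET max_parallel_workers_per_gather = 4;

SELECT p.category, SUM(f.sales_revenue) AS total_revenue
FROM sales_fact f
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY p.category;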

Materialized Views

  • pre-calculated views store results of complex queries as physical tables

Hardware and infrastructure

Scalable infrastructure, e.g. fast processors, large memory, high-speed storage, facilitates data discovery, unlimited report viewing, and complex analytical calculations. Cloud-based vendors for data analysis are now a default choice: they simplify integration, are reliable, easy to scale, and more affordable than on-premise data infrastructure.

Data preparation... 

Data gathering, storing and cleaning is done via two data integration methods viz. ETL and ELT.

ETL (Extract, Transform, Load) is a predefined sequence for extracting data from sources, transforming it to meet the target system's requirements, and loading it into a target data warehouse. It is complex, time-consuming and requires upfront planning for the data to be correctly transformed and loaded into the target system. 

ELT (Extract, Load and Transform) involves extracting data from data sources, loading it into target data warehouse or data lake, and then transforming it to meet the target system's requirements. 

Differences between ELT and ETL

Unlike ETL, ELT does not require a predefined sequence of steps. The extracted data is loaded into the target system as quickly as possible, and then the data transformation process is applied to it in-place.

ELT solutions are usually applied to modern cloud-based data warehouses that allow for massive parallel processing. So, ELT solutions can process large amounts of data much faster than traditional ETL solutions. Also, ELT solutions are more flexible than ETL solutions, as they allow for data transformation to be performed on the destination system in-place.

Once data has been loaded into a cloud data warehouse, engineers and analysts use the modern data stack to prepare the data for analysis, for example with an in-warehouse transformation like the one sketched below.
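A minimal sketch of an in-warehouse (ELT-style) transformation, assuming a raw, already-loaded table called raw_orders (all names here are hypothetical):

-- Transform raw loaded data in place: clean, type-cast and reshape it for analysis
CREATE TABLE orders_clean AS
SELECT CAST(order_id AS INT)          AS order_id,
       CAST(order_ts AS TIMESTAMP)    AS ordered_at,
       UPPER(TRIM(country_code))      AS country_code,
       CAST(amount AS DECIMAL(12,2))  AS amount
FROM raw_orders
WHERE order_id IS NOT NULL;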

OLAP Operations

  • OLAP systems use a dedicated query language called MDX (Multidimensional Expressions)
  • Many also support standard SQL queries to perform OLAP analysis
Some standard multidimensional OLAP operations are:

Slice and Dice

  • Slicing: dividing one dimension within the cube into a separate table, enabling low-level and isolated analysis of a data set
  • Dicing: is dividing two or more dimensions within a cube to generate a separate cube

Drill down and Roll-up

  • drill down: moving from high-level data to lower-level, more detailed information
  • roll up: moving from detailed data to less detailed or summarised information
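Roll-up behaviour can be sketched in SQL with GROUP BY ROLLUP, supported by several major engines such as PostgreSQL, Oracle and SQL Server (syntax varies elsewhere); the tables are the hypothetical ones from earlier:

-- Produces revenue per (year, month), plus yearly subtotals and a grand total
SELECT d.year, d.month, SUM(f.sales_revenue) AS total_revenue
FROM sales_fact f
JOIN dim_date d ON d.date_id = f.date_id
GROUP BY ROLLUP (d.year, d.month);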

Pivot

  • is to rotate data from rows to columns or from columns to rows, enabling multidimensional analysis from different perspectives and data comparisons across dimensions
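A simple way to sketch a pivot in plain SQL is conditional aggregation (the category values shown are hypothetical):

-- Rotate category values into columns, one row per year
SELECT d.year,
       SUM(CASE WHEN p.category = 'Electronics' THEN f.sales_revenue ELSE 0 END) AS electronics_revenue,
       SUM(CASE WHEN p.category = 'Clothing'    THEN f.sales_revenue ELSE 0 END) AS clothing_revenue
FROM sales_fact f
JOIN dim_date d    ON d.date_id = f.date_id
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY d.year;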

Drill-through

  • creating data points to access detailed information faster. When users click on a data point in a summary, they are shown the underlying data that make up the summary

Drill-across

  • using a common dimension shared by different data sources to enable data analysis across multiple unrelated sources or cubes. This function allows analysts to perform analysis on data from multiple sources without integrating them into a single cube.

Aggregations and Calculations

  • calculations and aggregations such as sum, average, count, minimum, maximum, and variance. Users perform these operations across one or more dimensions.

Sunday, October 29, 2023

Streaming Databases

Introduction

Traditional transactional databases don't scale well and may take hours to run the complex SQL, with joins, aggregations, and transformations, that is typical of analytical databases.

Streaming databases serve results for such complex SQL with sub-second latency and provide fast, continuous data transformation capabilities not possible in traditional databases.

Streaming Databases use SQL and familiar RDBMS abstractions (viz. tables, columns, rows, views, indexes), but have a completely different engine (a stream processor) and computing model (dataflows) inside.

  • Traditional databases: store data in tables matching the structure of the inserts & updates; all the computation work happens on read queries.
  • Streaming databases: ask for the queries upfront in the form of materialized views and incrementally update the results of these queries as input data arrives. So, streaming databases move the computation work to the write side.

Origins

Streaming databases first came about in the capital markets vertical, where the value of fast computation over continuous data is very high, e.g. StreamBase and Kx Systems. This early generation was more event-processing framework than database, optimised for the unique requirements of hedge funds and trading desks rather than for universality and accessibility.

Although the early ones implemented SQL-like control languages (e.g. in StreamBase, streams were created with DDL statements like CREATE INPUT STREAM), users still had to be streaming-systems experts.

The SQL below doesn't care whether the data is static or actively updating. It has the information a streaming database needs to continually provide updated result sets as soon as the data changes.

-- Sum revenue by product category
SELECT categories.name AS category, SUM(line_items.amount) AS total_revenue
FROM purchases
JOIN line_items ON purchases.id = line_items.purchase_id
JOIN products ON products.id = line_items.product_id
JOIN categories ON products.category_id = categories.id
WHERE purchases.is_complete
GROUP BY 1;

ksqlDB and Flink allow users to define transformations in SQL, but users still need to understand, and work around, challenging streaming concepts like eventual consistency.

The recent focus in streaming databases is on expanding access to streaming computation by simplifying the control interface so that it can be operated by those familiar with traditional databases, thus making streaming databases easier to apply.

Example Architectures

Streaming databases are often used “downstream” of primary transactional databases and message brokers, similar to how a Redis cache or a data warehouse might be used.

  • A message broker reliably and continuously feeds streams of data into the database
  • A Change Data Capture (CDC) service translates primary DB updates into structured data messages sent into the message broker
  • the SQL transformations are managed in dbt, as is done with data warehouses
  • user-facing applications and tools connect directly to the streaming database, with no need for caching and with more flexibility as compared to data warehouses

Are useful...

  • to build realtime views with ANSI SQL to serve realtime analytics dashboards, APIs & apps
  • to build notifications/alerting e.g. in fraud and risk models, or in building automated services that use event driven SQL primitives
  • to build engaging experiences with customer data aggregations that should always be up-to-date e.g. personalisation, recommendations, dynamic pricing etc.
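For example, the revenue query from earlier could be registered once as a continuously maintained view (a generic SQL sketch; the exact DDL varies by streaming database product):

-- The streaming database keeps this result incrementally up to date as new purchases arrive
CREATE MATERIALIZED VIEW category_revenue AS
SELECT categories.name AS category, SUM(line_items.amount) AS total_revenue
FROM purchases
JOIN line_items ON purchases.id = line_items.purchase_id
JOIN products ON products.id = line_items.product_id
JOIN categories ON products.category_id = categories.id
WHERE purchases.is_complete
GROUP BY 1;

-- Reads are then cheap, key-value-style lookups against the maintained result
SELECT * FROM category_revenue;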

Not useful for solutions...

  • that need columnar optimisation
  • using window functions and non-deterministic SQL functions like RANK(), RANDOM() (while straightforward in traditional databases, running these functions may produce continuous chaotic noise in streaming databases)
  • with ad-hoc querying, as response times are compromised since the computation plan is not optimized for point-in-time results

Perform and scale well because...

    • incremental updates ensure that DB updates do not slow down as the dataset scales
    • pre-computing the query pattern as a persistent transformation ensures that reads are fast: no computation is required, just key-value lookups in memory, like a Redis cache
    • high frequency of concurrent reads from materialized views has minimal performance impact as complex queries with joins & aggregations are handled as persisted computation
    • aggregations are dramatically improved, since the resources required to handle a persistent transformation are proportional to the number of rows in the output (instead of the scale of the input)

    May not perform and scale well...

      • since SQL transformations are always running, joins over large datasets need a significant amount of memory to maintain intermediate state/s (Imagine how you would incrementally maintain a join between two datasets: You never know what new keys will appear on each side, so you have to keep the entirety of each dataset around in memory.)
      • when a single change in the inputs triggers changes in the output of many views (or when many layers of views depend on each other), more CPU is required for each update
      • frequently changing data triggers more work in the DB, and hence requires more CPU, than rarely changing data
      • high number of total unique keys slows down read queries in traditional databases. In streaming databases, high-cardinality increases initial “cold-start” time when a persistent SQL transformation is first created, and requires more memory on an ongoing basis

      AI Opportunities in 2023 - Lecture by Dr. Andrew Ng

      AI as a general purpose tech...

      • is useful for lots of different applications e.g. electricity is good for a lot of things
      • AI collection: Supervised learning and Generative AI (in focus today) + Unsupervised learning and Reinforcement learning
        • Supervised learning: Good for labelling things e.g. 
                          e-mail >> spam or not, ship-route >> fuel consumed, Ad & user-info >> will click
          • Workflow of Supervised learning apps: e.g. restaurant reviews classification

                                  Collect dataset >> label data >> train a model >> deploy >> run

      • The last decade was the decade of large-scale supervised learning. Small AI models could be built on not very powerful computers; they performed well for certain small amounts of data, but with even larger amounts of data the performance would flatten out. With large AI models, however, performance scales better and better with more data
      • This decade is adding to it the excitement of Generative AI
        • When we train a very large AI model on a lot of data, we get a LLM like ChatGPT
        • RLHF and other techniques tune AI output to be more helpful, honest and harmless
        • And at the heart of Generative AI is (Supervised learning) repeated prediction of next sub-part patterns given the data it has seen
      • The power of LLMs as a developer (not programmer) tool: 
        • With prompt based AI - the workflow is:

                          Specify prompt >> Deploy to cloud (e.g. build restaurant review system in few days)

      • Opportunities: massive value will be created with Supervised learning and Generative AI together, by identifying and executing concrete use cases
        • Supervised learning will double in size and Generative AI will much more than double
        • for new start-ups and for large enterprises / companies
      • Lensa was an indefensible use case as it did not add value; AirBnB and Uber are defensible because they create value
      • The work ahead is to find the many diverse, value adding and defensible use cases

      • Refer to the "Potential AI projects space curve"
        • Advertising, and web search are the only large money making domains, with millions of users
        • As we go to the right of the curve, some example projects of interest may be:
          • Food inspection: cheese spread evenly on a pizza
          • Wheat harvesting: how tall is the wheat crop, at what height should it be chopped off
          • Materials grading, cloth grading...
        • Clearly industries other than advertising and web-search have a very long tail of $5 mn projects but with a very high cost of customisation
          • So, AI community needs to continue building better tools to help aggregate such use cases and make it easy for end users to do the customisations at affordable costs
      Instead of needing to worry about pictures of pizza, the AI community will create tools that enable the IT department of the pizza factory to train an AI system on their own pizzas, and thus realise the value of the $5 mn project by leveraging low/no-code AI tools

      Referring to the AI Stack...

      • H/W semi-conductor layer at bottom is very capital intensive and very concentrated
      • Infrastructure layer above the semi-conductor layer is also highly capital intensive and very concentrated
      • Developer tools layer is hyper-competitive, and only a few will be winners
      • All the above said layers can be successful only if the application layer on top is even more successful e.g. Amorai - app for romantic relationships coaching
      • Recipe for building startups, "don't rush to solutions", has been inverted and now we can just do that while still keeping it cost effective
                    Ideas >> Validate >> Get CEO >> Prototype (early users) >> Pre-seed Growth (MVP) >> Seed, Growth & Scale
      • Concrete ideas can be validated or falsified efficiently
      • Even highly profitable projects but low on ethics will / should be killed
      • AGI is still decades away
      • Other areas of interest may be predicting next pandemic, climate change predictions... 

      Saturday, October 28, 2023

      Materialized Views

      Introduction

      Common, frequent queries against a database can become expensive. When the same query is run again and again, it makes sense to ‘virtualize’ the query. Materialized views address this need by enabling common queries to be represented by a database object that is continuously updated as data changes. 

      A View...

      • is a derived relation defined in terms of stored base relations (generally tables) 
      • defines a SQL transformation from a set of base tables to a derived table; this transformation is typically recomputed every time the view is referenced in a query
      • when created, does not compute any results nor does it change how data is stored or indexed
      • is a saved query on tables of a DB 
      • is referenced in queries as if it were a table

      Example:

      CREATE VIEW user_purchase_summary AS
      SELECT
        u.id as user_id,
        COUNT(*) as total_purchases,
        SUM(p.amount) as lifetime_value
      FROM users u
      JOIN purchases p ON p.user_id = u.id
      GROUP BY u.id;
      

      Every time a query referencing view/s is executed, it first computes the results of the view, and then computes the rest of the query using those results.

      A Materialized View...

      • takes a regular view and materializes it by computing its results upfront and storing them in a "virtual" table
      • is like a cache, i.e. a copy of the data that can be accessed quickly
      • is a regular view “materialized” by storing tuples of the view in the database
      • can have index structures and hence database access to materialized views can be much faster than recomputing the view

      Example:

      CREATE MATERIALIZED VIEW user_purchase_summary AS
      SELECT
        u.id as user_id,
        COUNT(*) as total_purchases,
        SUM(CASE WHEN p.status = 'cancelled' THEN 1 ELSE 0 END) as cancelled_purchases
      FROM users u
      JOIN purchases p ON p.user_id = u.id
      GROUP BY u.id;
      

      A regular view is a saved query, and, a materialized view is a saved query along with its results stored as a table.

      Implications of materializing a view

      1. When referenced in a query, a materialized view is not recomputed as the results are pre-stored and hence querying materialized views tends to be faster
      2. Because it’s stored as if it were a table, indexes can be built on the columns of a materialized view
      3. Once a view is materialized, it is only accurate until the underlying base relations are modified. The process of updating a materialized view in response to changes in the underlying base relations is called view maintenance.

      A “view” is an anchored perspective on changing inputs, results are constantly changing as the underlying data changes. Materialization just implies that the transformation is done proactively. So, "materialized views" should update automatically.

      However, in practice, some databases need materialized views to be manually refreshed and others have implemented automatic updates, albeit with limitations. 
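      In PostgreSQL, for example, the materialized view defined above can be refreshed manually:

      -- Recompute the stored results from the current state of the base tables
      REFRESH MATERIALIZED VIEW user_purchase_summary;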

      Note: MySQL does not support materialized views as of now. Oracle, Snowflake, MongoDB, Redshift, PostgreSQL and others do.

      Materialized views are used...

      • when SQL query is known ahead of time and needs to be repeatedly recalculated
      • primarily for caching the results of extremely heavy and complex queries that cannot be run frequently as regular views
      • as the ability to define (using SQL) any complex transformation of data in the DB and let the DB maintain the results in a "virtual" table
      • when low end-to-end latency is required between when data originates and when it is reflected in a query
      • when low-latency query response times with high concurrency or high volume of queries is expected

      Use of materialized views in...

      Applications: Incrementally updated materialized views can be used to replace the caching and denormalization traditionally done to “guard” OLTP databases from read-side latency and overload. Instead of waiting for a query and doing computation to get the answer, we are now asking for the query upfront and doing the computation to update the results as the writes (creates, updates and deletes) come in. This inverts the constraints of traditional database architectures, allowing developers to build data-intensive applications without complex cache invalidation or denormalization.

      Analytics: ELT bulk loads raw data into a warehouse and then transforms it via some complex SQLs. The transformation may use regular views (i.e. no caching - used when it is not overly slow), or cached tables built from the results of a SELECT query (used when regular views slow down the queries due to re-computations), or incrementally updated table/s (but user is responsible for writing the update strategy).

      OR, use the fourth option i.e.

      Use "materialized views", always remain more up-to-date, more automated and less error-prone to cached tables (the end user burden of deciding when and how to update is minimized). 

      Monday, July 10, 2017

      Ethereum mining on AWS


      • Ethereum mining works only on g2.2xlarge or g2.8xlarge instances with Ubuntu 14.04 or later
      • Port 30303 must be opened for both TCP and UDP connections from `anywhere` (in security group settings)
      • Default Ubuntu available with ec2 is minimal, i.e. some drm files required for the OS to see the GPU drivers are missing. SSH into your machine and run the following steps to fix this:
      > sudo apt-get install linux-generic  
      (Click OK for default option/s when prompted)
      > sudo reboot
      • Download CUDA drivers for ec2 instances (use Nvidia units). Working with .deb package (instead of .run) is easier (local or network makes no difference) 
      > wget http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/rpmdeb/cuda-repo-ubuntu1404-7-0-local_7.0-28_amd64.deb

      (Newer versions are available here)

      > sudo dpkg -i <cuda repo package>
      > sudo apt-get update
      > sudo apt-get install cuda
      • Run the following command to check driver is installed: 
      > lshw -c video
      • A line that starts with "Configuration:" should mention "...driver=nvidia...", if it doesn't search carefully or try reboot. 
      • If you see "...driver=nouveau..." instead of "...driver=nvidia..." then something is wrong - google how to get rid of it and reinstall cuda.
      • Build geth from source, refer here 
      • run geth to allow it to catch up on the chain: 
      > ~/go-ethereum/build/bin/geth
      • install ethminer from cpp-ethereum dev PPAs, refer here 
      • Use following command to check the current hash rate (~6 MH/s) and benchmark ethminer to check that your system is in order: 
      > ethminer -G -M 
      • When geth catches up on the blockchain, use the following command to generate a new account: 
      > ~/go-ethereum/build/bin/geth account new
      • start geth again with RPC enabled by using command-line below 
      > ~/go-ethereum/build/bin/geth --rpc
      • Execute following command to start ethminer
      > ethminer -G
      • If using a larger g2 instance with 4 GPUs, ethminer needs to be started 4 times, each time adding a "--opencl-device <0..3>" argument
      • Check logs carefully, ethminer should be getting work packages from geth and be "mining a block"

      Sunday, December 27, 2015

      How to...

      Make YouTube videos run faster

      Open Google Chrome
      Ctrl + Shift + J opens Developer Tools; ensure you are on the Console tab
      On the prompt, copy and paste the script below:

      document.getElementsByTagName("video")[0].playbackRate = 2.5;

      Instead of playbackRate = 2.5, you can set any other floating-point number between 1.00 and 4.00

      Create an RSS feed for "The Thlog"

      https://thethlog.blogspot.com/feeds/posts/default?alt=rss

      Replace "thethlog" with name of any blog on blogspot, use it for another blog hosted on blogspt. Now, add it to a feed URL to an RSS reader (e.g. Feedly) OR be more creative with GPT.


      Saturday, March 21, 2015

      Sybase 12.5 to Sybase 15.5 migration - differences that I learnt about

      1. Login triggers were introduced in ASE 15.0. A regular ASE stored procedure can be automatically executed in the background on successful login by any user.

      2. Fast bcp is allowed for indexed tables in ASE 15.0.2 and above. bcp works in one of two modes:

      • Slow bcp - logs all the row inserts made; is slower and is used for tables that have one or more indexes
      • Fast bcp - only page allocations are logged; used for tables without indexes when the fastest possible speed is required; from 15.0.2 it can also be used for tables with non-clustered indexes
      3. sp_displaylogin displays when and why a login was locked, and also when you last logged in.

      4. Semantic partitions/smart partitioning: ASE 15 makes large databases easy to manage and more efficient by allowing you to divide tables into smaller partitions which can be individually managed. You can run maintenance tasks on selected partitions to avoid slowing overall performance, and queries run faster because ASE 15's smart query optimizer bypasses partitions that don't contain relevant data 

      5. With large data sets, filing through a mountain of results data can be difficult. ASE 15's bi-directional scrollable cursors make it convenient to work with large result sets because your application can easily move backward and forward through a result set, one row at a time. This especially helps with Web applications that need to process large result sets but present the user with subsets of those results 

      6. Computed columns: Often applications repeat the same calculation over and over for the same report or query. ASE 15 supports both virtual and materialized columns based on server calculations. Columns can be the computed result of other data in the table, saving that result for future repeated queries 

      7. Functional indexes: When applications need to search tables based on the result of a function, performance can suffer. Functional indexes allow the server to build indexes on a table based on the result of a function. When repeated searches use that function, the results do not need to be computed from scratch 

      8. Plan viewer in the form of a GUI: Plans for solving complicated queries can become very complex and make troubleshooting performance issues difficult. To make debugging queries simpler, ASE 15 provides a graphical query plan viewer that lets you visualize the query solution selected by ASE's optimizer.

      9. In ASE 15.0, Update statistics and sp_recompile are not necessary after index rebuild

      10. ASE 15 allows you to assign two billion logical devices to a single server, with each device up to 4 TB in size. It supports over 32,767 databases, and the maximum size limit for an individual database is 32 terabytes, extending the maximum storage per ASE server to over 1 million terabytes!

      11. As of release 12.5.1, all changes to data cache are dynamic

      12. ASE 15.0 and later versions no longer use vdevno. i.e. the disk init syntax doesn't need to mention the vdevno parameter. 

      13. The disk init syntax in 12.5 expects the size parameter in K, M, and G only. From 15.0 onwards, T (terabyte) can also be specified.
      Also, pre-15.0 the maximum size of a device was 32 GB.

      14. The configuration parameter 'default database size' was static in ASE 12. In ASE 12.5, it was made dynamic.
      For ASE 15.0, the table below is specified by Sybase.
      Logical page size                2K       4K       8K       16K
      Initial default database size    3 MB     4 MB     8 MB     16 MB
      All system tables, initially     1.2 MB   2.4 MB   4.7 MB   9.4 MB

      15. Auto database extension was introduced in 12.5.1 and is supported in later versions.

      16. The Dump/Load Database and Dump/Load Tran syntax differs from version 12.5.0.3 and 12.5.2 (and hence later versions). (See sybooks for more information; compression levels 1-9 were introduced.)

      17. ASE 12.5.0.3 and earlier versions used to allow only one tempdb in the server. But all the later versions allow creation of multiple temporary databases. 

      18. Before 15.0, after changing a database option we needed to use that database and run a checkpoint on it. ASE 15.0 doesn't need this.

      19. Restricting proxy authorization is available in 12.5.2 and later releases only. 

      20. From version 12.5.2 and onwards, cache creation is made dynamic (sp_cacheconfig [cachename [, "cache_size[P|K|M|G]"]]). It was static earlier.

      21. Till 12.5.2, backing up a database with a password was not possible. ASE 12.5.2 and later allow dump database with a password.

      22. Cross platform dumps and loads were introduced in ASE 12.5.3 

      23. MDA tables (Monitoring and Diagnostic Tables) are available in 12.5.3 and later releases. 

      24. Row-level locking: in ASE 15.0 all the system tables were converted to the datarows locking format.

      25. Group By without Order By

      Saturday, February 28, 2015

      The Land Acquisition debate

      A debate with my dear friend Neelkamal, on facebook, is about the #LandAcquisition bill (Act of 2015) of India.

      The questions and thoughts Neel has about the #LandAcquisition bill:


      Here is what I have to say about it - part 1 and part 2.

      Land acquisition bill should be about "right to fair compensation against (someone's) property being acquired". Fair is what seems good to all parties involved. 4X is not fair any more to the farmers, having figured that the land would later sell for perhaps 40x or even more (fetched by the land acquired for "affordable housing" and other such projects). The fairness ends there. Economics is not taught, it just prevails.

      A misconception amongst many is that all farmers have several hectares of land and hence they become rich by selling land. Fact is that for 1 such huge land bank holder there are hundreds with little land, tilling which they earn bread. 

      They are uneducated and can't afford, for their children, the education promised by our constitution to every child (BTW! Do you consider that a freebie as well?). Having sold their land, they will end up as unskilled labour, the class of society that seldom makes ends meet. It is a class of society that ends up feeling deprived, often falls into a negative spiral, and may even turn to crime.

      Those who had enough land to become affluent have experienced that though they had money, they never knew what to do with it and ended up wasting the fortune. Their children ended up as spoilt youth. Once bitten, twice shy.

      Why was an ordinance required as the first step to this law? Was it because the Govt believed that they could force it through the lower house, where the opposition would not be able to do a thing about it? Kudos to those who walked out; a walkout is a very polite means of showing disagreement and is massively effective. It is a clear expression that those who walk out WILL NOT BE A PARTY TO IT while the law could still be formed. Applaud them for the befitting response, do not criticize them.

      The respected FM responded to this question: "India's first Prime Minister (Shri J. L. Nehru) brought 70 ordinances."
      But the question is: what was the need for this ordinance? Do not tell us who brought how many ordinances, when, and in what manner.

      A hugely illiterate India has been a nation of farmers (70% till a few years ago, still not less than 60%), independent earners. Land acquisition not supported with education and professional skills development is turning poor farmers towards begging and crime. 

      BTW, I would be glad if we (as a nation) viewed farming as entrepreneurship, or at least as a cottage industry, and supported it with various funds and means. Further, why does farming not fit the concept of "Make in India"?

      Let us bring colour to the "fairness": the current bill suggests that when the Govt is acquiring land for defence purposes, the consent of the farmer is not necessary. The bill presented by the previous Govt was more reasonable; it suggested that only when a state of emergency has (already) been declared can any land be acquired for defence without the consent of the affected farmer(s).
      To add insult to injury, the compensation is said to be paid (under the now proposed bill) not when the recipient signs the receipt but when the Govt signs the release of the compensation.

      A commonality amongst revolts around the world is the peasants; they are not the opposition, nor do they sit among those in power - but they are the real power. Give them their due.


      Neel's reply:

      What I have to say on this:

       Neel, please appreciate that land for a farmer is not an asset that they have maintained for generations to make money. Unlike many non-farming land-hoarders (look at the farmhouses around spewing affluence), land for a farmer is the only means to earn a living and manage a respectful life. 

      A farmer's fields are not a symbol of his affluence; rather, they are one of the means, or the only means, by which he can become economically capable. In today's context, the measure of patriotism should not be the sacrifice of economic, physical, or any other kind of capability.

      How, then, are you, I, the respected PM, the richest of the rich, or anyone else any more or less patriotic than any farmer?

      To measure patriotism today, look at what kind of burden, and how much of it, someone places on which of the country's resources, and how much, and in how many ways, they enrich the country. You will find that the farmer is as patriotic as everyone else.

      I think the question to be asked is not "Why are you (the farmers) not convinced to give away your land?"; the right question is "How can we convince our brothers to give their land (or even their skin) for the purpose we say is national development?". A few points to take care of, in my opinion, are:

      1. Bring agriculture on "Make in India" agenda. Treat farmers as entrepreneurs. (As I said earlier, give to them "the due".)

      2. Farming is a means of living that has helped generations of farmers sustain themselves. When you ask them to part with their land, give them not only the "now" fair value of it but also (a) skills that provide them a somewhat similarly sustaining means of living and (b) the value the land will have until they acquire such a means of living - it may really be 4x or 400x.

      3. If one is not convinced, do not force them. Alter your plans instead (a little at least) where possible and let them know that they are being dealt with compassionately.

      4. Publish and explain the plans for which one's land is being acquired. Make sure that the land is used as it was asked for, and make any deviation, and its reasons, transparent if it occurs.

      Some will be convinced soon, some later, a very few perhaps never - but we need to work on a long-term solution that brings people together rather than dividing them into classes (them vs. us, they are farmers, we are middle class, etc.).

      Invariably, everyone needs to take the above into perspective, lest the said ordinance turn into "The Land Grab Bill".

      Human greed? The coats of the rich and powerful will be auctioned to fetch 50x the original price (if 10 L is to be believed), while the land of the poor can be grabbed instead at a dictated price!