
Data Bondage: Debunking the Myth of Customisation

Written by Ultumus | 11 February 2021

I was involved in “Big Data” before it actually became “a thing”, and I still remember the times I had to kneel, prostrate, in front of our CFO with a requisition form for a brand new, very expensive EMC disk array (to the layman, a very big, very heavy storage device for data, delivered by lorry).

For those not old enough to remember, the frightening part is that this was not that long ago. But the advances made in the past five years in data storage, processing and analysis have fundamentally changed the relationship between technology and data. More importantly, they have allowed us to build an index and ETF data business that is not tethered to this legacy mindset, offering a solution that delivers every data point a client requires today, and may want tomorrow, in a single API, as opposed to an endless series of customised files that tie businesses, willingly or not, to their current data aggregator forever: true “data bondage”.

ULTUMUS as a whole is hosted in Amazon Web Services, so this analysis focuses on their tools and costs. AWS S3 storage is priced in tiers: the first 50TB, the next 450TB, and then anything over 500TB. The fact that the first tier alone covers 50TB (which, even if you used all of it, would still cost in the region of one thousand dollars per month) brings tears to my eyes considering the amount I spent on the EMC disk storage of old. I do acknowledge that AWS costs increase as soon as you start manipulating and moving the data, but the fact that we can discuss 50TB as a base measure explains why at ULTUMUS we never throw away any of our source data files.
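
To make the order of magnitude concrete, here is a minimal sketch of how tiered S3 Standard storage pricing works out for a full first tier. The per-GB rates are assumed, illustrative values rather than current AWS list prices, but they land in the same ballpark as the figure quoted above.

```python
# Illustrative sketch only: approximate monthly S3 Standard storage cost
# using the tiered pricing described above. Per-GB rates are assumed
# example values, not current AWS list prices.

def monthly_s3_storage_cost(total_gb: float) -> float:
    """Apply tiered pricing: first 50 TB, next 450 TB, then the remainder."""
    tiers = [
        (50 * 1024, 0.023),    # first 50 TB per month, $/GB (assumed rate)
        (450 * 1024, 0.022),   # next 450 TB per month (assumed rate)
        (float("inf"), 0.021), # over 500 TB per month (assumed rate)
    ]
    cost, remaining = 0.0, total_gb
    for tier_size, rate in tiers:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost

# A full 50 TB in the first tier comes out at roughly $1,100-$1,200 per month.
print(f"${monthly_s3_storage_cost(50 * 1024):,.2f}")
```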

In addition to effectively limitless, low-cost storage, we also have analysis capabilities layered on top. Depending on the use case, we overlay the same data set with Athena, Kinesis, Elasticsearch and QuickSight, and because we are not tied to a single business intelligence tool, we adapt as the requirements change.
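
As a hedged illustration of what “querying the same data set in place” can look like, the sketch below runs an Athena query over retained source files via boto3. The database, table, bucket and column names are placeholders for the purpose of the example, not ULTUMUS’s actual schema.

```python
# Hypothetical example: querying retained source files in place with
# Amazon Athena. Database, table, column and bucket names are placeholders.

import boto3

athena = boto3.client("athena", region_name="eu-west-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT etf_ticker, nav, shares_outstanding
        FROM etf_source_files
        WHERE file_date = DATE '2021-02-11'
    """,
    QueryExecutionContext={"Database": "etf_data"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])  # poll this ID for the query results
```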

The fact that storage and analysis have come on in such leaps and bounds is highly significant. The challenge we see is that the approach and attitude of our prospective client base has not evolved at the same pace as the benefits this step change offers. As an example, when I initially worked on ETFs, the only data points we were interested in were NAV, shares outstanding, total cash and the various basket structures. Over the last two years, the data points available have expanded as the ETFs themselves have evolved: we can now provide currency forwards, capital gains tax, apportionment ratio, metal entitlement ratio, switchable indicator, settlement period and the different baskets associated with each period.
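
One way to picture this expansion is a record structure where newer data points arrive as optional fields rather than schema breaks. This is a minimal sketch with assumed field names; real feeds carry many more attributes.

```python
# A minimal sketch of an extensible ETF record. Field names are assumed
# for illustration; newer data points are optional rather than breaking changes.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EtfRecord:
    ticker: str
    nav: float
    shares_outstanding: int
    total_cash: float
    # Data points that arrived as ETFs evolved
    currency_forwards: Optional[float] = None
    capital_gains_tax: Optional[float] = None
    apportionment_ratio: Optional[float] = None
    metal_entitlement_ratio: Optional[float] = None
    switchable_indicator: Optional[bool] = None
    settlement_period: Optional[str] = None
    baskets: dict = field(default_factory=dict)  # keyed by basket type/period
```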

For a breakdown of what some of these key terms mean, check out this blog post.

The simple structure that was initially envisaged is no longer fit for purpose, especially given the contraction of the trading spreads associated with these products. Nor are ETFs going to stop evolving: we recently worked with one ETF provider whose ETF has three separate underlyings, one replicated synthetically and the other two replicated physically, and the data set representing it has to expand to accommodate this.

This is my issue: data points will, by their very nature, multiply, the requirements for data will expand, and the consumption of data will need to grow to accommodate this. We often see organisations that have handcuffed themselves to a defined set of data points, assuming that constraining the data makes analysis easier. Unfortunately, I have to break the news to them that data doesn’t work like that. Just look at the growth of Facebook and Google: companies that didn’t exist twenty years ago are now at the top of their game globally because of their unconstrained use and consumption of data.

Organisations that are bound to legacy systems requiring data to be delivered as CSV over FTP are condemned to keep making the same mistakes; they will fall behind, or worse, lose out completely to competitors who have embraced this change. It needs to be recognised that data will continue to evolve in both breadth and depth. Organisations need to accept the new data paradigm and design systems that allow data to be stored in an appropriate structure. Data points must be available for any applicable use case, in a manner that can flex as new data points become available and as they become de facto standards across the industry. It is essential that data points can be stored and delivered to downstream systems in a defined manner, one that allows individual components to be upgraded without altering the whole chain in a “big bang” approach. This will allow systems to evolve in response to the requirements of the industry.
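
As one possible sketch of that principle, a consumer can tolerate new data points by retaining unknown fields rather than rejecting them, so upstream additions do not force a rebuild of the whole chain. The endpoint URL and field names below are illustrative placeholders, not a description of any specific API.

```python
# A hedged sketch of a flexible consumer: unknown fields are kept rather
# than dropped, so new upstream data points flow through without a rebuild.
# The URL and field names are illustrative placeholders.

import requests

KNOWN_FIELDS = {"ticker", "nav", "shares_outstanding", "total_cash"}

def fetch_etf(ticker: str) -> dict:
    resp = requests.get(f"https://api.example.com/etf/{ticker}", timeout=10)
    resp.raise_for_status()
    payload = resp.json()

    record = {k: payload.get(k) for k in KNOWN_FIELDS}
    # Keep anything not yet modelled, ready for downstream systems to adopt later
    record["extensions"] = {k: v for k, v in payload.items() if k not in KNOWN_FIELDS}
    return record
```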

This is easier said than done, but organisations need to take the initial steps and break apart the data inputs from the consuming systems; while these remain tied together in multiple “customised feeds”, the data and the systems will slowly asphyxiate each other.
