Data Science: Topological Data Analysis – Slaying the big data dragon for analytics

Topological Data Analysis (TDA) is a recent field that emerged from various works in applied (algebraic) topology and computational geometry during the past 15 years. This article aims to simplify TDA for the business intelligence and data science community. Although it assumes some level of mathematical and analytical insight from the reader, it provides both an oversimplified technical explanation and a completely oversimplified non-technical explanation of what TDA is and what it can achieve in business terms. It aims to be useful at both an executive and a technical level, including but not limited to CIOs, CDOs, actuaries, data analysts, data scientists, business intelligence developers and any business role player interested in deriving value from big data.

TDA – The problem statement

Although geometric approaches to data analysis can be traced quite far back, TDA started with the pioneering works of Edelsbrunner et al. (2002) and Zomorodian and Carlsson (2005) in persistent homology and was popularized in a landmark paper by Carlsson (2009). Carlsson noted that an important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate (see Moore’s law applied to big data) and that our ability to analyze this data, both in terms of quantity and the nature of the data, is clearly not keeping pace with the data being produced.

Topology as a real solution to deriving knowledge from big data?

“On Guard!” Alternatives to traditional data analysis have since been sought, for two reasons: the need to derive and infer knowledge effectively from the real world (where the flexibility and growth rate of data is unprecedented), and the need for mechanisms that can continuously summarize how invariants or constructions on the data behave under a change of parameters, without looking only at the raw data itself. Carlsson selected topology to further the development of solutions to these problems, since topology is exactly that branch of mathematics which deals with qualitative geometric information. In short, topology studies the notion of shape, and it does so in a way which is much less sensitive to the actual choice of metrics than straightforward geometric methods (technical note: coordinates in TDA may not carry intrinsic meaning). This provides an ideal architectural approach in data modelling for deriving knowledge from the ‘shapes’ within large, hugely complex datasets, as opposed to only analysing the raw data itself. It allows us to consider the ‘gaps’ and relationships between the data underlying various types of information in order to derive knowledge. Still confusing? Let’s get practical!
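To make the "coordinates may not carry intrinsic meaning" point concrete, here is a minimal Python sketch. It is not taken from Carlsson's paper: the point cloud, the scale `eps` and the helper names `count_components` and `rotate` are all assumptions made up for illustration. It counts how many connected pieces a point cloud has when points closer than `eps` are linked, and then does the same after the cloud has been rotated and shifted.

```python
import math

def count_components(points, eps):
    """Count connected pieces when points within distance eps are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two groups
    return len({find(i) for i in range(len(points))})

def rotate(points, angle):
    """Rotate 2-D points about the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical data: two well-separated groups of three points each.
cloud = [(0.0, 0.0), (0.5, 0.1), (0.2, 0.6),
         (5.0, 5.0), (5.4, 5.2), (4.9, 5.6)]

# Same cloud in a different coordinate system: rotated, then shifted.
moved = [(x + 10.0, y - 3.0) for x, y in rotate(cloud, 1.0)]

print(count_components(cloud, eps=1.0))
print(count_components(moved, eps=1.0))
```

Every coordinate in `moved` differs from `cloud`, yet both calls report the same number of pieces. The raw numbers changed; the ‘shape’ did not.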

Completely oversimplified explanation – What is TDA?

An oversimplified way to explain this to the non-mathematical academic community is to say that TDA is like deriving knowledge from the structures underlying a piece of constantly changing origami. If the points on the origami describe the origami itself, then the relationships between those points tell us something about what those points are and why they are where they are. The ‘what’ and the ‘why’ can be seen as knowledge, which we can use to generate new, creative, and meaningful origami pieces or to learn why things are how they are. This ‘origami piece’ can present itself in numerous practical problems of knowledge, including DNA, chemistry, human interaction and many more. Associated technologies are therefore being developed to analyse and derive knowledge on the basis of TDA for specific industries. This enables data scientists to create intelligent, non-linear computational models to effectively analyse massive datasets in order to produce well-informed recommendations to business decision-makers.

Essentially, TDA has the ability to completely transform what your actuary used to do from a data-drives-value perspective. It allows data science to generate value from a key aspect that intrinsically differentiates data science from traditional data analysis: utilizing erratic and sporadic as well as consistent and pattern-based data input to produce learning that is usable for well-informed, robust decision-making. This means, for example, that we can also analyse data generated outside of an organisation to find relationships affecting the organisation itself. The ability to analyse seemingly random datasets is a valuable tool when seeking innovative opportunity.

Simplified technical explanation – What is TDA?

On a technical note, TDA is mainly motivated by the idea that topology and geometry provide a powerful approach to infer robust qualitative, and sometimes quantitative, information about the structure of data. It aims at providing well-founded mathematical, statistical and algorithmic methods to infer, analyze and exploit the complex topological and geometric structures underlying data that are often represented as point clouds in Euclidean or more general metric spaces. We classify these problems to be solved in mathematics as multi-agent stochastic optimization problems. In simple terms, these are ‘search problems’ where one needs to isolate randomness from pattern in order to perform proper analysis. ‘Big data’, as a generalized term, now lends itself well to the associated characteristics.

From a computer programming and architectural perspective, multi-agent cooperative decision making in TDA can be modeled as a cyclic (decentralized) optimization, where the joint decision vector is optimized by sequentially optimizing each individual agent’s decision vector while holding the others fixed. Simply stated, one object (vector) in the network can be optimised for decision-making without adjusting any of the other objects, and repeating this for each object in turn can still move the entire network model towards its best outcome. Moreover, because of uncertainty in knowledge of the target and knowledge of the state of the other agents, the problem is a stochastic optimization problem where only noisy measurements of the objective function are available to each agent.

The underlying design and associated programming for TDA can therefore potentially be utilised to simulate various types of neural systems for analysis, such as cellular interaction, neural / brain interaction and numerous other previously unexplained systems which exhibit patterns based on objects ‘constructed’ with non-fixed coordinates (e.g. specific solar systems, new molecular models, weightless mass computation models). TDA could provide ground-breaking analytical capability to many endeavors of scientific research across disciplines, yet be simplified enough to be practically applied to the scenarios of current business industries.
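The cyclic optimization idea can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions, not a TDA library routine. Each ‘agent’ controls a single number rather than a full decision vector, the shared objective is an assumed quadratic cost 0.5·xᵀAx − bᵀx with made-up values for `A` and `b`, and the measurements are noise-free (the stochastic case described above would add noise to what each agent observes). The function name `cyclic_optimize` is hypothetical.

```python
def cyclic_optimize(A, b, sweeps=50):
    """Minimise 0.5*x^T A x - b^T x one coordinate (agent) at a time.

    A is assumed symmetric positive definite. Each agent i picks the value
    of x[i] that is best given the current decisions of all other agents.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Influence of the other agents' current (fixed) decisions.
            others = sum(A[i][j] * x[j] for j in range(n) if j != i)
            # Exact minimiser of the quadratic cost in coordinate i alone.
            x[i] = (b[i] - others) / A[i][i]
    return x

# Hypothetical 3-agent problem; A couples neighbouring agents.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

x = cyclic_optimize(A, b)

# At the joint optimum the gradient A x - b is zero for every agent.
residual = [sum(A[i][j] * x[j] for j in range(3)) - b[i] for i in range(3)]
print(x)
print(residual)
```

No agent ever changes another agent's decision, yet after enough sweeps the printed residual is practically zero, which means the joint decision is optimal for the whole network. This works cleanly here because the assumed cost is a well-behaved quadratic; for general objectives, cyclic schemes may only reach a local optimum.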

Potential – What TDA is achieving on the ground level and may do in the future

The mathematical branch of topology studies only properties of geometric objects which do not depend on the chosen coordinates, but rather on intrinsic geometric properties of the objects. One can imagine any 3-D shape (e.g. a tetrahedron) behaving as a function which constantly changes its length parameters (the lengths of its sides). The points of the shape are therefore coordinate-free and hence TDA, utilising topology, provides a flexible method to ‘plug-and-play’ various scenarios for analysis. The relationships which are useful involve continuous maps between the different geometric objects and therefore become a manifestation of the notion of functoriality. (Functoriality means that something is a functor: when someone asks about the functoriality of some construction, they are asking whether or not it can be upgraded to a functor.) This is leading to the development of multiple variants of clustering algorithms which, functionally, can be used across a range of spaces, including neuroscience, mathematical analysis itself, information technology decision support, corporate big data analysis, medical sciences and human behavioral sciences, to mention only a few. The current ideal application of TDA stretches way beyond the fairly simple analytical requirements of businesses and should be extended into advanced machine learning technologies to further the development of artificial intelligence (AI).
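One simple member of this family of clustering algorithms is single-linkage clustering, which is closely related to the simplest (0-dimensional) case of persistent homology. The Python sketch below is an illustration under assumptions rather than a reference implementation: the four points are made up, and the helper name `merge_scales` is hypothetical. It grows a distance threshold from zero and records each scale at which two separate clusters merge into one. Clusters only ever merge as the scale grows, so every clustering maps consistently onto the clustering at any larger scale, which is the functorial flavour described above.

```python
import math
from itertools import combinations

def merge_scales(points):
    """Return the distances at which clusters merge as the scale grows."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's cluster.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    scales = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # the edge joins two different clusters
            parent[ri] = rj     # merge them
            scales.append(d)    # record the scale of the merge
    return scales

# Hypothetical data: three nearby points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (10.0, 0.0)]

scales = merge_scales(points)
print([round(s, 3) for s in scales])
```

For four points there are three merges. The first two happen at small scales and the last only at a much larger one; a long gap like that between consecutive merge scales is the kind of robust feature TDA looks for, here suggesting two clusters that persist over a wide range of scales.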

Sounds too complex? Well, despite the fact that you might soon be replacing your actuary with a data scientist at a much higher salary, TDA might just be the single most innovative approach yet to properly slaying your big data analytics dragon.

– Johan Smith (Executive: Business Solutions)
