Below is the output of an analysis of 150+ large, representative transactions in the D&A space (Data, Data Management and Analytics). The study considered transactions above $10M in private companies. A few select M&A transactions are mentioned for information purposes only; the study otherwise excluded M&A and listed companies. This is not the whole universe of relevant transactions, but it is deemed a good representation of the landscape, one that helps visualize some of the shifts and trends that took place across 2017–2020:
A brief history of time in Data and Analytics, through 150+ large, representative VC transactions, 2017–2020
Note: there may be differing views on where some of the companies fit. Some companies also span several categories: Dremio, for instance, spans several, and BigID could too. But the categorization was deemed broadly acceptable by most VCs.
Some trends that one may visualize: convergence of data platforms, simplification of complexity and the increasing importance of collaboration.
Several trends, highlighted by the VCs interviewed, can be seen changing the landscape by adding improvements or simplification:
- Convergence of data platforms. Some of the top VCs wonder whether the data platforms space is undergoing convergence. Unified cloud data platforms keep rising, with the likes of Snowflake and Databricks providing data warehousing or data lake operations, and some now wonder whether data warehouses and data lakes are coming together. While unstructured data can now be given some structure (through labelling, categorizing, etc.), structured data can also be treated much like unstructured data. The same may be happening with AI and collaboration platforms (companies including DataRobot, Dataiku, Domino, H2O, etc.), which remain top of mind but also seem to be trying to complement data provisioning capabilities, so some wonder whether this space may converge with data platforms too. Finally, the same may be happening with real-time and continuous intelligence platforms (companies such as Confluent, C3 IoT, Samsara, InfluxDB, etc.), which connect real-time and IoT data with current data platforms. Overall, the focus may increasingly be on the types of use cases rather than the types of data processed.
- Connectivity, data integration and workflow management: increasing importance and funding. Transactions and feedback from VCs seem to show strongly increased interest in the management of data flows. Much seems yet to come: customers face many challenges in connecting, integrating and pushing data ‘in’ and ‘out’ (the ‘out’ part especially seems to be more difficult). This makes complete sense given the variety of data connectivity options; things are far from the sometimes-advertised simple API connection for exchanging data. Companies here include Fivetran, Adverity, Postman, Workato, Prefect, SnapLogic, Tray.io, Astronomer, etc. (Note: while this trend gives business users with few technical skills access to platforms, sometimes low-code, for self-serve operations, at the other end of the spectrum lies the acceleration of data operations, which targets technical users and data engineers to help them carry out data operations faster. Both trends go together, serving different users.)
- Special-purpose databases: increasing funding, on the rise? This category remains heterogeneous and sometimes hard to sell. Relational databases are complemented by graph, distributed or other types of databases for select usages. VCs are overall excited about this area, as the potential for either complementarity with existing solutions or, better, disruption of existing players is high. This includes companies like Rockset, Scylla, Couchbase, Cockroach Labs, Yugabyte, Neo4j, etc. Open source innovation is also gaining traction, with technologies including those from dbt, MariaDB, Yugabyte, etc.
- Automation/Process Mining and RPA: have the winners been chosen? Process mining and RPA are large, important areas underlying digitization for many companies. However, funding amounts seem to have decreased in this area. Companies here include UiPath, Automation Anywhere, Celonis, etc. Has this area fallen from grace or, on the contrary, have the winners been chosen? Could it be that a wave of IPOs, M&A or innovation is in the making?
- Data Privacy: best years still ahead of us? To many VCs, privacy management is an exciting, strategic and promising space that can change the data landscape. Likely due to the technicality of the field, only a few players are making the market, with different offerings, including BigID, OneTrust and Privitar. They are sometimes enabled by governance and data lineage. Overall, VCs agreed this trend is likely to be felt for decades, as regulations (GDPR, HIPAA, etc.) and data structuring requirements will likely continue to become more complex and demanding, creating customer needs for finding and controlling private data.
- Business analytics and augmentation: increasing numbers and high M&A activity. A very interesting market and field, which would top all categories if M&A were accounted for. Indeed, in 2020 Looker was acquired by Google for $2.6Bn, and in 2018 Datorama was acquired by Salesforce for $0.8Bn. Some VCs think the wave has now reached its mature stage and that it is now easier to switch between providers. Others think, on the other hand, that augmentation of analytics may lead to a renewed wave, either through various input modes or through automated analysis of data insights, etc. Companies include Looker, ThoughtSpot, Adverity, Sisu, Sisense, etc.
Here are some other trends that also emerged and were mentioned by VCs:
- Acceleration of data operations: This is an interesting trend that is growing significantly. As data grows exponentially, a trend is emerging to help data operations specialists operate faster. This is a space hyperscalers are attempting to own, but will they be able to do it themselves or through acquisitions? Few companies in this space seem able to make it past $40M ARR; this may be because they tackle specific use cases or markets owned by hyperscalers, or because they get acquired. This category includes companies like Matillion, Fishtown Analytics/dbt, Starburst, Dremio, etc.
- Data governance and lineage: VCs were excited by this area, which is also concentrated due to the technicality of the subject. Cataloguing, governance and data lineage revolve around understanding and mapping the flows of data and metadata, and are often enablers for fields such as AI, systems productivity, etc. One indeed needs to understand the flows of data to imagine technology improvements, apply AI algorithms, or figure out how to scan or relate data. Companies include Alation, Collibra, Manta, Immuta, Okera, etc.
- Data Prep/reliability: Data preparation and reliability were deemed an important topic. However, the capabilities were said to be often integrated into larger platforms, or the companies get acquired quite fast. This may explain why the category has so far remained in the bottom tier of transaction sizes, with companies like Tamr, Trifacta or Scale.ai, among others, reaching interesting round sizes.
- Enablement of Conversational analytics: VCs mentioned that this category was deemed promising and may still be. After all, there could be a next generation of data management and analytics solutions fully enabled by bots and voice analysis. But overall, the space has taken time to grow, and some VCs wonder whether the market will be as large as once thought.
- Monetization of data exchanges: There have been many discussions about this topic, and there seems to be consensus among VCs and corporates that this will someday be a very large market with massive ROI. The question is when, as exchanging semi-anonymized data to monetize datasets is neither easy nor widely done. Some large players have an offering (e.g. Snowflake); the questions are whether smaller specialized players will emerge, and whether corporates will actually want to monetize their anonymized data or will instead see their data as their own or their customers' property. But the potential value is high, as anonymized data sharing may enable algorithms to find clues and deliver new insights.
That's all, folks. Overall, data matters and is strategically important. From extraction to delivery, apparent simplicity hides underlying complexity, and innovation is high in trying to simplify that complexity at all stages, from inputs to data management, AI insights and analytics, and finally outputs. Analysis of 150+ representative VC transactions over 2017–2020 revealed several trends, also highlighted by VCs. Data platforms may be converging (data warehouses and data lakes, AI data platforms and continuous intelligence are connecting their data assets), with data collaboration a priority. Process mining and RPA are attracting less capital, but this may be because the winners have been chosen or a new wave of funding is in the making. Special-purpose databases have seen increased interest for special data use cases. Connectivity, data integration and workflow management are seeing increased interest and funding to facilitate data management. Privacy intelligence may have its best years ahead and may expand into adjacent areas, enabled by governance, cataloguing and data lineage. Finally, more specialized trends continue to shape the landscape, including acceleration of data operations for data engineers (and, at the same time, democratization of data access for business users) and data preparation and reliability. On the earlier side, VCs noted anonymization and synthetic data management, as well as monetization of data exchanges, as interesting upcoming trends.
The sum of it all: data matters; the market is moving fast, with significant funding and an increasing focus on data platform convergence, simplification of complexity and data collaboration! Under the apparent simplicity lies complexity, and where there is complexity, there is room for improvements, ROI, VC funding and returns! 😊 Thanks to all for the contributions and for reading.