DataStax acquires Kaskada, putting the focus on AI feature engineering

Posted On: January 13, 2023

DataStax today acquired Kaskada, a privately held AI vendor whose feature engineering platform helps businesses use data for AI applications.

Successful machine learning (ML) and artificial intelligence (AI) start with good data, which is often kept in a database for querying. Effective ML and AI also rely on event streaming sources, which deliver real-time data from a variety of places.

DataStax, a provider of database and real-time streaming services, has been expanding its data platform since 2010 and is a major contributor to the open-source Apache Cassandra database. In 2021, DataStax bought Kesque, a vendor built around the open-source Apache Pulsar project, and subsequently introduced its own streaming data service. The company has benefited from rising interest in databases and event streaming, and it recently announced a $115 million funding round.

Demand for AI and ML, enabled by a real-time data platform, will play a significant role in the company’s future expansion.

“Machine learning is transformative to businesses, and it has to be something that you leverage daily in your business processes and in your applications,” said Chet Kapoor, CEO of DataStax. “We think that we can make it possible for all types of customers to overlay AI pipelines to make it part of their business apps and business processes.”

Artificial intelligence (AI) is about more than just unstructured data

The majority of the excitement around contemporary AI is centred on applications that make use of unstructured data.

Generative AI tools for text and images do mostly work with unstructured data, but that is not the case for other AI workloads.

While speaking with VentureBeat, DataStax’s chief product officer Ed Anuff emphasised the importance of structured data and AI for a variety of use cases, including package delivery, logistics, ride sharing, and video streaming. Organizations are using tabular, structured data formats in those fields to keep track of event-based data as interactions occur or as locations change.

According to Anuff, most of the apps we use regularly in which ML actually makes our interactions more effective are structured data use cases.

The Apache Cassandra database works best with structured data, and companies like Uber and Netflix rely on it to keep their businesses running. Turning structured data stored in Cassandra into inputs that are useful for training AI models requires an approach known as feature engineering.
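
As a rough illustration of what feature engineering over structured event data can look like, the Python sketch below turns a handful of ride events into per-user features as of a point in time. It is only a sketch under stated assumptions: the event rows, the field names (`user_id`, `fare`, `ts`) and the `build_features` helper are hypothetical, and it uses neither Kaskada's description language nor any Cassandra API.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event rows, shaped like records an application might keep in a table.
EVENTS = [
    {"user_id": "u1", "fare": 12.5, "ts": datetime(2023, 1, 10, 8, 0)},
    {"user_id": "u1", "fare": 7.5, "ts": datetime(2023, 1, 12, 18, 30)},
    {"user_id": "u2", "fare": 30.0, "ts": datetime(2023, 1, 2, 9, 15)},
    {"user_id": "u1", "fare": 20.0, "ts": datetime(2022, 12, 1, 9, 0)},
]


def build_features(events, as_of, window=timedelta(days=7)):
    """Turn raw event rows into one feature dict per user, as of a point in time."""
    grouped = defaultdict(list)
    for row in events:
        # Ignore events after the cutoff so features never leak future data.
        if row["ts"] <= as_of:
            grouped[row["user_id"]].append(row)

    features = {}
    for user_id, rows in grouped.items():
        # Events that fall inside the trailing window ending at as_of.
        recent = [r for r in rows if as_of - r["ts"] <= window]
        last_ts = max(r["ts"] for r in rows)
        features[user_id] = {
            "rides_in_window": len(recent),
            "avg_fare_in_window": (
                sum(r["fare"] for r in recent) / len(recent) if recent else 0.0
            ),
            "hours_since_last_ride": (as_of - last_ts).total_seconds() / 3600,
        }
    return features


FEATURES = build_features(EVENTS, as_of=datetime(2023, 1, 13, 0, 0))
print(FEATURES["u1"])
```

Each resulting dictionary is the kind of fixed-shape input a model can train on, in contrast to the raw stream of events it was derived from.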

Here’s what Kaskada gives DataStax and the Apache Cassandra database

DataStax believes that the feature engineering technology created by Kaskada will be a perfect complement to its own real-time data platform.

According to Anuff, Kaskada has developed a concise description language that makes it easy for a data engineer to define the features of a dataset needed to feed an AI model. He added that the Kaskada technology supports the high throughput required for real-time applications.

DataStax’s goal is to function as part of an ML pipeline, providing the data foundation and feature engineering needed to fuel AI inference engines. Anuff stressed the two-way nature of the data flow, with the results of AI inference being put back into Cassandra and delivered to application users.

Kapoor’s ultimate objective is to help organizations use operational data to improve business results by deploying a real-time data stack.

According to Kapoor, “our clients have an abnormally large quantity of real-time data,” which can be used to “create fantastic experiences for their consumers.”

Catherine A. Leal
