Metadata knowledge graphs and data integration future

In the fast-changing world of data and artificial intelligence, unpacking the next data platform means understanding metadata knowledge graphs and how the different layers of the modern data stack come together.

Anyone who wants to do anything with data needs a stack of tools to get it done. That stack has not changed even with all of the innovation happening in the data industry, according to Gaurav Pathak (pictured, right), vice president of product management, AI and metadata, at Informatica Inc.

George Fraser of Fivetran and Gaurav Pathak of Informatica talk with theCUBE about metadata knowledge graphs.

“Players have changed. But that stack, moving the data from raw data to really processed insight, has remained quite similar with [metadata knowledge graphs],” Pathak said. “We are looking at triples, we are looking at relationships between individual metadata objects.”

Pathak and George Fraser (left), chief executive officer of Fivetran Inc., spoke with theCUBE Research’s Rob Strechay and George Gilbert at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the evolving data stack and the role of metadata knowledge graphs.

Metadata knowledge graphs enable more action

Informatica collects technical, business, operational and usage metadata about data assets, according to Pathak. That includes gathering the schemas and structures that describe what the data looks like in Snowflake and Databricks.

“We look at how is that pipeline created, what are the transformations, how [are] all of these things related to each other?” he said. “Having that triple, having that metadata knowledge graph, then allows you to now start doing, both human-wise and AI-wise, intelligence queries to the data ecosystem itself.”
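To make the idea of triples concrete, here is a minimal sketch in Python of metadata recorded as subject-predicate-object triples. The asset names, predicates and the `related` helper are invented for illustration; this is not Informatica's actual metadata model.

```python
# Minimal sketch: a metadata knowledge graph as subject-predicate-object
# triples. All names here are hypothetical, not Informatica's model.
triples = [
    # Technical metadata: what the data looks like and where it lives
    ("orders", "stored_in", "snowflake"),
    ("orders", "has_column", "customer_id"),
    ("clickstream", "stored_in", "databricks"),
    # Operational metadata: how a pipeline relates assets to each other
    ("orders_pipeline", "reads_from", "orders"),
    ("orders_pipeline", "writes_to", "orders_curated"),
    ("orders_pipeline", "applies", "pii_masking"),
    # Usage metadata: who consumes the result
    ("orders_curated", "used_by", "marketing"),
]

def related(subject):
    """Return every (predicate, object) linked to one metadata object."""
    return [(p, o) for s, p, o in triples if s == subject]

print(related("orders_pipeline"))
# [('reads_from', 'orders'), ('writes_to', 'orders_curated'),
#  ('applies', 'pii_masking')]
```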

Companies can then ask questions of their data ecosystem as a result, according to Pathak, such as how many Iceberg tables they have.

“How many of them are used by people in [the] marketing department? And how many of them are compliant with GDPR?” Pathak said. “Their data is not moving from one jurisdiction to another. These kind of questions are really, really hard to get early on. But with metadata knowledge graphs, with catalogs like these, these are now possible.”
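Extending the same hypothetical triple model, Pathak's example questions reduce to filters over the graph. A production catalog would answer them with a graph query language such as SPARQL or Cypher; the table names, predicates and the simplified GDPR flag below are assumptions for illustration only.

```python
# Hypothetical triples describing table format, usage and compliance.
triples = [
    ("sales_2024", "has_format", "iceberg"),
    ("sales_2024", "used_by", "marketing"),
    ("sales_2024", "compliant_with", "gdpr"),
    ("web_events", "has_format", "iceberg"),
    ("web_events", "used_by", "engineering"),
    ("crm_export", "has_format", "parquet"),
]

def objects_of(subject, predicate):
    """All objects linked to a subject by one predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# How many Iceberg tables do we have?
iceberg = {s for s, p, o in triples if p == "has_format" and o == "iceberg"}
# How many of them are used by marketing? How many comply with GDPR?
marketing = {t for t in iceberg if "marketing" in objects_of(t, "used_by")}
gdpr = {t for t in iceberg if "gdpr" in objects_of(t, "compliant_with")}

print(len(iceberg), marketing, gdpr)  # 2 {'sales_2024'} {'sales_2024'}
```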

Now that metadata has been centralized, things such as usage, consumption, financial management and governance are much easier to manage. There are also a lot of new workloads happening right now, according to Fraser.

“That’s one of the big phenomenon that we’re seeing, is customers are doing more new workloads with their data. From a Fivetran perspective, that means new data types,” he said. “It means there are things that previously didn’t belong in the central data estate, now belong there. Mostly freeform text stuff. Fivetran has had connectors to systems like Zendesk and Slack for many years that have freeform text, but there’s a whole new emphasis on those systems.”
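As a sketch of what landing freeform text in the central data estate might look like, the record shape below normalizes a support message into a loadable row. The field names are assumptions for illustration, not a real Fivetran connector schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TextRecord:
    """Hypothetical normalized row for freeform text from systems
    such as Zendesk or Slack; field names are illustrative only."""
    source: str        # e.g. "zendesk" or "slack"
    record_id: str
    author: str
    created_at: datetime
    body: str          # the freeform text itself

record = TextRecord(
    source="slack",
    record_id="msg-1024",
    author="support-bot",
    created_at=datetime.now(timezone.utc),
    body="Customer reports checkout errors on mobile.",
)
```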

AI demands diverse, fresh data sources

Beyond the shift toward freeform text, AI is also creating demand for more diverse sources of data, according to Fraser. A third evolutionary pressure has to do with latency.

“Some of these more operational type of workflows that people want to do with AI agents and things like that, they require fresher data,” Fraser said. “The first Fivetran pipeline 10 years ago ran once a day. And now, the milestone we’re trying to get to is where we can reliably do one-minute latency for all data sources.”
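As a rough illustration of what a one-minute latency target implies for pipeline design, the sketch below polls a source on a fixed interval and charges extraction time against the latency budget. The `fetch_changes_since` and `load_to_destination` functions are hypothetical stand-ins, not Fivetran's actual API.

```python
import time

SYNC_INTERVAL_SECONDS = 60  # target: roughly one-minute latency

def fetch_changes_since(cursor):
    """Hypothetical incremental extract: returns (rows, new_cursor).
    A real connector would read a change log or updated-at cursor
    rather than re-extracting everything."""
    return [], cursor  # placeholder

def load_to_destination(rows):
    """Hypothetical loader: append rows to the warehouse or lake."""

def run_pipeline():
    cursor = None
    while True:
        started = time.monotonic()
        rows, cursor = fetch_changes_since(cursor)
        if rows:
            load_to_destination(rows)
        # Sleep only for what remains of the interval, so a slow
        # extract does not quietly stretch the latency budget.
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, SYNC_INTERVAL_SECONDS - elapsed))
```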

Fivetran sees all of these as workloads that run on the data it delivers. That evolution means more sources and new entities within existing sources, according to Fraser.

“It maybe means more adoption of data lakes, as the compute engine people want to use to power some of these new workloads is maybe one that doesn’t even exist yet,” he said. “Those are the main, I think, evolutionary pressures that we are feeling from the data pipeline perspective.”

These technologies are extremely powerful, according to Pathak, and there are more changes on the horizon.

“What will change is that people have thought about code as something that needs to be maintained pristine, that has to be taken care of for a long time. There was a whole ecosystem around it,” he said. “But if you have gen AI systems that can take natural language statements in English and then make decisions on what are the right formats, what are the right models to store that data in, I think that will be a very different world that we will live in.”
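The pattern Pathak describes might look something like the speculative sketch below, in which a plain-English description of a dataset is handed to a language model that decides the storage format. `ask_llm` is a hypothetical placeholder for any LLM client call, and the prompt and format list are assumptions.

```python
def ask_llm(prompt):
    """Hypothetical placeholder for a real LLM client call."""
    raise NotImplementedError("swap in an actual model client here")

def choose_storage(description):
    """Let a gen AI model pick a storage format from plain English."""
    prompt = (
        "A user describes a dataset in plain English below. Reply with "
        "one word naming the best storage format (iceberg, parquet or "
        "jsonl):\n" + description
    )
    return ask_llm(prompt).strip().lower()

# choose_storage("Append-only clickstream events queried by date range")
# might return "iceberg" from a capable model.
```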

Stay tuned for the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the Supercloud 7: Get Ready for the Next Data Platform event.

Photo: SiliconANGLE
