Komprise Intelligent AI Ingest is a new workflow and ingestion engine that speeds the curation of the right unstructured data across disparate storage silos for AI. Part of Smart Data Workflows, Intelligent AI Ingest boosts AI ROI by eliminating the noise and high cost of using unstructured data in RAG and LLM pipelines.
Key Komprise Intelligent AI Ingest features include:
• Precise curation for RAG: The Komprise Global File Index delivers a surgical approach with rich filters to find just the data you need, unlike traditional ETL and data ingestion approaches that provide connectors to blindly copy data from a source.
• 2x ingestion performance improvement: This 2x data transfer speed, benchmarked against a data transfer tool from a major cloud provider, is possible due to a purpose-built transfer engine that minimizes file overhead for AI.
• High performance parallel architecture: The Komprise elastic grid architecture is parallelized in layers across multiple network interfaces, share engines, and thread pools. This allows the solution to index and enrich metadata rapidly across billions of files and move large volumes of files to different AI tools and services as needed.
• Built-in Data Security Classification & Remediation: Komprise provides standard and custom sensitive data classification so you can reduce the risk of PII and custom sensitive data leakage and avoid compliance violations.
• Automated data governance auditing: Komprise automatically maintains an audit trail of each ingestion workflow for data governance and auditing, documenting the who, what, when and data lineage for compliance reporting.
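As a rough illustration of how filter-based curation, sensitive data exclusion and an audit trail fit together, here is a minimal Python sketch. It is not the Komprise API: `FileRecord`, `curate`, the tag-and-date filter and the single regular expression are all hypothetical stand-ins, and a real sensitive data classifier would be far more thorough.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileRecord:
    """Hypothetical index entry for one file (not a Komprise data structure)."""
    path: str
    modified: datetime
    tags: set = field(default_factory=set)
    text: str = ""


# Deliberately simple pattern for US Social Security numbers (illustration only).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def curate(records, required_tag, modified_after, user):
    """Select files that match the filter and contain no detected sensitive data.

    Returns the selected records plus an audit trail of every filter match.
    """
    selected, audit = [], []
    for rec in records:
        if required_tag not in rec.tags or rec.modified < modified_after:
            continue  # outside the curation filter, so never considered
        sensitive = bool(SSN_PATTERN.search(rec.text))
        if not sensitive:
            selected.append(rec)
        audit.append({
            "who": user,
            "what": rec.path,
            "when": datetime.now(timezone.utc).isoformat(),
            "action": "excluded_sensitive" if sensitive else "ingested",
        })
    return selected, audit


records = [
    FileRecord("/hr/policy.txt", datetime(2024, 5, 1), {"hr"}, "Vacation policy"),
    FileRecord("/hr/payroll.txt", datetime(2024, 6, 1), {"hr"}, "SSN 123-45-6789"),
    FileRecord("/hr/old.txt", datetime(2015, 1, 1), {"hr"}, "Outdated handbook"),
    FileRecord("/eng/design.txt", datetime(2024, 6, 1), {"engineering"}, "Design notes"),
]
selected, audit = curate(records, "hr", datetime(2023, 1, 1), user="alice")
```

In this toy data set only the current HR policy file is selected. The payroll file matches the filter but is excluded and logged because it contains a number that looks like a Social Security number.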
Why is Komprise introducing AI data ingestion now?
Recent surveys indicate that ROI from AI projects is low. One reason is that too much irrelevant, outdated and poor-quality data is being fed to AI. Ingesting the right data into AI is difficult with unstructured data because this data is scattered across many locations and has been piling up without proper curation for quality. As enterprises start using AI widely, they need a systematic approach to ingesting the right unstructured data for AI to improve accuracy and reduce costs. Komprise addresses these issues by curating and ingesting only the data that is relevant to each AI use case.
How will customers benefit from Komprise Intelligent AI Ingest?
First, customers can be more efficient and cost-effective with unstructured data preparation for AI. The Komprise Global File Index provides a structured way to search globally across all your unstructured data and find precisely the right data for your AI use case. Komprise then ingests this data continuously to the AI of your choice, such as the Azure CoPilot Knowledge Base or Amazon Bedrock Knowledge Bases.
Compared with the default of blindly copying large volumes of unstructured data to AI, Komprise Intelligent AI Ingest vastly reduces processing costs and time.
Meanwhile, contextual curation improves the accuracy of AI results. Our precise filtering, including built-in sensitive data detection and a PII scanner, also reduces the risk of sensitive data being ingested into AI. Finally, we do this twice as fast as industry-standard transfer tools, speeding time to value for AI pipelines.
What are the use cases or scenarios for “delivering the right data to AI”?
Any generative AI use case, any use of RAG or LLMs, and any AI inferencing use case applies, because they all need to be fed data. For instance, building a chatbot that answers questions based on your data, offering Office 365 CoPilot to your employees, and writing AI agents for support all require proper AI data ingestion and can benefit from Komprise.
_______________________
How exactly does having a precise curation strategy for AI save money?
AI consumes compute and storage resources for every operation; the more data you give it, the more data it needs to process for each answer. If 70% of this data is irrelevant, roughly 70% of your processing power is being spent on irrelevant data, which you could save through precise curation. More importantly, your results could be less accurate when you give AI too much data, because of the risk of incorrect or outdated information. You would incur this processing waste regardless of whether you run the AI in the cloud or in the data center. Data curation eliminates this waste of resources on irrelevant data.
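A rough way to quantify this waste is sketched below in Python. It assumes, purely for illustration, that processing cost scales linearly with the volume of data ingested; the function name and the figures are hypothetical and are not Komprise measurements.

```python
def curation_savings(total_gb: float, irrelevant_fraction: float) -> dict:
    """Estimate processing waste, assuming cost scales linearly with data volume."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if not 0 <= irrelevant_fraction < 1:
        raise ValueError("irrelevant_fraction must be in [0, 1)")
    relevant_gb = total_gb * (1 - irrelevant_fraction)
    wasted_gb = total_gb - relevant_gb
    return {
        "relevant_gb": relevant_gb,
        "wasted_gb": wasted_gb,
        # Share of total processing spent on irrelevant data.
        "wasted_share": wasted_gb / total_gb,
        # How many times more data is processed than is actually needed.
        "overhead_factor": total_gb / relevant_gb,
    }


# Hypothetical example: 1,000 GB ingested, 70% of it irrelevant.
result = curation_savings(total_gb=1000, irrelevant_fraction=0.7)
```

Under this linear assumption, about 70% of the processing goes to irrelevant data, which means the uncurated pipeline processes a little over three times as much data as the curated one would.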
How is Komprise different from ETL or copy and sync approaches to ingestion?
Unlike ETL tools, which are focused on structured data and do not provide a way to curate across unstructured data silos, Komprise addresses the scattered nature, volume and poor data quality of unstructured data. Copy-and-sync approaches simply bring everything from a source to a destination. These tools don’t eliminate the data noise, so they may propagate poor data quality and lead to bad AI outcomes.
Furthermore, tests show Komprise runs twice as fast as copy-and-sync solutions because of its optimized AI-ingestion engine that cuts out file data overhead.
When would a large enterprise need multiple ingestion solutions or methods for AI?
Organizations that have both structured and unstructured data will typically use different ingestion methods for the two. Current ingestion approaches for unstructured data are limited to either processing data or connecting to a given source and ingesting all of its data. They lack the global capability to search across data estates, curate the right data and ingest it systematically.
Are IT leaders properly focused on the AI ingestion problem?
There are two distinct stages of AI: model development and inferencing. Much of the effort currently is going towards model development. But some organizations are starting to deploy AI inferencing and are finding that AI is not generating the ROI they would expect. This is because without proper data preparation, especially for unstructured data, you get bad results with poor quality data. As more organizations are starting to recognize this problem, they see the need for unstructured data classification, preparation and curated ingestion.
What new skills do IT pros need to deliver secure, accurate AI pipelines with unstructured data?
Traditionally, storage IT has been focused on the infrastructure and not on the data. With AI, reuse of data is becoming paramount and this causes storage IT to expand their role into delivering data services. Essentially, they are now evolving to provide ways to classify data, identify sensitive data, tag data based on file contents and help departments and security teams use the right data for AI. While Komprise automates most of these tasks, storage teams are expanding their role to interact more with security, data loss prevention, AI and data science teams to deliver these data services.
Do you have any other best practices to share for improving the data ingestion process?
Creating specific AI agents for specific use cases will become important so that you can tailor the data to the needs of that use case. Rather than having a generic AI chatbot, consider creating a support chat agent, an employee HR chat agent, etc. Now you can segregate and ensure the right data reaches each of these AI workflows. It becomes a lot easier to design, measure and monitor each AI workflow when you are granular.
What are partners saying about Komprise Intelligent AI Ingest?
In an interview with CRN, Cesar Enciso, CEO, Chairman, and founder of Evotek, a San Diego, California-based solution provider, notes: “To us, there’s a lot of risk in unstructured data. And so we were an early adopter of Komprise…Komprise Intelligent AI Ingest helps businesses start to lean into the AI revolution. Our customers are asking us, ‘How do we scale AI responsibly?’ And what we like about Komprise right now is they solve a key challenge by ensuring only the right unstructured data makes it to the AI pipeline, which helps with accuracy and reducing risk.”
“There are a lot of companies that are ingesting a bunch of noise from an AI standpoint. It’s just not productive and it’s expensive. The other part about AI is, a lot of people are bringing their own AI tools. How do we secure properly? How do we make sure that this doesn’t become a security risk? And that’s why we use a tool like Komprise.”
What does Gartner say about GenAI and Modern Data Storage Management Services (DSMS)?
The Komprise Intelligent AI Ingest press release includes a quote from the Gartner® Market Guide for Data Storage Management Services (subscription required):
“As organizations accelerate their journey toward becoming data-driven, DSMS solutions are evolving into intelligent platforms that do far more than manage storage, Modern Data Storage Management Services (DSMS) solutions are foundational to business analytics and generative AI (GenAI) initiatives, helping enterprises unlock the full value of their data by making it more discoverable, contextualized and actionable.”

