Data Management Glossary
Unstructured Data AI
Unstructured data is the fuel for artificial intelligence (AI), and demand is growing to use AI and machine learning techniques to analyze, process, and derive insights from it. Unstructured data is data that doesn’t have a predefined schema or organized format, such as:
- Text: Emails, social media posts, chat logs, documents.
- Images: Photographs, scanned documents, and graphics.
- Audio: Voice recordings, podcasts, and call recordings.
- Video: Surveillance footage, movies, or user-generated content.
- Sensor Data: Logs from IoT devices without a clear structure.
Most of this unstructured data is stored as files and objects in the enterprise.
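To make the categories above concrete, here is a minimal Python sketch that groups a file listing into those categories by file extension. The extension map, the function names `categorize` and `summarize`, and the sample paths are all illustrative assumptions, not part of any product or standard; a real inventory would need a much richer mapping and likely content inspection as well.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-category map (an assumption for illustration);
# extend it to match the file types in your own environment.
CATEGORY_BY_EXTENSION = {
    ".txt": "text", ".eml": "text", ".docx": "text", ".pdf": "text",
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".tiff": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mov": "video",
    ".log": "sensor", ".json": "sensor",
}


def categorize(path: str) -> str:
    """Return the category for a file path, or 'other' if the extension is unknown."""
    suffix = PurePosixPath(path).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "other")


def summarize(paths):
    """Count files per category across a listing of paths."""
    return Counter(categorize(p) for p in paths)


if __name__ == "__main__":
    listing = [
        "mail/q3-report.eml", "scans/invoice-001.TIFF",
        "calls/support-17.wav", "cctv/lobby.mp4",
        "iot/pump-7.log", "misc/archive.bin",
    ]
    for category, count in sorted(summarize(listing).items()):
        print(f"{category}: {count}")
```

Extension matching is only a first approximation: a `.pdf` may hold scanned images rather than text, which is one reason content-aware classification is often layered on top.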
Read: Unstructured Data Growth and AI are Changing Executive Decision Making.
AI Applications in Unstructured Data
Common AI applications for unstructured data include:
Natural Language Processing (NLP):
- Sentiment analysis on social media or reviews.
- Chatbot development for automated customer support.
- Summarizing or translating text content.
Computer Vision:
- Image recognition for tagging photos or medical imaging diagnostics.
- Video analysis for facial recognition or surveillance.
Speech Recognition:
- Transcribing spoken words into text.
- Enhancing virtual assistants like Alexa or Siri.
Predictive Analytics:
- Identifying patterns in unstructured logs or communication data.
- Forecasting trends based on textual or visual insights.
Recommendation Systems:
- Using text reviews and user-generated content to suggest products or services.
Knowledge Extraction:
- Extracting actionable information from documents, reports, or multimedia data.
AI Technologies for Unstructured Data
- Deep Learning: Particularly neural networks like CNNs for images and RNNs/transformers for text.
- Transformer Models (e.g., BERT, GPT): Used for advanced text generation, classification, or summarization tasks.
- OCR (Optical Character Recognition): Converts images of text into machine-readable formats.
- Audio Processing Models (e.g., WaveNet): Analyze audio signals for transcription or sentiment analysis.
Challenges
- Data Cleaning and Preprocessing: Handling noise, inconsistencies, and errors in raw data.
- Scalability: Managing large datasets, e.g., video archives or massive text corpora.
- Interpretability: Making AI outputs understandable and actionable.
- Integration: Combining structured and unstructured data for holistic insights.
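As an illustration of the data cleaning and preprocessing challenge listed above, the following Python sketch normalizes raw text and removes duplicates using only the standard library. The function names `clean_text` and `deduplicate` and the specific cleaning steps are assumptions chosen for this example; real pipelines typically need format-specific parsers and more careful handling.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")      # naive markup matcher, fine for a sketch
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip tags, decode entities, normalize Unicode, and collapse whitespace."""
    # Remove tags before decoding entities so escaped text like &lt;b&gt; survives.
    no_tags = TAG_RE.sub(" ", raw)
    unescaped = html.unescape(no_tags)
    # NFKC folds compatibility characters such as non-breaking spaces.
    normalized = unicodedata.normalize("NFKC", unescaped)
    # Drop control characters (category "Cc") that are not whitespace.
    printable = "".join(
        ch for ch in normalized
        if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    return WHITESPACE_RE.sub(" ", printable).strip()


def deduplicate(docs):
    """Return cleaned documents, dropping empties and case-insensitive duplicates."""
    seen = set()
    unique = []
    for doc in docs:
        cleaned = clean_text(doc)
        key = cleaned.casefold()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


if __name__ == "__main__":
    samples = ["<p>Hello&nbsp;&amp; welcome</p>\n\n", "hello & WELCOME", ""]
    print(deduplicate(samples))
```

The regular-expression tag stripper is deliberately simple; for real HTML, a proper parser is the safer choice.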
AI for unstructured data is becoming increasingly critical, as 80-90% of data generated today is unstructured, according to IDC. Tools like OpenAI’s models, Google Cloud AI, and AWS AI services are instrumental in enabling businesses to leverage unstructured data effectively.
Unstructured Data Management and AI
At the end of 2024, Komprise CEO and cofounder Kumar Goswami made the following predictions for AI and data:
- IT leaders will get creative to deploy AI on a budget (see the survey)
- Unstructured data governance processes for AI will mature
- Systematic data ingestion for AI will be the first data storage mandate
- Hybrid cloud persists, mandating deep intelligence on data and costs
- Role of storage administrator evolves to embrace security and AI data governance
He noted:
AI mania is overwhelming, but so far, enterprise participation has been largely led by employees who are using GenAI tools to assist with daily tasks such as writing, research and basic analysis. AI model training has been primarily the responsibility of specialists, and storage IT has not been involved with AI. But this will change swiftly in the coming year. Business and public sector leaders know that if they get left behind in the AI Gold Rush, they may lose market share, customers and relevance. Corporate data will be used with AI for retrieval augmented generation (RAG) and inferencing, which will constitute 90% of AI investment over time. Everyone touching data and infrastructure will need to step up to the plate as a broader set of employees start sending company data to AI. Storage IT will need to create systematic ways for users to search across corporate data stores, curate the right data, check for sensitive data and move data to AI with audit reporting. Storage managers will need to get clear on the requirements to support their business, departmental and IT counterparts.
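The workflow described in the quote (search, curate, check for sensitive data, and hand off to AI with audit reporting) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the `AuditLog` class, the `curate_for_ai` function, the keyword search, and the two regular-expression patterns are all hypothetical, and the "move" step is represented only by returning the approved paths.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative patterns only (assumptions); real sensitive-data detection
# needs far broader coverage and validation.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class AuditLog:
    """Hypothetical in-memory audit trail of curation decisions."""
    entries: list = field(default_factory=list)

    def record(self, path, action, findings):
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "path": path,
            "action": action,
            "findings": sorted(findings),
        })


def find_sensitive(text):
    """Return the names of all sensitive patterns found in the text."""
    return {name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)}


def curate_for_ai(documents, keyword, audit):
    """Search documents (path -> text) for a keyword, block any that contain
    sensitive data, and return the paths approved for AI use."""
    approved = []
    for path, text in documents.items():
        if keyword.lower() not in text.lower():
            continue  # search step: skip documents that do not match
        findings = find_sensitive(text)
        audit.record(path, "blocked" if findings else "approved", findings)
        if not findings:
            approved.append(path)
    return approved


if __name__ == "__main__":
    docs = {
        "a.txt": "Contract renewal terms for 2025",
        "b.txt": "Contract owner: jane@example.com",
        "c.txt": "Lunch menu",
    }
    log = AuditLog()
    print(curate_for_ai(docs, "contract", log))
    for entry in log.entries:
        print(entry["path"], entry["action"], entry["findings"])
```

Recording blocked files as well as approved ones keeps the audit trail complete, so a reviewer can see why a given file never reached the AI service.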