In the emerging economy there is a new infrastructure, based on the internet, that is causing us to scrutinize most of our assumptions about the business. As a skin of networks - growing in ubiquity, robustness, bandwidth, and function - covers the skin of the planet, new models of how wealth is created are emerging.

Wednesday, July 5, 2023

Unstructured Web Data Mining

A webpage is composed of various elements such as text, images, audio, video, metadata, hyperlinks, and structured records or tables. Web data can accordingly be classified into four types: unstructured data, structured data, semi-structured data, and multimedia data. Unstructured data has no predefined data model governing its internal structure, and it can be generated in either textual or non-textual formats by humans or machines.

Key characteristics of how unstructured documents are handled include:
~Features are represented as a bag of words or phrases.
~Features can be boolean (present or absent) or frequency-based (how often a term occurs).
~Feature selection techniques can be applied to reduce the number of features.
~Word stemming combines morphological variations of a word into a single feature.
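
The characteristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and the document-frequency threshold used for feature selection are all hypothetical choices, not something this post prescribes.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}


def crude_stem(word):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split into alphabetic words, drop stopwords, stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]


def frequency_features(text):
    """Bag of words with counts (frequency-based features)."""
    return Counter(tokenize(text))


def boolean_features(text):
    """Bag of words as presence/absence (boolean features)."""
    return set(tokenize(text))


def select_features(docs, min_docs=2):
    """Feature selection: keep terms found in at least min_docs documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(boolean_features(doc))
    return {term for term, n in doc_freq.items() if n >= min_docs}
```

With this sketch, "linked", "links" and "linking" all collapse into the single feature "link", which is the effect stemming is meant to have.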

Web mining is a technique utilized to discover and analyze new patterns or previously unknown knowledge from data related to the web. Unstructured data mining involves examining relatively unstructured data and aiming to obtain more refined datasets from it. This often entails extracting data from sources not traditionally used for data mining purposes.

Processing and managing large volumes of unstructured data is crucial to deriving meaningful insights. Data mining is a process that employs diverse tools and techniques to convert both structured and unstructured data into valuable insights. Web mining poses particular challenges because web resources are heterogeneous and lack structure.

Mining unstructured web text entails deriving meaningful numerical indices from the text and then processing these indices with various data mining algorithms.
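
One common way to turn text into such numerical indices is TF-IDF weighting. The post does not name a specific scheme, so treat the sketch below as one possible choice; the `tf_idf` function and its simple tokenizer are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n_docs = len(tokenized)

    # Number of documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that appears in every document gets a weight of zero, while a term concentrated in a few documents gets a higher weight. The resulting vectors can then be passed to clustering or classification algorithms.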

In general, data mining involves sifting through datasets and extracting the most valuable information in a specific format. Unstructured data is commonly processed by moving it into a data lake through Extract, Load, and Transform (ELT) processes, in which the raw data is loaded first and transformed afterwards.

For instance, in the context of a letter, data mining would involve breaking down the letter and extracting specific identifiers and details, such as the names of the related parties and the date the letter was sent. This mined data is subsequently transformed into a format that businesses or other parties can use for quick reference or for developing business intelligence applications.
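
The letter example can be sketched with regular expressions. The sketch assumes a hypothetical letter format in which dates look like "12 March 2021" and parties are introduced by a title such as "Mr." or "Ms."; real letters vary far more, and a production system would more likely rely on a named-entity recognizer.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Assumed date format: day, English month name, four-digit year.
DATE_PATTERN = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")
# Assumed name format: a title followed by one or more capitalized words.
NAME_PATTERN = re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def mine_letter(text):
    """Extract party names and dates into a structured record."""
    dates = [
        date(int(year), MONTHS.index(month) + 1, int(day)).isoformat()
        for day, month, year in DATE_PATTERN.findall(text)
    ]
    parties = sorted(set(NAME_PATTERN.findall(text)))
    return {"parties": parties, "dates": dates}


letter = (
    "12 March 2021\n"
    "Dear Ms. Jane Smith,\n"
    "Further to the letter from Mr. Alan Jones dated 3 February 2021, "
    "we confirm receipt of the documents.\n"
)
record = mine_letter(letter)
```

The resulting `record` is a plain dictionary, which is the kind of structured output that can be stored for quick reference or fed into a business intelligence tool.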