Saturday, April 18, 2020

Big data and related technologies...

Big Data usually has three characteristics. It is a large amount of data that requires ever more storage space (volume), that grows at an exponential rate (velocity), and that is generated in many different formats (variety).


Fog computing is an architecture that uses end-user clients or “edge” devices to do a substantial amount of the pre-processing and storage an organization requires. It was designed to keep data close to its source, so that pre-processing happens before the data travels any further.
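
To make the idea of edge pre-processing more concrete, here is a minimal Python sketch of a device that summarizes its own readings instead of sending every raw value onward. The function name `summarize_readings`, the threshold, and the sample temperatures are all made up for illustration; real fog deployments will look different.

```python
from statistics import mean

def summarize_readings(readings, threshold):
    """Reduce raw sensor readings to a small summary on the edge device."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        # Only readings above the threshold are kept in full.
        "alerts": [r for r in readings if r > threshold],
    }

# Hypothetical temperature samples collected locally by one edge device.
raw = [21.0, 21.5, 22.0, 35.5, 21.5]
summary = summarize_readings(raw, threshold=30.0)
print(summary)
```

Only the small `summary` dictionary would need to leave the device, which is the point of keeping the pre-processing close to the source.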

The cloud is a collection of data centers or groups of connected servers giving anywhere, anytime access to software, storage, and services using a browser interface. Cloud services provide increased data storage as required and reduce the need for onsite IT equipment, maintenance, and management. They also reduce the cost of equipment, energy, physical plant requirements, and personnel training needs.

Distributed data processing takes large volumes of data from a source and breaks them into smaller pieces. These smaller data volumes are distributed to many locations to be processed by many computers with smaller processors. Each computer in the distributed architecture analyzes its part of the Big Data picture.

Businesses gain value by collecting and analyzing massive amounts of new product-usage data to understand the impact of their products and services, adjust their methods and goals, and provide their customers with better products faster.
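
The split-and-process idea behind distributed data processing can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: worker threads on one machine stand in for the many separate computers, the "analysis" is a plain sum, and the helper names (`split_into_chunks`, `process_chunk`, `distributed_sum`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, chunk_size):
    """Break one large list into smaller pieces of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def process_chunk(chunk):
    """Each worker analyzes only its own piece; here it just sums it."""
    return sum(chunk)

def distributed_sum(data, chunk_size, workers=4):
    chunks = split_into_chunks(data, chunk_size)
    # Threads stand in for separate computers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Combine the partial answers into the final result.
    return sum(partial_results)

data = list(range(1, 101))
total = distributed_sum(data, chunk_size=10)
print(total)  # 5050
```

Each worker sees only ten numbers, yet combining the partial sums gives the same answer as processing the whole list in one place.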

Collected data can be categorized as structured or unstructured. Structured data is created by applications that use “fixed” format input such as spreadsheets or medical forms. Unstructured data is generated in a “freeform” style such as audio, video, web pages, and tweets. Both forms of data need to be manipulated into a common format to be analyzed. CSV, JSON, and XML are plaintext file types that use a standard way of representing data records. Converting data into a common format is a valuable way to combine data from different sources.
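
As a rough illustration of converting data into a common format, the Python sketch below reads the same kind of record from CSV, JSON, and XML text and turns each into a plain dictionary. The field names (`name`, `units`), the XML tag names, and the sample values are assumptions made for the example, not a standard layout.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records, one per source format.
csv_text = "name,units\nwidget,3\n"
json_text = '[{"name": "gadget", "units": 5}]'
xml_text = "<records><record><name>gizmo</name><units>7</units></record></records>"

def from_csv(text):
    return [{"name": row["name"], "units": int(row["units"])}
            for row in csv.DictReader(io.StringIO(text))]

def from_json(text):
    return [{"name": item["name"], "units": int(item["units"])}
            for item in json.loads(text)]

def from_xml(text):
    root = ET.fromstring(text)
    return [{"name": rec.findtext("name"), "units": int(rec.findtext("units"))}
            for rec in root.findall("record")]

# Every source now has the same shape, so the records can be combined.
records = from_csv(csv_text) + from_json(json_text) + from_xml(xml_text)
print(json.dumps(records))
```

Once every record is a dictionary with the same keys and value types, data from the three sources can be analyzed together.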

Data mining is the process of turning raw data into meaningful information by discovering patterns and relationships in large data sets. Data visualization is the process of taking the analyzed data and using charts such as line, column, bar, pie, or scatter to present meaningful information. A strategy helps a business determine the type of analysis required and the best tool to do the analysis. A strategy also helps to determine the most effective way to present the results for management.
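
A very small Python sketch can show both steps together: finding a simple pattern in raw records and then presenting it as a chart. The purchase log is invented for the example, counting how often each product appears is a deliberately simplified stand-in for real data mining, and the text-based bar chart stands in for a proper charting tool.

```python
from collections import Counter

# Hypothetical raw purchase log: (customer, product) pairs.
purchases = [
    ("ann", "tea"), ("bob", "coffee"), ("ann", "coffee"),
    ("cy", "tea"), ("bob", "coffee"), ("cy", "coffee"),
]

def product_counts(rows):
    """Turn raw rows into information: how often each product was bought."""
    return Counter(product for _, product in rows)

def text_bar_chart(counts):
    """Render the counts as a simple bar chart, most frequent first."""
    lines = []
    for label, n in counts.most_common():
        lines.append(f"{label:<8}{'#' * n} {n}")
    return "\n".join(lines)

counts = product_counts(purchases)
chart = text_bar_chart(counts)
print(chart)
```

Even in this tiny case the chart makes the pattern (coffee sells twice as often as tea) easier to see than the raw list of purchases.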