What actually is DATA in ML?
What data is all about in machine learning, and the different steps involved in working with it.
So basically, what is DATA all about?
Anything that is recorded is data. Observations and facts are data. Anecdotes and opinions are also data, of a different kind. Data can be numbers, like the record of daily weather or daily sales.
In ML, the precision of an analysis generally increases as the data becomes more numeric. Let’s take an example: weather data about temperature, pressure, and humidity can be used to build rigorous mathematical models that predict future weather. So all we need here are numerical values that help in predicting the weather.
(This flowchart explains the whole data processing chain we can follow for ML predictions.)
- A database is a collection of data that can be accessed in many ways.
- It’s an active collection of data (that is, it can be updated or changed).
- The data model abstracts the key entities involved in an action and their relationships.
- Most databases today follow the relational data model and its variants.
- Each data modeling technique imposes rigorous rules and constraints to ensure the integrity and consistency of data over time.
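As a rough illustration of how a relational data model uses rules and constraints to protect integrity, the sketch below builds two related tables with Python's built-in `sqlite3` module. The table names, columns, and the helper `try_insert` are assumptions made up for this example, not part of any particular system.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Two entities and the relationship between them (hypothetical schema).
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        balance REAL NOT NULL CHECK (balance >= 0)
    )
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Asha')")
conn.execute("INSERT INTO account (customer_id, balance) VALUES (1, 100.0)")

def try_insert(customer_id, balance):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO account (customer_id, balance) VALUES (?, ?)",
            (customer_id, balance),
        )
        return True
    except sqlite3.IntegrityError:
        return False

# A negative balance violates the CHECK constraint.
rejected_negative = not try_insert(1, -50.0)
# An account for a customer that does not exist violates the foreign key.
rejected_orphan = not try_insert(99, 10.0)
print(rejected_negative, rejected_orphan)
```

The database itself refuses the bad rows, so the data stays consistent no matter which program tries to write to it.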
- A data warehouse is a collection of data from across the organization, from the past to the present.
- The data in a warehouse isn’t updated as frequently as the data in a database. For example, a bank’s transaction data from day one to the present day is a pretty large collection of data.
- It’s way larger than a database.
- And way more detailed than a database.
- Data mining generates patterns out of the given data.
- The types of attributes we choose depend on the outcome we want.
- Pattern generation is a smarter way of making more accurate predictions from data.
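One very simple form of pattern generation is counting which items tend to show up together. The sketch below does that for a few shopping baskets. The baskets are invented for illustration, and pair counting is just one assumed technique chosen for the example; the attribute we look at (items bought together) follows from the outcome we want (knowing what sells together).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for this example.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    # Sorting gives every pair a stable (alphabetical) order.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is the strongest pattern in this small data set.
most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```

With a pattern like this in hand, a prediction such as "a customer who buys bread will probably also buy butter" rests on the recorded data rather than on a guess.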
- Data visualization is a graphical representation of the data we have.
- To make the data easily readable and easier to understand, we use visualization.
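As a small sketch of the idea, the code below turns a table of numbers into a text bar chart using only plain Python. The sales figures and the function name `bar_chart` are made up for this example; in practice a plotting library would normally be used instead.

```python
# Hypothetical daily sales figures, invented for this example.
daily_sales = {"Mon": 12, "Tue": 18, "Wed": 9, "Thu": 15, "Fri": 21}

def bar_chart(values, width=20):
    """Render a dict of label -> number as a text bar chart."""
    peak = max(values.values())
    lines = []
    for label, value in values.items():
        # Scale each bar so the largest value fills the full width.
        bar = "#" * round(value / peak * width)
        lines.append(f"{label} | {bar} {value}")
    return "\n".join(lines)

chart = bar_chart(daily_sales)
print(chart)
```

Even in this crude form, the longest and shortest bars stand out at a glance, which is much harder to see in the raw numbers.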