For an ML model to work really well, the data that you provide to the model should be of high quality.
Now, in this one, we will try to understand what makes data high quality.
🔹 Is Garbage in Garbage out Still Important in ML?
That is, what is the distinguishing factor between poor quality data and high quality data? Let's understand that in this one. Now, there are many complex, practical real-world problems that you can use machine learning to solve.
For example, take the case that we just saw: you can use machine learning to classify a given email as spam or ham. That is one case. Another case could be predicting whether a given tissue mass is a malignant or a benign tumor.
Another case: you can use machine learning to estimate the crop yield of a given plantation. You can also use it to flag whether a given transaction is fraudulent or genuine. These are a few successful examples of real-world situations where machine learning is actually used.
Like I said, machine learning can be used in many complex real-life situations. However, it requires high quality data. But what is high quality? We will understand this through a real-life example, where you want to predict the price of a house given various types of data. But what are the types of data that you will need? Let's imagine you are working as a data scientist for a real estate brokerage firm.
Now, the firm is asking you to develop a machine learning model that can predict the price of a house given various characteristics of the house. As a data scientist, you want to go back to the firm and say, "I will need some data about a given house."
With such data, you will be able to build a machine learning model that can predict the price of a house. Given this situation, what sort of information or data would you want to collect in order to predict the price? Well, in this case, even though you are a data scientist, you will have to think like a real estate agent, because a real estate agent is the domain expert in this particular situation.
Now, what are the different factors a real estate agent will look for? They will look for, say, the area of the house, the location, any schools nearby, and the width of the road in front of the house. Like this, there are many other parameters that could be collected, and these in turn will determine how much a given house is worth.
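To make this concrete, here is a minimal sketch of how one row of such data might look in Python. The class name, field names, units, and the example values are all made up for illustration; they are not from any real dataset.

```python
from dataclasses import dataclass


@dataclass
class HouseRecord:
    """One house, described by factors a real estate agent might look at.

    All field names and units here are hypothetical.
    """
    area_sqft: float       # area of the house
    location: str          # neighborhood or locality
    schools_nearby: int    # number of schools close to the house
    road_width_m: float    # width of the road in front of the house
    price: float           # the value the model should learn to predict


# A made-up example row, for illustration only.
example = HouseRecord(
    area_sqft=1200.0,
    location="Downtown",
    schools_nearby=2,
    road_width_m=9.0,
    price=250000.0,
)

print(example)
```

Every field except `price` is an input to the model, and `price` is the target.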
Now, let's assume all the information that has been collected in this case is useful. But it might also be the case that your firm is collecting other kinds of data about real estate in your locality. Say they are collecting how many ice cream parlors are present around the house.
Let's imagine that, right? Your firm could have a lot of such data, but if none of it is actually useful in predicting the house price, it is not going to help us explain the price of a given house. In such a case, we call that data garbage, whereas the earlier data (area, location, nearby schools, road width) is very useful data.
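One rough way to see the difference between useful data and garbage is to check how strongly each column moves together with the price. The sketch below uses synthetic numbers generated on the spot, under the assumption that price depends on area plus some noise and does not depend on ice cream parlors at all. The variable names and the pricing rule are invented for illustration, and a simple correlation is only a first check, not a complete test of usefulness.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic data, invented for illustration only.
n_houses = 500
area_sqft = [rng.uniform(500, 3000) for _ in range(n_houses)]
ice_cream_parlors = [rng.randint(0, 10) for _ in range(n_houses)]

# Assumed toy rule: price is driven by area plus random noise.
# The number of ice cream parlors plays no role in it.
price = [150 * a + rng.gauss(0, 30000) for a in area_sqft]

corr_area = pearson(area_sqft, price)
corr_parlors = pearson(ice_cream_parlors, price)

print(f"area vs price:              {corr_area:.2f}")
print(f"ice cream parlors vs price: {corr_parlors:.2f}")
```

In this toy setup the area column correlates strongly with price, while the ice cream parlor column stays close to zero, which is what we mean by garbage here: data that does not help explain the target.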