How to reduce memory space of DataFrames | #43 of 53: The Complete Pandas Course

Published: 05 October 2024
Channel: Selva Prabhakaran (ML+)


In this part, we will see how we can save memory and space while working on a dataset in Python pandas.

By using the sparse data type, we can save memory as well as space. Let's see how it works.

Complete PANDAS COURSE for FREE: http://surl.li/cectw

Join ML+ membership for exclusive Data science content

Check out the complete Data Scientist Learning Path here: https://edu.machinelearningplus.com/s...

🔹 Tips and Tricks on Saving Memory and Space in Python Pandas.

These tips are useful if you are going to be working with columns that contain a lot of missing values or zero values. The first thing to be aware of, before you import your data, is which columns in the original source file you are actually going to need for your analysis.

Import only those columns instead of importing the entire dataset. For example, the large dataset CSV file contains a lot of columns. If you are going to use only a subset of them, specify the names of the columns you need and pass them as the usecols argument when reading the file.

This will save a lot of memory while you are working, so all the subsequent operations will run faster. So let's do that. We define the columns that we want and run the import with usecols. Now look at the DataFrame's info(): this is consuming only 22.9 MB. In the previous video, we saw that it was consuming more than 150 MB, so this is a significant change.
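The column-subset import described above can be sketched as follows. This is a minimal illustration rather than the code from the video: the in-memory CSV, the column names a to f, and the chosen subset are all made up so that the example runs without the original file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the large CSV file: six integer columns,
# of which only two are needed for the analysis.
rng = np.random.default_rng(0)
full = pd.DataFrame(
    rng.integers(0, 100, size=(1000, 6)),
    columns=["a", "b", "c", "d", "e", "f"],
)
csv_text = full.to_csv(index=False)

# The columns we actually want (hypothetical names).
columns = ["a", "c"]

# Import everything versus import only the needed columns.
df_all = pd.read_csv(io.StringIO(csv_text))
df_subset = pd.read_csv(io.StringIO(csv_text), usecols=columns)

mem_all = df_all.memory_usage(deep=True).sum()
mem_subset = df_subset.memory_usage(deep=True).sum()
print(f"all columns:    {mem_all} bytes")
print(f"subset columns: {mem_subset} bytes")

# info() reports the same memory figure along with the dtypes.
df_subset.info(memory_usage="deep")
```

The saving scales with the number and width of the columns that are skipped, so the ratio here will not match the figures quoted for the video's dataset.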

Now, one other thing to notice is the has TPM column. This column takes values of either one or zero, so it needs only two values. But if you look at the data type of has TPM, it is stored as int64.

That is not at all required; you can convert has TPM to a Boolean data type instead. So, at import time, pass usecols equal to columns, and use the dtype argument to set the data type of has TPM, or of any other column, to whatever is necessary. In this case a Boolean is sufficient. On doing this, let's look at the memory consumed: about 22 MB became 17.2 MB, which is significant.
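A small sketch of setting the data type at import time is shown below. The column names value and has_flag are hypothetical stand-ins (the real column name is not recoverable from the transcript), and the data is generated in memory so the example is self-contained.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: one float column and one 0/1 indicator column.
rng = np.random.default_rng(0)
csv_text = pd.DataFrame(
    {
        "value": rng.random(1000),
        "has_flag": rng.integers(0, 2, size=1000),
    }
).to_csv(index=False)

columns = ["value", "has_flag"]

# Default import: the 0/1 column is inferred as int64 (8 bytes per row).
df_default = pd.read_csv(io.StringIO(csv_text), usecols=columns)

# Typed import: ask for a Boolean column instead (1 byte per row).
df_typed = pd.read_csv(
    io.StringIO(csv_text),
    usecols=columns,
    dtype={"has_flag": "bool"},
)

print(df_default.dtypes)
print(df_typed.dtypes)
print(df_default.memory_usage(deep=True).sum(), "bytes with int64")
print(df_typed.memory_usage(deep=True).sum(), "bytes with bool")
```

This only works cleanly when the column really contains nothing but 0 and 1 with no missing values; a plain Boolean column cannot hold NaN.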

Alright, so that is one other improvement that you can make. The third improvement is the use of the sparse data type. In certain columns, the values are mostly going to be missing, or the column is going to contain a lot of zeros. Let's look at the output of this.
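The excerpt stops before showing the sparse conversion, so the following is only a sketch of how a mostly-zero column might be converted, assuming a made-up Series with 100 non-zero entries out of 10,000. It uses pandas' SparseDtype with zero as the fill value, so only the non-zero entries are physically stored.

```python
import numpy as np
import pandas as pd

# Hypothetical column: 10,000 values, of which only 100 are non-zero.
rng = np.random.default_rng(0)
values = np.zeros(10_000)
nonzero_idx = rng.choice(10_000, size=100, replace=False)
values[nonzero_idx] = rng.random(100) + 1.0  # shifted so none are zero

dense = pd.Series(values)

# Convert to a sparse dtype; zeros become the implicit fill value.
sparse = dense.astype(pd.SparseDtype(np.float64, fill_value=0.0))

dense_bytes = dense.memory_usage(deep=True)
sparse_bytes = sparse.memory_usage(deep=True)
print(f"dense:  {dense_bytes} bytes")
print(f"sparse: {sparse_bytes} bytes")

# Fraction of entries that are actually stored.
print("density:", sparse.sparse.density)
```

For a column dominated by missing values rather than zeros, the same idea applies with NaN as the fill value. The saving shrinks as the share of non-fill values grows, so a sparse dtype is not worthwhile for densely populated columns.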

Let me know in the comments section if you have any questions!

🤝 Like, Share, Subscribe for more!

Follow us on our social media handles for all updates, events and live sessions:

✅ Instagram:   / machinelearningplus

✅ LinkedIn:   / machine-learning-plus

✅ YouTube:    / numyard

✅ Twitter:   / r_programming

✅ Website: https://www.machinelearningplus.com/

If you enjoyed this video, be sure to throw it a like and make sure to subscribe to not miss any future videos!

Thanks for watching!

#machinelearningplus #python #pandas #datascience