Course materials Github: https://github.com/machinelearningplu...
Join Pandas course on ML+: https://edu.machinelearningplus.com/c...
--------------------
Once you have loaded the data into a pandas data frame, you will often want to select specific data from it: a few columns of interest, or a few rows and a few columns of interest.
For example, you might want the first 100 rows of just three columns: state, area code, and phone number. That could be one scenario. Or you might want all the columns from the data frame except a few. There can be many other scenarios as well. Whatever the condition, and however complex it is, you can do all sorts of data subsetting in pandas, and the syntax for it is simple, straightforward, and intuitive.
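The two scenarios just described can be sketched roughly as follows. This is only an illustration: the small data frame and its column names (state, area_code, phone_number, day_calls) are made-up stand-ins, not the actual course data set.

```python
import pandas as pd

# Hypothetical stand-in for the course data: 150 rows, four columns.
df = pd.DataFrame({
    "state": ["KS", "OH", "NJ"] * 50,
    "area_code": [415, 408, 510] * 50,
    "phone_number": [f"555-{i:04d}" for i in range(150)],
    "day_calls": range(150),
})

# Scenario 1: the first 100 rows of three columns of interest.
first_100 = df[["state", "area_code", "phone_number"]].head(100)

# Scenario 2: every column except a few.
without_phone = df.drop(columns=["phone_number"])

print(first_100.shape)               # (100, 3)
print(list(without_phone.columns))   # ['state', 'area_code', 'day_calls']
```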
Now, when it comes to the syntax for selecting data, a data frame has four main notations or methods, and we will see all of them: the dot notation, .loc, .iloc, and the .iat and .at pair. All of these are different ways of selecting specific rows and columns from your data frame.
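As a quick preview before going through them one by one, here is a minimal sketch of what each notation looks like. The tiny data frame, its columns, and its row labels are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical three-row data frame with string row labels.
df = pd.DataFrame(
    {"state": ["KS", "OH", "NJ"], "area_code": [415, 408, 510]},
    index=["a", "b", "c"],
)

col = df.state                          # dot notation: one column, as a Series
by_label = df.loc["a":"b", ["state"]]   # .loc: label based, end label included
by_pos = df.iloc[0:2, 0:1]              # .iloc: position based, end position excluded
one_by_label = df.at["b", "area_code"]  # .at: a single value by labels
one_by_pos = df.iat[1, 1]               # .iat: a single value by positions

print(by_label.equals(by_pos))          # True
print(one_by_label, one_by_pos)         # 408 408
```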
Let's look at them one by one. First, the dot notation. It is a very simple, very straightforward syntax, just like how we call the different methods of a data frame: if you write df dot the name of the column, for example df.state, you select that column. It has some disadvantages, however. First, if your column name contains a space, this won't work. Secondly, you can only select one column at a time; it is not possible to select more than one column using the dot notation.
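Here is a small sketch of those two limitations. The data frame and the column name "area code" (with a space) are invented for the example; the bracket syntax is shown only as the usual fallback.

```python
import pandas as pd

# Hypothetical data: one plain column name and one containing a space.
df = pd.DataFrame({"state": ["KS", "OH"], "area code": [415, 408]})

# Dot notation works for a simple column name and returns a Series.
states = df.state
print(type(states).__name__)   # Series

# df.area code is not valid Python, so a name with a space needs brackets.
area = df["area code"]

# Selecting more than one column also needs brackets, with a list of names.
both = df[["state", "area code"]]
print(both.shape)              # (2, 2)
```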
The third disadvantage is that you can only select an existing column this way; you cannot create a new column with it. For instance, if you try to assign a value to a new column called, say, state number, or whatever the name is, this won't work. The new column will not appear in the data frame.
But this will not give you an error either. What happens is that it internally creates an attribute on the data frame object itself. Let me show you with an example. Let's run the code over here. So this is df.state. Now I am going to try to create df.state_num equal to one. All right, it looks as if I have now created a new column called state_num. But did it actually create a new column? Let's find out.
If you scroll over to the right-hand side, you will still see that the last column is June; there is no new column created. On the other hand, if you check the attributes of df, you will definitely find state_num present there, and this attribute holds the value one. So instead of creating a new column, it creates an attribute, and that attribute is stored on the data frame object itself. This is not a way of creating a new column at all. It is a mistake people commit when they are learning pandas for the first time, so be careful about this.
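The same behaviour can be reproduced on a small made-up data frame. The columns below are assumptions for illustration, and the bracket assignment at the end is shown only as the standard way to actually add a column.

```python
import pandas as pd

# Hypothetical two-column data frame.
df = pd.DataFrame({"state": ["KS", "OH"], "area_code": [415, 408]})

# Dot assignment with a new name: no error, but no new column either.
df.state_num = 1
is_column_after_dot = "state_num" in df.columns
print(is_column_after_dot)   # False
print(df.state_num)          # 1, stored as a plain attribute on the object

# Bracket assignment is what actually creates the column.
df["state_num"] = 1
is_column_after_brackets = "state_num" in df.columns
print(is_column_after_brackets)   # True
```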
And of course, this particular object, df.state, comes out as a series, because we are selecting one single column. All right, so that is the dot notation. Let's look at .loc. Now, .loc is one of my favorite ways of selecting objects from a data frame. It is my first go-to function: either I use .loc, or I use .iloc. We will look at .iloc later in this video itself, but understand that it is very similar.