Objectives:
• To get familiar with the pandas library
• To learn how to analyze data in Python
Data:
• Download the house_price Excel file from https://github.com/xbwei/machine_lear...
• Upload the house_price Excel file to your S3 bucket.
Steps:
1. Start the Notebook Instance in SageMaker and create a new notebook named lab12.
2. Use pandas to load the house_price data into Python and answer the following questions. Use string formatting to report the results of 2.3–2.4 and 2.6–2.7.
2.1. Calculate the unit price of every house, i.e., price/area, and display the first 10 rows of the house price table.
2.2. Calculate the number of records per house type and display all the results.
2.3. Calculate the average price of houses that have more than two bathrooms.
2.4. Calculate the median and mean unit price.
2.5. Calculate the average unit price per house type and display all the results.
2.6. Establish a simple linear regression model to predict the price from the area. Report the slope, intercept, R-squared, and p-value.
2.7. Based on your model, predict the price of a house with an area of 2,000 sqft.
3. Close your Notebook and open JupyterLab. In JupyterLab, upload your Notebook to your GitHub repository.
4. Check the uploaded Notebook in GitHub.
5. Stop the Notebook instance after you have submitted your lab on Canvas.
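
Example:
The sketch below shows one possible shape for the operations in step 2. It is not a reference solution. It runs on a small made-up table instead of the real house_price file, and the column names (price, area, bathrooms, house_type) are assumptions that may not match the actual spreadsheet. It also assumes scipy is available for the regression and uses f-strings as one form of string formatting.

```python
import pandas as pd
from scipy import stats

# Made-up rows standing in for the house_price file. The column names are
# assumptions; adjust them to the headers in the real spreadsheet.
# In the lab, a pd.read_excel(...) call pointing at the file in your S3 bucket
# would replace this (reading s3:// paths needs the s3fs package).
df = pd.DataFrame({
    "price": [300000, 450000, 250000, 600000, 380000, 520000],
    "area": [1500, 2200, 1200, 3000, 1900, 2600],
    "bathrooms": [2, 3, 1, 4, 2, 3],
    "house_type": ["single", "single", "condo", "single", "townhouse", "condo"],
})

# 2.1: unit price as a new column, then the first 10 rows
df["unit_price"] = df["price"] / df["area"]
print(df.head(10))

# 2.2: number of records per house type
type_counts = df["house_type"].value_counts()
print(type_counts)

# 2.3: average price of houses with more than two bathrooms
avg_price = df.loc[df["bathrooms"] > 2, "price"].mean()
print(f"Average price (more than two bathrooms): {avg_price:,.2f}")

# 2.4: median and mean unit price
median_unit = df["unit_price"].median()
mean_unit = df["unit_price"].mean()
print(f"Unit price: median {median_unit:.2f}, mean {mean_unit:.2f}")

# 2.5: average unit price per house type
unit_by_type = df.groupby("house_type")["unit_price"].mean()
print(unit_by_type)

# 2.6: simple linear regression of price on area
fit = stats.linregress(df["area"], df["price"])
r_squared = fit.rvalue ** 2
print(f"slope {fit.slope:.2f}, intercept {fit.intercept:.2f}, "
      f"R-squared {r_squared:.3f}, p-value {fit.pvalue:.4g}")

# 2.7: predicted price for a 2,000 sqft house from the fitted line
predicted = fit.intercept + fit.slope * 2000
print(f"Predicted price for 2,000 sqft: {predicted:,.2f}")
```

scipy.stats.linregress is used here because it returns the slope, intercept, correlation coefficient, and p-value in one call. Other libraries such as statsmodels would work as well.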