project3-Copy1

December 31, 2018

1 Project 3 - Classification

Welcome to the third project of Data 8! You will build a classifier that guesses whether a movie is romance or action, using only the number of times each word appears in the movie’s screenplay. By the end of the project, you should know how to:

1. Build a k-nearest-neighbors classifier.
2. Test a classifier on data.

1.0.1 Logistics

Deadline. This project is due at 11:59pm on Friday 11/30. You can earn an early submission bonus point by submitting your completed project by Thursday 11/29. It’s much better to be early than late, so start working now.

Checkpoint. For full credit, you must also complete Part 1 of the project (out of 4) and submit it by 11:59pm on Friday 11/16. You will have some lab time to work on these questions, but we recommend that you start the project before lab and leave time to finish the checkpoint afterward.

Partners. You may work with one partner; this partner must be enrolled in the same lab section as you are. Only one of you is required to submit the project. On the submission site, the person who submits should also designate their partner so that both of you receive credit.

Rules. Don’t share your code with anybody but your partner. You are welcome to discuss questions with other students, but don’t share the answers. The experience of solving the problems in this project will prepare you for exams (and life). If someone asks you for the answer, resist! Instead, you can demonstrate how you would solve a similar problem.

Support. You are not alone! Come to office hours, post on Piazza, and talk to your classmates. If you want to ask about the details of your solution to a problem, make a private Piazza post and the staff will respond. If you’re ever feeling overwhelmed or don’t know how to make progress, email your TA or tutor for help. You can find contact information for the staff on the course website.

Tests.
Passing the tests for a question does not mean that you answered the question correctly. Tests usually only check that your table has the correct column labels. However, more tests will be applied to verify the correctness of your submission in order to assign your final score, so be careful and check your work!

Advice. Develop your answers incrementally. To perform a complicated table manipulation, break it up into steps, perform each step on a different line, give a new name to each result, and check that each intermediate result is what you expect. You can add any additional names or functions you want to the provided cells. Also, please be sure not to re-assign variables throughout the notebook! For example, if you use max_temperature in your answer to one question, do not reassign it later on.

To get started, load datascience, numpy, plots, and ok.

In [3]: # Run this cell to set up the notebook, but please don't change it.
        import numpy as np
        import math
        from datascience import *

        # These lines set up the plotting functionality and formatting.
        import matplotlib
        matplotlib.use('Agg', warn=False)
        %matplotlib inline
        import matplotlib.pyplot as plots
        plots.style.use('fivethirtyeight')
        import warnings
        warnings.simplefilter(action="ignore", category=FutureWarning)

        # These lines load the tests.
        from client.api.notebook import Notebook
        ok = Notebook('')
        _ = ok.auth(inline=True)

=====================================================================
Assignment: Project 3 - Classification
OK, version v1.12.5
=====================================================================

Successfully logged in as

2 1. The Dataset

In this project, we are exploring movie screenplays. We’ll be trying to predict each movie’s genre from the text of its screenplay. In particular, we have compiled a list of 5,000 words that occur in conversations between movie characters. For each movie, our dataset tells us the frequency with which each of these words occurs in certain conversations in its screenplay. All words have been converted to lowercase.

Run the cell below to read the movies table. It m
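
As a rough illustration of the two goals listed at the start of the project (building a k-nearest-neighbors classifier and testing it on data), here is a minimal sketch in plain NumPy. It is not the project's solution or the staff's method. The helper names (word_frequencies, classify, accuracy), the two-word vocabulary, and the toy snippets are all hypothetical, and "frequency" is assumed here to mean the proportion of words in a text that equal a given word.

```python
from collections import Counter

import numpy as np


def word_frequencies(text, vocabulary):
    """Proportion of words in `text` equal to each vocabulary word.

    Assumption: text is lowercased and split on whitespace.
    """
    words = text.lower().split()
    if not words:
        return np.zeros(len(vocabulary))
    counts = Counter(words)
    return np.array([counts[w] / len(words) for w in vocabulary])


def classify(train_features, train_labels, point, k):
    """Majority label among the k training rows nearest to `point`."""
    # Euclidean distance from `point` to every training row.
    distances = np.sqrt(np.sum((train_features - point) ** 2, axis=1))
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_features, train_labels, test_features, test_labels, k):
    """Fraction of test rows whose predicted label matches the true label."""
    predictions = [
        classify(train_features, train_labels, row, k) for row in test_features
    ]
    return float(np.mean(np.array(predictions) == np.array(test_labels)))


# Hypothetical toy data, invented for illustration only.
vocabulary = ["love", "gun"]
train_texts = [
    "I love you my love",
    "love is all we need",
    "drop the gun now",
    "he has a gun get the gun",
]
train_labels = np.array(["romance", "romance", "action", "action"])
test_texts = ["do you love me", "give me the gun"]
test_labels = np.array(["romance", "action"])

train_features = np.array([word_frequencies(t, vocabulary) for t in train_texts])
test_features = np.array([word_frequencies(t, vocabulary) for t in test_texts])

print(classify(train_features, train_labels, test_features[0], k=3))
print(accuracy(train_features, train_labels, test_features, test_labels, k=3))
```

Using an odd k with two classes avoids tied votes. The training rows and the test rows are kept separate so that the accuracy reflects examples the classifier has not seen.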