MSIS 5633 – BI Tools & Techniques (BITT) Data Mining Assignment II
In this assignment you will use a free, open-source data mining tool, KNIME (http://knime.org).
You are to analyze a given dataset (about the voting behavior of a number of counties in the
U.S.) and to develop and compare at least three different types of prediction (i.e.,
classification) methods that predict whether a county will vote “yes” or “no” on legalizing
gaming at the ballot. Here are the specifics for this assignment:
• Use the following tool
o KNIME (download and install it on a PC or laptop).
• Download the “Voting Behavior” data and the brief data description from D2L.
o The data is given in MS Excel format.
• Follow the 6 steps of the CRISP-DM process model
o Understand the domain and the problem you are trying to solve (via literature).
o Understand and preprocess the data (be very critical about the data).
o Develop at least three classification models (e.g., Decision Tree, Logit, ANN, etc.).
o Compare the accuracy results (use confusion matrices and comment on the outcome).
• Present your results in an organized report
o Include a cover page.
o Write an “Executive Summary” (1 page long).
o Use the 6 steps in CRISP-DM to organize the remainder of the report.
o Include a conclusion page, where you comment on the tool and techniques you used:
what was good, what was bad, etc.
o Make sure to integrate figures (graphs, charts, tables, screenshots) into the text as
you see necessary. Do not use appendixes.
o Try not to exceed 15 pages in total, including the cover (use 12-point Times New
Roman font and 1.5 line spacing).
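When comparing models by their confusion matrices, it helps to look beyond overall accuracy. KNIME's Scorer node reports these figures directly; the Python sketch below only illustrates the arithmetic. It assumes "Yes" (1) is treated as the positive class, and the counts and model names are made up for illustration, not results from the voting data.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Summarize a 2x2 confusion matrix with 'Yes' (1) as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual "Yes" counties that the model predicted as "Yes".
        "sensitivity": tp / (tp + fn),
        # Share of actual "No" counties that the model predicted as "No".
        "specificity": tn / (tn + fp),
    }


# Made-up test-set counts (tp, fn, fp, tn) for two hypothetical models.
models = {
    "model_a": (80, 20, 10, 90),
    "model_b": (90, 10, 30, 70),
}
for name, counts in models.items():
    metrics = confusion_metrics(*counts)
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With these invented counts, model_a has the higher accuracy (0.85 vs. 0.80), while model_b identifies more of the actual “yes” counties (sensitivity 0.90 vs. 0.80). A trade-off of this kind is worth commenting on when discussing the outcome.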
Deadline:
• The report is due by Monday, October 23, 2017, 11:59 PM.
• The report should be uploaded as a single Microsoft Word Document to the Dropbox in
D2L by the due date/time.
GAMING BALLOT DATA DESCRIPTION
Data Description
The data is organized by “State No” and “County No”. These two fields are record identifiers.
The dataset contains 1287 unique records.
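Because the two identifier fields only work together as a record key (the same County No can appear in different states), it is worth confirming that no (State No, County No) pair repeats and that the number of unique keys matches the stated 1287. The following is a minimal Python sketch, assuming the Excel sheet has been exported into a list of row dictionaries; the helper names and sample rows are hypothetical.

```python
from collections import Counter

# Composite record key, following the data description above.
KEY_FIELDS = ("State No", "County No")


def find_duplicate_keys(rows):
    """Return the (State No, County No) pairs that occur more than once."""
    counts = Counter(tuple(row[f] for f in KEY_FIELDS) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def count_unique_keys(rows):
    """Count distinct (State No, County No) pairs."""
    return len({tuple(row[f] for f in KEY_FIELDS) for row in rows})


# Made-up rows for illustration; the real records come from the D2L file.
rows = [
    {"State No": 1, "County No": 1},
    {"State No": 1, "County No": 2},
    {"State No": 2, "County No": 1},  # same County No, different state: fine
    {"State No": 1, "County No": 2},  # repeated key: should be investigated
]
print(find_duplicate_keys(rows))  # [(1, 2)]
print(count_unique_keys(rows))    # 3
```

On the real data, an empty duplicate list and a unique-key count of 1287 would agree with the description.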
Variable Characteristics
| Variable | Type | Description |
| --- | --- | --- |
| State No | Numeric | Primary key field (together with County No) |
| County No | Numeric | Primary key field (together with State No) |
| FOR | Numeric | Number of FOR votes |
| AGAINST | Numeric | Number of AGAINST votes |
| TOTAL CASTE | Numeric | Number of people who voted |
| DEPENDENT VARIABLE | Binary Nominal | 1: Yes; 0: No |
| BALLOT TYPE | Binary Nominal | 1: Gambling; 2: Wagering |
| POPULATION | Numeric | Population of the county |
| PCI | Numeric | Per capita income |
| MEDIUM FAMILY INCOME | Numeric | Median family income |
| SIZE OF COUNTY | Numeric | Size of the county (sq. miles) |
| POPULATION DENSITY | Numeric | Population density (people per sq. mile) |
| PERCENT WHITE | Numeric | Racial distribution of the county |
| PERCENT BLACK | Numeric | Racial distribution of the county |
| PERCENT OTHER | Numeric | Racial distribution of the county |
| PERCENT MALE | Numeric | Sex distribution of the county |
| PERCENT FEMALE | Numeric | Sex distribution of the county |
| NO OF CHURCHES | Numeric | Religious identity of the county |
| NO OF CHURCH MEMBERS | Numeric | Religious identity of the county |
| PERCENT CHURCH MEMBERS OF POPULATION | Numeric | Religious identity of the county |
| POVERTY LEVEL | Numeric | Poverty level |
| UNEMPLOYMENT RATE | Numeric | Unemployment rate |
| AGE LESS THAN 18 | Numeric | Age distribution of the county |
| AGE24 | Numeric | Age distribution of the county |
| AGE44 | Numeric | Age distribution of the county |
| AGE64 | Numeric | Age distribution of the county |
| AGE OLDER THAN 65 | Numeric | Age distribution of the county |
| MSA | Binary Nominal | Metropolitan statistical area (1: Yes; 0: No) |
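Three of the variables above are coded as numbers but are nominal, and several fields should not be used as predictors: the two identifiers carry no predictive meaning, and the vote counts (FOR, AGAINST, TOTAL CASTE) are presumably only known after the ballot and decide the outcome, so they would leak the answer into the model. In KNIME this is typically handled with nodes such as Number To String and Column Filter; the Python sketch below only illustrates the idea. The column groupings, the helper name `prepare_row`, and the sample row are assumptions for illustration.

```python
# Code-to-label mappings taken from the variable table above.
NOMINAL_CODES = {
    "DEPENDENT VARIABLE": {1: "Yes", 0: "No"},
    "BALLOT TYPE": {1: "Gambling", 2: "Wagering"},
    "MSA": {1: "Yes", 0: "No"},
}
# Assumed exclusions: identifiers, plus vote counts that would leak the target.
EXCLUDED = {"State No", "County No", "FOR", "AGAINST", "TOTAL CASTE"}
TARGET = "DEPENDENT VARIABLE"


def prepare_row(row):
    """Decode coded nominal fields and split one record into (features, label).

    An unexpected code raises KeyError, which surfaces data errors early.
    """
    decoded = {
        name: NOMINAL_CODES[name][value] if name in NOMINAL_CODES else value
        for name, value in row.items()
    }
    label = decoded.pop(TARGET)
    features = {k: v for k, v in decoded.items() if k not in EXCLUDED}
    return features, label


# Made-up record with only a few of the columns, for illustration.
example_row = {
    "State No": 1, "County No": 5, "FOR": 300, "AGAINST": 200,
    "TOTAL CASTE": 500, "DEPENDENT VARIABLE": 1, "BALLOT TYPE": 2,
    "POPULATION": 12000, "MSA": 0,
}
features, label = prepare_row(example_row)
print(features, label)
```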
Guidelines/hints:
• Be critical about the derived variables (e.g., percent white, black, and other; percent church
members of the population, etc.). Make sure that they are calculated correctly.
• Make sure that the data formats of the variables are consistent with the DM tool you are
using (e.g., the nominal and numeric variables should be accurately defined).
• Pick and choose your independent variables from the list. Do not use them all blindly!
• Be specific about the actions taken during data preprocessing; explain your data
preprocessing actions in a step-by-step fashion. Summarize the final status of the data.
• Show screenshots of the final dataset (after preprocessing and removal of
unused variables).
• Show a screenshot of the complete classification model (the modeling process in KNIME).
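One way to be critical about the derived variables is to recompute them from their inputs and flag records that disagree. The Python sketch below assumes percentages are stored on a 0 to 100 scale and uses a hypothetical tolerance of one percentage point (1% relative error for density); both the tolerance and the sample record are illustrative choices, not part of the dataset description.

```python
import math

# Hypothetical tolerance, in percentage points, for the percentage checks.
TOLERANCE = 1.0


def check_derived(row):
    """Return messages for derived fields that disagree with their inputs."""
    problems = []
    race = row["PERCENT WHITE"] + row["PERCENT BLACK"] + row["PERCENT OTHER"]
    if abs(race - 100) > TOLERANCE:
        problems.append(f"race percentages sum to {race:.1f}")
    sex = row["PERCENT MALE"] + row["PERCENT FEMALE"]
    if abs(sex - 100) > TOLERANCE:
        problems.append(f"sex percentages sum to {sex:.1f}")
    # Church membership as a share of population, recomputed from the counts.
    church_pct = 100 * row["NO OF CHURCH MEMBERS"] / row["POPULATION"]
    if abs(church_pct - row["PERCENT CHURCH MEMBERS OF POPULATION"]) > TOLERANCE:
        problems.append(f"church membership should be about {church_pct:.1f}%")
    # Density should equal population divided by area (1% relative tolerance).
    density = row["POPULATION"] / row["SIZE OF COUNTY"]
    if not math.isclose(density, row["POPULATION DENSITY"], rel_tol=0.01):
        problems.append(f"density should be about {density:.1f}")
    return problems


# Made-up record whose race percentages do not add up to 100.
row = {
    "PERCENT WHITE": 80, "PERCENT BLACK": 15, "PERCENT OTHER": 3,
    "PERCENT MALE": 49, "PERCENT FEMALE": 51,
    "NO OF CHURCH MEMBERS": 6000, "POPULATION": 10000,
    "PERCENT CHURCH MEMBERS OF POPULATION": 60,
    "SIZE OF COUNTY": 500, "POPULATION DENSITY": 20,
}
print(check_derived(row))  # ['race percentages sum to 98.0']
```

Flagged records can then be corrected, recomputed, or excluded, and each such decision described in the preprocessing section of the report.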