MSIS 5633 – BI Tools & Techniques (BITT) Data Mining Assignment II

In this assignment you will use a free/open-source data mining tool, KNIME
(http://knime.org). You are to analyze a given dataset (about the voting behavior of a number of
counties in the U.S.) to develop and compare at least three different types of prediction (i.e.,
classification) methods that predict whether a county will say “yes” or “no” to legalizing
gaming at the ballot. Here are the specifics for this assignment:
Use the following tools
o KNIME (download and install on a PC/Laptop).
Download the “Voting Behavior” data and the brief data description from D2L
o The data is given in MS Excel format.
Follow the 6 steps in the CRISP-DM process model
o Understand the domain and the problem you are trying to solve (via literature).
o Understand and preprocess the data (be very critical about the data).
o Develop at least three classification models (e.g., Decision Tree, Logit, ANN, etc.).
o Compare the accuracy results (use confusion matrices and comment on the outcome).
Present your results in an organized report
o Include a cover page.
o Write an “Executive Summary” (1 page long).
o Use the 6 steps in CRISP-DM to organize the remainder of the report.
o Include a conclusion page, where you need to comment on the tool and techniques
you’ve used. What was good and what was bad, etc.
o Make sure to integrate figures (graphs, charts, tables, screenshots) into the text as
you see necessary. Do not use appendices.
o Try not to exceed 15 pages in total, including the cover (use 12-point Times New
Roman font and 1.5 line spacing).
Deadline:
The report is due by Monday, October 23, 2017, 11:59 PM.
The report should be uploaded as a single Microsoft Word Document to the Dropbox in
D2L by the due date/time.

 

GAMING BALLOT DATA DESCRIPTION
Data Description
The data is organized by “State No” and “County No”. These two fields are record identifiers.
The dataset contains 1287 unique records.
Variable Characteristics

Variable                              Type            Description
State No                              Numeric         Primary key field
County No                             Numeric         Primary key field
FOR                                   Numeric         Number of FOR votes
AGAINST                               Numeric         Number of AGAINST votes
TOTAL CASTE                           Numeric         Number of people who voted
DEPENDENT VARIABLE                    Binary Nominal  1: Yes; 0: No
BALLOT TYPE                           Binary Nominal  1: Gambling; 2: Wagering
POPULATION                            Numeric         Population of the county
PCI                                   Numeric         Per capita income
MEDIUM FAMILY INCOME                  Numeric         Median family income
SIZE OF COUNTY                        Numeric         Size of the county (sq. mile)
POPULATION DENSITY                    Numeric         Population density (# of people / sq. mile)
PERCENT WHITE                         Numeric         Racial distribution of the county
PERCENT BLACK                         Numeric
PERCENT OTHER                         Numeric
PERCENT MALE                          Numeric         Sex distribution of the county
PERCENT FEMALE                        Numeric
NO OF CHURCHES                        Numeric         Religious identity of the county
NO OF CHURCH MEMBERS                  Numeric
PERCENT CHURCH MEMBERS OF POPULATION  Numeric
POVERTY LEVEL                         Numeric         Poverty level
UNEMPLOYMENT RATE                     Numeric         Unemployment rate
AGE LESS THAN 18                      Numeric         Age distribution of the county
AGE24                                 Numeric
AGE44                                 Numeric
AGE64                                 Numeric
AGE OLDER THAN 65                     Numeric
MSA                                   Binary Nominal  Metropolitan statistical area (1: Yes; 0: No)
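
Several of the fields listed above are derived from other fields, so they can be recomputed as a sanity check before modeling. The Python sketch below is only an illustration outside KNIME (the assignment itself is to be done in KNIME). It assumes that percentages are stored on a 0 to 100 scale and that TOTAL CASTE should equal FOR + AGAINST; neither assumption is stated in the data description. The helper name check_record and the sample record are hypothetical, and the numbers are made up, not taken from the dataset.

```python
def check_record(rec, tol=0.5):
    """Return a list of consistency problems found in one county record.

    Assumptions (not confirmed by the data description):
    percentages are on a 0-100 scale, and every vote is either FOR or AGAINST.
    """
    problems = []

    # Vote counts should add up to the total number of votes cast.
    if rec["FOR"] + rec["AGAINST"] != rec["TOTAL CASTE"]:
        problems.append("FOR + AGAINST != TOTAL CASTE")

    # Population density = population / size of county (sq. mile).
    density = rec["POPULATION"] / rec["SIZE OF COUNTY"]
    if abs(density - rec["POPULATION DENSITY"]) > tol:
        problems.append("POPULATION DENSITY mismatch")

    # The three race percentages should sum to roughly 100.
    race_sum = rec["PERCENT WHITE"] + rec["PERCENT BLACK"] + rec["PERCENT OTHER"]
    if abs(race_sum - 100.0) > tol:
        problems.append("race percentages do not sum to 100")

    # Percent church members = church members / population * 100.
    church_pct = 100.0 * rec["NO OF CHURCH MEMBERS"] / rec["POPULATION"]
    if abs(church_pct - rec["PERCENT CHURCH MEMBERS OF POPULATION"]) > tol:
        problems.append("PERCENT CHURCH MEMBERS OF POPULATION mismatch")

    return problems


# Made-up record for illustration only (not from the Voting Behavior data).
sample = {
    "FOR": 600, "AGAINST": 400, "TOTAL CASTE": 1000,
    "POPULATION": 5000, "SIZE OF COUNTY": 250.0, "POPULATION DENSITY": 20.0,
    "PERCENT WHITE": 80.0, "PERCENT BLACK": 15.0, "PERCENT OTHER": 5.0,
    "NO OF CHURCH MEMBERS": 2000, "PERCENT CHURCH MEMBERS OF POPULATION": 40.0,
}
print(check_record(sample))
```

Records that fail a check are candidates for correction or removal during preprocessing; the tolerance tol is an arbitrary choice to absorb rounding in the stored values.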

Guidelines/hints:
o Be critical about the derived variables (e.g., percent whites, blacks, other; percent church
members of the population, etc.). Make sure that they are calculated correctly.
o Make sure that the data formats of the variables are consistent with the DM tool you are
using (e.g., the nominal and numeric variables should be accurately defined).
o Pick and choose your independent variables from the list. Do not use them all blindly!
o Be specific about the actions taken during the data preprocessing; explain your data
preprocessing actions in a step-by-step fashion. Summarize the final status of the data.
o Show the screenshots for the final data set (after the preprocessing and removal of
unused variables).
o Show a screenshot of the complete classification model (modeling process in KNIME).
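
The assignment asks for the models to be compared through their confusion matrices. As a rough illustration of what that comparison involves, the Python sketch below builds a 2x2 confusion matrix for a binary target (1: Yes; 0: No) and derives accuracy from it. It is not part of the KNIME workflow; the function names and the label lists are hypothetical, and the predictions are made up rather than produced by any real model.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        else:  # a == 1 and p == 0
            counts["FN"] += 1
    return counts


def accuracy(cm):
    """Accuracy = correctly classified records / all records."""
    total = cm["TP"] + cm["FP"] + cm["TN"] + cm["FN"]
    return (cm["TP"] + cm["TN"]) / total


# Made-up labels for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "Decision Tree": [1, 0, 1, 0, 0, 1, 1, 0],
    "Logit":         [1, 0, 0, 0, 0, 0, 1, 0],
    "ANN":           [1, 0, 1, 1, 0, 1, 1, 0],
}

for name, predicted in predictions.items():
    cm = confusion_matrix(actual, predicted)
    print(name, cm, round(accuracy(cm), 3))
```

Accuracy alone can hide differences between models, so the false positive and false negative counts are worth discussing alongside it when commenting on the outcome.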