This case study analyzes drug-related deaths: using the data, you need to conclude which factors affect the deaths most and present them as a ranked list. The format should follow the paper I uploaded below. The data is also in the file.

Anurag Shrivastava, Jyoti Sondhi, Bharat Kumar
Associate Professor & Head, Department of Computer Science & Engineering, NRI Institute of Research & Technology, Bhopal
Assistant Professor, Department of Computer Science & Engineering, NRI Institute of Research & Technology, Bhopal
M.Tech Scholar, Department of Computer Science & Engineering, NRI Institute of Research & Technology, Bhopal
Abstract— The rapid growth of online shopping and marketing has expanded the field of e-commerce, which has been boosted further by the application of machine learning. Machine Learning (ML) has empowered businesses by finding useful patterns in customers’ search patterns and buying behaviours on the web. Predictive analytics based on machine learning can enhance sale probability and detect customer churn by analyzing customers’ past click-through behaviour, purchases, and history in real time. Popular traditional ML libraries do not support the processing of large datasets well, so new approaches and platforms are needed. The cloud computing paradigm has turned out to be a valuable alternative for speeding up machine learning platforms. The paper first discusses the importance of machine learning in predictive analytics. The need for cloud platforms to analyze ever-growing e-commerce data is then briefly surveyed. Finally, the work proposes a framework to predict the product category in a large e-commerce dataset having 9 categories (like electronics, fashion, etc.) and 93 features of products. The dataset was released by a famous e-commerce company for a competition. The classifier is built on the ‘Multiclass Decision Forest’ machine learning algorithm and is deployed on Microsoft’s Azure Machine Learning (Azure ML) platform, which is a public cloud platform. The results obtained by the proposed model are evaluated in terms of accuracy and compared with the benchmarks provided by the competition administrators. The results obtained are promising, and the paper also outlines directions for future research in the field.
Keywords— E-commerce, Classification, Big Data,
Machine Learning, Microsoft Azure Cloud, Cloud
1.1 Predicting Product Category in E-Commerce Data: The web is the best medium for doing business in this communication era. E-commerce has allowed businesses to offer choices to consumers. To address this data and information explosion, e-commerce stores are applying machine learning to customize their presentation in the online store [1]. Machine learning has empowered businesses to analyze all queries, whether searched or abandoned, from all users. Machine learning can be defined as an intelligent way to find hidden patterns or information even in large datasets or databases. Machine learning is often included in the category of predictive analytics, as it helps to predict future outcomes.
1.2 Microsoft Azure Cloud Computing Environment for Machine Learning [3]: Microsoft’s Azure Machine Learning (Azure ML) [4] is a cloud service that enables execution of the machine learning process. Microsoft Azure is a public cloud platform. The benefits of using a public cloud computing platform (Azure ML) include handling big data and access from anywhere in the world. The process of Azure ML is shown in Figure 1, and it is the same as the basic ML process. Azure ML provides a graphical tool for managing the ML process, a set of data pre-processing modules, a set of machine learning algorithms, and an API to expose a model to applications. ML Studio is a graphical tool that is used to control the process from beginning to end, i.e. from data pre-processing, to running experiments using a machine learning algorithm, to testing the resulting model. ML Studio also helps its users deploy that model on real
Figure 1: Machine Learning Process
© 2017 Copyright IRJEAS. All Rights Reserved
International Research Journal of Engineering & Applied Sciences, IRJEAS, ISSN(O): 2322-0821, ISSN(P): 2394-9910, Volume 5 Issue 2, Apr 2017- Jun 2017, Page 11-13
The need for cloud platforms to analyze e-commerce data is established in the next section. The rest of the paper is organized as follows: Section 2 briefly surveys the need for cloud platforms to analyze ever-growing e-commerce data. The proposed work is presented in Section 3. The experimental setup and result analysis are shown in Section 4, and the paper is concluded in Section 5.
2. Literature Review (Need of Cloud Platforms to Analyze E-commerce Data)
Multiple choices of cloud computing models are available for different workload management, performance and computational requirements. Popular statistical tools and environments like Octave, R and Python are now embedded in the cloud as well [5].
A. Fast Analysis: The important findings of the work in [6] indicate that the area of customer retention received the most research attention.
B. Machine Learning on Cloud Environment for Fast Prediction in Big Data: As data grows at a faster rate and becomes “Big Data”, computation speed for prediction and other operations becomes critical. The paper [7] focused on the specific problem of classification of network intrusion traffic which is a Big
The Proposed Framework which employs simple ML model with little change. The input dataset is suitably processed and converted into a suitable format. The machine learning algorithms are iteratively applied in the next step, and a candidate model is determined. These ML algorithms typically apply some statistical analysis like regression, or more complex approaches like decision forest, to the data. In the proposed framework, ensemble methods [12] are also applied to the model for better predictive accuracy. At last, the model is deployed and tested on test data. A snapshot of the actual model built using the specified steps on the Microsoft Azure ML platform is shown in Figure 2.
4. Simulation Environment Setup and Result
Azure ML provides ML Studio, a graphical tool that can be used to control the process from beginning to end. It includes a set of data pre-processing modules, a set of machine learning algorithms, and an Azure ML API to access a model deployed on Azure. ML Studio allows a user to import datasets and data pre-processing methods.
4.1 Dataset Description: The dataset is provided by the Otto Group [8], which is a large e-commerce company. This dataset includes 61,878 instances. Each product has 1 or more features out of the 93 features provided for the dataset.
4.2 Execution of Implemented Work (Experiment Steps): The experimental steps, which are represented in Figure 2, are explained below:
1. Create New Resource: Machine Learning Analytics solution.
2. Import/Upload the dataset.
3. Pre-process the dataset. Data pre-processing can also be done using modules written in R or Python.
4. Randomly split and partition the data into 70% training and 30% testing, using the ‘Split Data’ module.
5. Identify categorical attributes and cast them into categorical features using the ‘Edit Metadata’ module.
6. Use the ‘Convert to Indicator Values’ module to convert columns that contain categorical values into indicator values, which can more easily be used as features.
7. Use ‘Select Columns in Dataset’ to select those columns that are relevant.
8. Apply Ensemble Method.
9. Apply Machine Learning Algorithm to train the model.
10. Now Score and Evaluate the Model. The ‘Evaluate Model’ module also visualizes the results through a confusion matrix.
4.3 Experimental Results: Analysis and Discussion:
The experiment is evaluated on a simple multi-class
classification accuracy parameter. Accuracy is defined as the number of correctly classified instances divided by the total number of instances:
\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Number of instances}} \]
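The accuracy definition above can be written as a few lines of Python. This is only a minimal sketch: the function name `accuracy` and the toy label lists are assumptions for illustration and are not taken from the paper's experiment.

```python
def accuracy(true_labels, predicted_labels):
    """Number of correct predictions divided by the number of instances."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one instance is required")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical labels: 3 of the 4 predictions match the true class.
    true_labels = ["A", "B", "B", "C"]
    predicted_labels = ["A", "B", "C", "C"]
    print(accuracy(true_labels, predicted_labels))
```

Because every instance counts equally, this metric does not show which classes are confused with each other.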
The results obtained using the benchmark code by setting the random forest [9] model with 100 trees got the accuracy of 0.9302 in [8], while the benchmark result given by the competition administrators with 10 trees is 0.50241. Here we have performed the experiment on the cloud platform with the Multiclass Decision Forest ML [10] method with 10 trees and an ensemble method. The evaluation results are inferred from the confusion matrix shown in Figure 3. A confusion matrix, also known as an error matrix, is used to describe the performance of a classifier (classification model). The overall accuracy obtained with our simulation is 0.6859, which is higher than the benchmark provided. The comparison of the proposed model is done with the benchmark provided by administrators and
Figure 2: Model built using Azure ML
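To make the link between a confusion matrix and overall accuracy concrete, the sketch below builds a small confusion matrix in plain Python and reads the accuracy off its diagonal. The helper names, the three class labels and the toy predictions are assumptions for illustration. They are not the paper's data and do not reproduce the Azure ML ‘Evaluate Model’ output.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(actual, predicted)] for predicted in classes]
            for actual in classes]


def accuracy_from_matrix(matrix):
    """Correct predictions lie on the diagonal; divide by all instances."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


if __name__ == "__main__":
    # Hypothetical three-class example.
    classes = ["a", "b", "c"]
    true_labels = ["a", "a", "b", "b", "c"]
    predicted_labels = ["a", "b", "b", "b", "a"]
    matrix = confusion_matrix(true_labels, predicted_labels, classes)
    for actual, row in zip(classes, matrix):
        print(actual, row)
    print("overall accuracy:", accuracy_from_matrix(matrix))
```

The off-diagonal cells show which classes are mistaken for which, information that the single accuracy figure hides.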
Figure 3: Confusion Matrix with Multicast Neural
The comparison for accuracy obtained is shown in Figure 4.
Figure 4: Comparison for accuracy
5. Conclusion
Companies doing online business want to utilize the potential of machine learning. In this paper, we proposed an Azure ML based model for e-commerce product categorization. The model used the Multiclass Decision Forest algorithm to train the classifier. The evaluation results show that the proposed classifier performs better than the benchmark in terms of accuracy. We performed the experiment with 10 trees and an ensemble method. Our experiments showed that feature bagging led to a better accuracy value (i.e. 0.55339) than the benchmark. The proposed research can provide a potential approach for training and testing on big data for addressing multi-class classification problems. Further research will evaluate the framework with different ML algorithms, optimization parameters, ensemble methods and e-commerce databases. In future, the model can be optimized to handle imbalanced datasets from various sources and domains. Also, the model can be modified for application on the Hadoop MapReduce [11] platform.
References
[1] B.J. Pine II and J.H. Gilmore, The Experience Economy, Harvard Business School Press, Boston, 1999.
[2] J. Ben Schafer, Joseph A. Konstan, John Riedl, “E-commerce recommendation applications”, Data Mining and Knowledge Discovery, 5, 115–153, 2001, Kluwer Academic Publishers, Netherlands.
[3] David Chappell, “Introducing Azure Machine Learning: A guide for technical professionals”, Sponsored by Microsoft Corporation, 2015, Chappell & Associates
[5] Daniel Pop, “Machine Learning and Cloud Computing: Survey of Distributed and SaaS
[6] E.W.T. Ngai, Li Xiu, D.C.K. Chau, “Application of data mining techniques in customer relationship classification”, Expert Systems with Applications 36 (2009) 2592–2602, Elsevier
[7] S. Suthaharan, “Big data classification: Problems and challenges in network intrusion prediction with machine learning”, Performance Evaluation Review, 41(4), 70-73, ACM, 2014.
[9] Andy Liaw and Matthew Wiener, “Classification and Regression by randomForest”, R News, ISSN 1609-3631, Vol. 2/3, December 2002.
[11] Apache Hadoop Website
[12] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning, vol. 1, Springer Series in Statistics, Springer, Berlin, 2001.