Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model

EBRAHIM, MESHARI

doi:10.23889/SUthesis.59728

E-Thesis

Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model / MESHARI EBRAHIM

Swansea University Author: MESHARI EBRAHIM

PDF | E-Thesis – open access

Copyright: The author, Meshari A. Al-Ebrahim, 2020.

DOI (Published version): 10.23889/SUthesis.59728

Abstract

A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosi...


Published:	Swansea 2020
Institution:	Swansea University
Degree level:	Doctoral
Degree name:	Ph.D
Supervisor:	Ransing, Rajesh
URI:	https://cronfa.swan.ac.uk/Record/cronfa59728

first_indexed	2022-03-29T08:26:04Z
last_indexed	2022-03-30T03:27:45Z
id	cronfa59728
recordtype	RisThesis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2022-03-29T10:44:55.3737662</datestamp><bib-version>v2</bib-version><id>59728</id><entry>2022-03-29</entry><title>Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model</title><swanseaauthors><author><sid>fd17f92abca231a4dd2fc9f4cc3a0fa6</sid><firstname>MESHARI</firstname><surname>EBRAHIM</surname><name>MESHARI EBRAHIM</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-03-29</date><abstract>A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosing particular categories of discrete process parameters, or (iii) choosing combinations of both types of process parameters. The model introduces a non-linear approach to defining the most critical process inputs by scoring the contribution made by each process input to the process output prediction power. It uses this contribution to discover optimal operating ranges for the continuous process parameters and/or optimal categories for discrete process parameters. The set of values used for the process inputs was generated from operating ranges identified using a novel Decision Path Search (DPS) algorithm and Bootstrap sampling. The odds ratio is the ratio between the occurrence probabilities of desired and undesired process output values. The effect of potential interventions, or of proposed confirmation trials, is quantified as posterior odds and used to calculate conditional probability distributions. The advantages of this approach are discussed in comparison to fitting these probability distributions to Bayesian Networks (BN). The proposed explainable data-driven predictive model is scalable to a large number of process factors with non-linear dependence on one or more process responses. It allows the discovery of data-driven process improvement opportunities that involve minimal interaction with domain expertise. An iterative Random Forest algorithm is proposed to predict the missing values for the mixed dataset (continuous and categorical process parameters). It is shown that the algorithm is robust even at high proportions of missing values in the dataset. The number of observations available in manufacturing process datasets is generally low, e.g. of a similar order of magnitude to the number of process parameters. Hence, Neural Network (NN)-based deep learning methods are generally not applicable, as these techniques require 50-100 times more observations than input factors (process parameters). The results are verified on a number of benchmark examples with datasets published in the literature. The results demonstrate that the proposed method outperforms the comparison approaches in terms of accuracy and causality, with linearity assumed. Furthermore, the computational cost is far lower and remains feasible for heterogeneous datasets.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Random Forest, Causal Knowledge, Statistical Process Control, Missing Data Imputation, Machine Learning, Decision Tree, Process Operating Range, Critical Process Factor, Odds Ratio, Cross Validation, Neural Network</keywords><publishedDay>29</publishedDay><publishedMonth>9</publishedMonth><publishedYear>2020</publishedYear><publishedDate>2020-09-29</publishedDate><doi>10.23889/SUthesis.59728</doi><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Ransing, Rajesh</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><apcterm/><lastEdited>2022-03-29T10:44:55.3737662</lastEdited><Created>2022-03-29T09:23:54.7798165</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Engineering and Applied Sciences - Uncategorised</level></path><authors><author><firstname>MESHARI</firstname><surname>EBRAHIM</surname><order>1</order></author></authors><documents><document><filename>59728__23708__0ee6fcedd066475d97bf6b30c5fd26ae.pdf</filename><originalFilename>Ebrahim_Meshari_PhD_Thesis_Final_Cronfa.pdf</originalFilename><uploaded>2022-03-29T09:29:41.3174630</uploaded><type>Output</type><contentLength>12491517</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The author, Meshari A. Al-Ebrahim, 2020.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title	Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model
author_id_str_mv	fd17f92abca231a4dd2fc9f4cc3a0fa6
author_id_fullname_str_mv	fd17f92abca231a4dd2fc9f4cc3a0fa6_***_MESHARI EBRAHIM
author	MESHARI EBRAHIM
author2	MESHARI EBRAHIM
format	E-Thesis
publishDate	2020
institution	Swansea University
doi_str_mv	10.23889/SUthesis.59728
college_str	Faculty of Science and Engineering
hierarchytype
hierarchy_top_id	facultyofscienceandengineering
hierarchy_top_title	Faculty of Science and Engineering
hierarchy_parent_id	facultyofscienceandengineering
hierarchy_parent_title	Faculty of Science and Engineering
department_str	School of Engineering and Applied Sciences - Uncategorised; Faculty of Science and Engineering
document_store_str	1
active_str	0
description	A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosing particular categories of discrete process parameters, or (iii) choosing combinations of both types of process parameters. The model introduces a non-linear approach to defining the most critical process inputs by scoring the contribution made by each process input to the process output prediction power. It uses this contribution to discover optimal operating ranges for the continuous process parameters and/or optimal categories for discrete process parameters. The set of values used for the process inputs was generated from operating ranges identified using a novel Decision Path Search (DPS) algorithm and Bootstrap sampling. The odds ratio is the ratio between the occurrence probabilities of desired and undesired process output values. The effect of potential interventions, or of proposed confirmation trials, is quantified as posterior odds and used to calculate conditional probability distributions. The advantages of this approach are discussed in comparison to fitting these probability distributions to Bayesian Networks (BN). The proposed explainable data-driven predictive model is scalable to a large number of process factors with non-linear dependence on one or more process responses. It allows the discovery of data-driven process improvement opportunities that involve minimal interaction with domain expertise. An iterative Random Forest algorithm is proposed to predict the missing values for the mixed dataset (continuous and categorical process parameters). It is shown that the algorithm is robust even at high proportions of missing values in the dataset. The number of observations available in manufacturing process datasets is generally low, e.g. of a similar order of magnitude to the number of process parameters. Hence, Neural Network (NN)-based deep learning methods are generally not applicable, as these techniques require 50-100 times more observations than input factors (process parameters). The results are verified on a number of benchmark examples with datasets published in the literature. The results demonstrate that the proposed method outperforms the comparison approaches in terms of accuracy and causality, with linearity assumed. Furthermore, the computational cost is far lower and remains feasible for heterogeneous datasets.
published_date	2020-09-29T05:01:39Z
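
The abstract defines the odds ratio as the ratio between the occurrence probabilities of desired and undesired process output values, and says the effect of an intervention is expressed as posterior odds and then as a conditional probability. The sketch below only illustrates that arithmetic. It is not the thesis's MRF or DPS implementation; the function names and the observation counts are hypothetical and chosen for illustration.

```python
def odds_from_counts(n_desired: int, n_undesired: int) -> float:
    """Odds of a desired output: P(desired) / P(undesired), estimated from counts."""
    if n_desired < 0 or n_undesired <= 0:
        raise ValueError("need n_desired >= 0 and n_undesired > 0")
    return n_desired / n_undesired


def probability_from_odds(odds: float) -> float:
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Hypothetical data: 100 past observations, 30 with a desired output.
prior_odds = odds_from_counts(30, 70)

# Hypothetical intervention: keep only the 40 observations whose continuous
# parameter lies inside a candidate operating range; 24 of them are desired.
posterior_odds = odds_from_counts(24, 16)

# Conditional probability of a desired output given the candidate range.
p_desired_given_range = probability_from_odds(posterior_odds)

print(f"prior odds      = {prior_odds:.3f}")
print(f"posterior odds  = {posterior_odds:.3f}")
print(f"P(desired | range) = {p_desired_given_range:.3f}")
```

Comparing the posterior odds with the prior odds gives one simple way to rank candidate operating ranges; how the thesis actually generates and scores those ranges is described in the full text, not here.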

Similar Items