E-Thesis

Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model / MESHARI EBRAHIM

Swansea University Author: MESHARI EBRAHIM

  • Ebrahim_Meshari_PhD_Thesis_Final_Cronfa.pdf

    PDF | E-Thesis – open access

    Copyright: The author, Meshari A. Al-Ebrahim, 2020.

    Download (11.91MB)

DOI (Published version): 10.23889/SUthesis.59728

Abstract

A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosi...

Published: Swansea 2020
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
Supervisor: Ransing, Rajesh
URI: https://cronfa.swan.ac.uk/Record/cronfa59728
first_indexed 2022-03-29T08:26:04Z
last_indexed 2022-03-30T03:27:45Z
id cronfa59728
recordtype RisThesis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2022-03-29T10:44:55.3737662</datestamp><bib-version>v2</bib-version><id>59728</id><entry>2022-03-29</entry><title>Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model</title><swanseaauthors><author><sid>fd17f92abca231a4dd2fc9f4cc3a0fa6</sid><firstname>MESHARI</firstname><surname>EBRAHIM</surname><name>MESHARI EBRAHIM</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-03-29</date><abstract>A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosing particular categories of discrete process parameters, or (iii) choosing combinations of both types of process parameters. The model introduces a non-linear approach to defining the most critical process inputs by scoring the contribution made by each process input to the process output prediction power. It uses this contribution to discover optimal operating ranges for the continuous process parameters and/or optimal categories for discrete process parameters. The set of values used for the process inputs was generated from operating ranges identified using a novel Decision Path Search (DPS) algorithm and Bootstrap sampling. The odds ratio is the ratio between the occurrence probabilities of desired and undesired process output values. The effect of potential interventions, or of proposed confirmation trials, is quantified as posterior odds and used to calculate conditional probability distributions. The advantages of this approach are discussed in comparison to fitting these probability distributions to Bayesian Networks (BN). The proposed explainable data-driven predictive model is scalable to a large number of process factors with non-linear dependence on one or more process responses. It allows the discovery of data-driven process improvement opportunities that involve minimal interaction with domain expertise. An iterative Random Forest algorithm is proposed to predict the missing values for the mixed dataset (continuous and categorical process parameters). It is shown that the algorithm is robust even at high proportions of missing values in the dataset. The number of observations available in manufacturing process datasets is generally low, e.g. of a similar order of magnitude to the number of process parameters. Hence, Neural Network (NN)-based deep learning methods are generally not applicable, as these techniques require 50-100 times more observations than input factors (process parameters). The results are verified on a number of benchmark examples with datasets published in the literature. The results demonstrate that the proposed method outperforms the comparison approaches in terms of accuracy and causality, with linearity assumed. Furthermore, the computational cost is far lower and remains feasible for heterogeneous datasets.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Random Forest, Causal Knowledge, Statistical Process Control, Missing Data Imputation, Machine Learning, Decision Tree, Process Operating Range, Critical Process Factor, Odds Ratio, Cross Validation, Neural Network</keywords><publishedDay>29</publishedDay><publishedMonth>9</publishedMonth><publishedYear>2020</publishedYear><publishedDate>2020-09-29</publishedDate><doi>10.23889/SUthesis.59728</doi><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Ransing, Rajesh</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><apcterm/><lastEdited>2022-03-29T10:44:55.3737662</lastEdited><Created>2022-03-29T09:23:54.7798165</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Engineering and Applied Sciences - Uncategorised</level></path><authors><author><firstname>MESHARI</firstname><surname>EBRAHIM</surname><order>1</order></author></authors><documents><document><filename>59728__23708__0ee6fcedd066475d97bf6b30c5fd26ae.pdf</filename><originalFilename>Ebrahim_Meshari_PhD_Thesis_Final_Cronfa.pdf</originalFilename><uploaded>2022-03-29T09:29:41.3174630</uploaded><type>Output</type><contentLength>12491517</contentLength><contentType>application/pdf</contentType><version>E-Thesis &#x2013; open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The author, Meshari A. Al-Ebrahim, 2020.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model
author_id_str_mv fd17f92abca231a4dd2fc9f4cc3a0fa6
author_id_fullname_str_mv fd17f92abca231a4dd2fc9f4cc3a0fa6_***_MESHARI EBRAHIM
author MESHARI EBRAHIM
author2 MESHARI EBRAHIM
format E-Thesis
publishDate 2020
institution Swansea University
doi_str_mv 10.23889/SUthesis.59728
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Engineering and Applied Sciences - Uncategorised{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Engineering and Applied Sciences - Uncategorised
document_store_str 1
active_str 0
description A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosing particular categories of discrete process parameters, or (iii) choosing combinations of both types of process parameters. The model introduces a non-linear approach to defining the most critical process inputs by scoring the contribution made by each process input to the process output prediction power. It uses this contribution to discover optimal operating ranges for the continuous process parameters and/or optimal categories for discrete process parameters. The set of values used for the process inputs was generated from operating ranges identified using a novel Decision Path Search (DPS) algorithm and Bootstrap sampling. The odds ratio is the ratio between the occurrence probabilities of desired and undesired process output values. The effect of potential interventions, or of proposed confirmation trials, is quantified as posterior odds and used to calculate conditional probability distributions. The advantages of this approach are discussed in comparison to fitting these probability distributions to Bayesian Networks (BN). The proposed explainable data-driven predictive model is scalable to a large number of process factors with non-linear dependence on one or more process responses. It allows the discovery of data-driven process improvement opportunities that involve minimal interaction with domain expertise. An iterative Random Forest algorithm is proposed to predict the missing values for the mixed dataset (continuous and categorical process parameters). It is shown that the algorithm is robust even at high proportions of missing values in the dataset. The number of observations available in manufacturing process datasets is generally low, e.g. of a similar order of magnitude to the number of process parameters. Hence, Neural Network (NN)-based deep learning methods are generally not applicable, as these techniques require 50-100 times more observations than input factors (process parameters). The results are verified on a number of benchmark examples with datasets published in the literature. The results demonstrate that the proposed method outperforms the comparison approaches in terms of accuracy and causality, with linearity assumed. Furthermore, the computational cost is far lower and remains feasible for heterogeneous datasets.
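The description defines the odds ratio as the ratio between the occurrence probabilities of desired and undesired process output values, and says the resulting odds are used to calculate conditional probabilities. The following is a minimal sketch of that arithmetic only, not the thesis's implementation. It assumes the probabilities are estimated from simple counts of observations; the function names and the example counts (40/60 overall, 30/10 inside a candidate operating range) are hypothetical.

```python
def odds(n_desired: int, n_undesired: int) -> float:
    """Odds of a desired output: P(desired) / P(undesired).

    Assumption: both probabilities are estimated from counts over the
    same set of observations, so the ratio reduces to a ratio of counts.
    """
    if n_desired < 0 or n_undesired <= 0:
        raise ValueError("need n_desired >= 0 and n_undesired > 0")
    return n_desired / n_undesired


def probability_from_odds(o: float) -> float:
    """Convert odds back to a probability: p = o / (1 + o)."""
    if o < 0:
        raise ValueError("odds must be non-negative")
    return o / (1.0 + o)


# Hypothetical counts: all observations vs. those inside a candidate range.
baseline_odds = odds(40, 60)
in_range_odds = odds(30, 10)

print(f"baseline P(desired)      = {probability_from_odds(baseline_odds):.2f}")
print(f"P(desired | in the range) = {probability_from_odds(in_range_odds):.2f}")
```

Comparing the two probabilities gives a rough sense of how restricting a process parameter to a range would shift the chance of a desired output under these assumed counts.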
published_date 2020-09-29T04:17:15Z
_version_ 1763754152397111296
score 11.013082