COMP1801 Machine Learning Coursework Answers

Assignment Detail:-

  • Subject: ML Assignment
  • Length: 3000+ words plus software work


You are working as a Data Scientist for a (fictitious) manufacturing company which produces metal parts for various industries. They have been experimenting with a new metal alloy that should have superior properties to the one being currently used. However, in practice it was found to be very sensitive to changes in the processing parameters during production, causing defects to form. If these defects reach a certain size or quantity, they severely impact the lifespan of the metal parts, meaning the entire part must be scrapped as defective.

As measuring a part's lifespan is a slow, destructive, and expensive process, the company hopes that you can create a Machine Learning solution that estimates whether a part has sufficient lifespan to be safely used, based on more easily measured parameters. The company intends to use this solution to refine their production process and lower the incidence of defects.

The company has collected a dataset consisting of a table of processing parameters and measurements taken from the completed parts. This provided set of data has been cleaned, so there is no need to check for incorrect, duplicate, or missing data.

Data

To complete this assignment, you must use the data provided, found on this URL: https://moodlecurrent.gre.ac.uk/mod/resource/view.php?id=3056514

Also provided on Moodle are more detailed descriptions of the dataset. It is recommended that you read these descriptions to aid in your understanding of the problem domain.

Report

Your overall task is to fit a selection of Machine Learning methods to this data, evaluate them, and finally compare them to provide a recommendation for which method the company should apply to the rest of their data.

This must be presented in the form of a report to your line manager. This report must be split into the seven parts below (shown with the corresponding mark weighting), with marks also being awarded for the presentation and language in your report.

Your report should contain images, tables, and equations to help you demonstrate your work in parts 2-4. It is down to your discretion how many images and/or tables you wish to include, what they should show and how they are split across the different parts of the report.

Part 1 – Executive Summary (5 Marks)

This should be a summary of what the report contains – the problem you are solving, why it’s important, the ML methods you have used and a summary of your results and conclusions. Avoid generic statements about machine learning, A.I., data processing etc., and keep to the specifics of the coursework.

Part 2 – Data Exploration (10 Marks)

Describe how you loaded the dataset, then perform a concise exploration of the data and comment on any patterns or relationships in the data that you think may be relevant for creating a model and predicting the part lifespan. It is advisable to provide plots and visualizations to better highlight these regions of interest.
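The loading and exploration step described above might be sketched as follows. The feature and target names ("coolingRate", "quenchTime", "Lifespan") and the inline data are stand-ins, since the real column names and values come from the Moodle dataset:

```python
# Sketch of loading and exploring the data. A tiny synthetic frame stands
# in for the real Moodle file; all column names here are assumptions.
import pandas as pd

df = pd.DataFrame({
    "coolingRate": [12, 18, 25, 31, 40],
    "quenchTime": [4.1, 3.6, 2.9, 2.2, 1.8],
    "Lifespan": [2100, 1850, 1500, 1200, 950],
})
# In practice you would load the provided file instead, e.g.:
# df = pd.read_csv("metal_parts.csv")  # hypothetical filename

print(df.describe())                            # per-feature summary statistics
print(df.corr(numeric_only=True)["Lifespan"])   # correlation of each feature with the target
```

Scatter plots of each candidate feature against the lifespan (e.g. via `df.plot.scatter`) are a natural complement to the correlation table when hunting for the patterns this section asks about.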

Tip: You can use the patterns/relationships shown in this section to better justify your chosen methods in later sections.

Considering this data exploration, identify the features you will use in your models and discuss why you believe they will be the best predictors of metal part longevity.

Finally, with reference to patterns/relationships found in the exploration and (if relevant) theoretical justification, provide a brief additional discussion outlining your expectations of which approach (i.e. regression or classification) and specific ML model will be most appropriate for providing an accurate solution to the company’s requirements.

Tip: This will also be useful in the Conclusions section (Part 5), where your initial discussion here can be compared with your post-experimentation results.

Part 3 – Regression Implementation (30 Marks)

3.1  Methodology (10 Marks)

The task for this section is to create a regression model methodology to predict the lifetime of a metal part using machine learning methods.

Firstly, choose two model types appropriate for this task, for example Linear Regression and Artificial Neural Network. Both chosen models should be supported with a brief justification explaining why each is an appropriate choice for the problem.

Tip: The models chosen do not have to be one of the models taught in the lectures (though there is no penalty for only using taught models).

Tip: The justification for choosing a model could include (but is not limited to): Patterns/relationships found in Part 2, reference to other sources, some basic experimentation, logical reasoning.

Secondly, describe and implement an appropriate pre-processing routine to be used in all following experimentations and evaluations of this section only (Part 3). This can include, but is not limited to: categorical feature encoding, feature scaling and splitting the data into training and test sets, data balancing, etc.
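A minimal pre-processing sketch along these lines, using scikit-learn and synthetic stand-in data (the real features come from the provided dataset):

```python
# One possible pre-processing routine: a fixed-seed train/test split plus
# feature scaling. The data here is synthetic, standing in for the real set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                               # stand-in numeric features
y = 1500 + 300 * X[:, 0] + rng.normal(scale=50, size=200)   # stand-in lifespans

# Fixed random_state so every model version sees the same split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply it to both splits,
# so no information from the test set leaks into training
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```

Categorical encoding (e.g. one-hot) and data balancing would slot into the same routine if the dataset calls for them.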

Thirdly, provide and justify the hyper-parameter tuning framework you will use to obtain your final models in the experiments in section 3.2. Describe which hyper-parameters you will tune for each model type chosen, how they work and why they are essential to the optimization of that architecture.
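One way such a framework can be implemented is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` with Ridge regression and its `alpha` regularisation strength purely as an illustration, with synthetic data standing in for the real features:

```python
# Sketch of a grid-search tuning framework for one candidate model type.
# Ridge regression and its alpha grid are illustrative choices only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},  # candidate hyper-parameter values
    cv=5,                                          # 5-fold cross-validation
    scoring="neg_mean_absolute_error",             # same metric for every version
)
search.fit(X, y)
print(search.best_params_)   # hyper-parameters of the best model version
```

The `cv_results_` attribute of the fitted search holds the per-version scores, which is a convenient source for the tuning-progression tables suggested later.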

3.2  Evaluation (15 Marks)

For this sub-section you should perform and describe the experiments done to obtain your final regression model version for each chosen architecture type, by adhering to the hyper-parameter tuning framework outlined above. These experiments should be a rigorous model optimization process including comparisons between the different versions via a table or otherwise detailed description. All choices should be justified using theory, experiments and/or references.

The final model versions of both chosen types should be described in all detail – summarizing all relevant final hyperparameters chosen.

You should evaluate these final model versions using a test portion of the dataset not used in the prior training stages, using appropriate regression performance metrics. You should explain how these chosen metrics work, why they are appropriate to the task and provide a written interpretation of how well your model is performing at the given task according to these metrics.
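As an illustration of how these metrics might be computed and reported, here the label values are hypothetical rather than results from the actual dataset:

```python
# Computing common regression metrics on held-out predictions
# (the values below are hypothetical, purely to illustrate the calls).
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_test = [1600, 1200, 1900, 1400]   # hypothetical true lifespans (hours)
y_pred = [1550, 1300, 1850, 1350]   # hypothetical model predictions

mae = mean_absolute_error(y_test, y_pred)          # average error, in hours
rmse = mean_squared_error(y_test, y_pred) ** 0.5   # penalises large errors more
r2 = r2_score(y_test, y_pred)                      # fraction of variance explained
print(mae, rmse, r2)
```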

Finally, compare the best performing version of both model types using your chosen performance metrics. Provide your final recommendation of which model is superior to deploy for this regression task, supporting this choice with a brief discussion of your results.

Tip: Remember to use the same train/test split ratio and seed for all model versions and types for a fair comparison. Also remember to use the same performance metrics across all models for parity.

Tip: A stronger evaluation section would generally include a table showing the hyper-parameter tuning progression to arrive at a best model version for both chosen model types (at least two such tables), followed by a direct comparison between the best versions of each model type (an additional table).

3.3  Critical Review (5 Marks)

For this sub-section critically review your overall methodology used for this task only (Part 3), considering the results obtained in your experiments in section 3.2. Cover areas of strengths and areas where improvement might be needed. Offer an alternative approach from the choices not utilized in your experimentations that future investigations could explore. These can include untrialled model architectures, alternate pre-processing routines, different hyper-parameter tuning schemes, etc.

Part 4 – Classification Implementation (40 Marks)

4.1  Feature Crafting (10 Marks)

Your line manager has decided that beyond the exact lifetime of a part, it is also important to know whether a part has a lifetime above 1500 hours (determined by the company as the minimum lifetime before a part is considered defective).

Before choosing any potential model architectures, first create an additional feature (a column named “1500_labels” for example) representing a binary output label for this lifespan threshold that may be used to predict whether a part is defective or not. Populate this feature with a positive binary output if the hourly threshold is met.
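A minimal sketch of this labelling step, assuming the lifetime column is named "Lifespan" (the real column name may differ) and that a lifespan of at least 1500 hours counts as meeting the threshold:

```python
# Crafting the binary label column described above. "Lifespan" is an
# assumed column name; the inline values stand in for the real data.
import pandas as pd

df = pd.DataFrame({"Lifespan": [900, 1500, 1499, 2200]})
df["1500_labels"] = (df["Lifespan"] >= 1500).astype(int)  # 1 = threshold met
print(df["1500_labels"].tolist())
```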

However, your line manager thinks that splitting the data into only two groups may be naïve, and hence predictions made by a binary classification model may not be suitable for finding the best processing parameters for manufacture. Currently, it is unclear how many groups the data should be split into and why.

Instead of using the provided threshold of 1500 to create a binary class, you can (for higher marks) perform and utilize alternate grouping methods on the records based on the lifespan and potentially its relationships with the other features, while providing justifications for doing so. This grouping can range from a more complex thresholding technique (such as a statistical outlier function) to the application of an unsupervised learning algorithm, such as clustering, to separate the inputs into different groups (which can result in three or more groups). More complex and/or better-justified methods will earn higher marks for this section than simpler ones, but must be researched by the student independently.

Tip: If using a clustering method, remember you need to choose a value for the number of clusters yourself (k in K-means for example) and justify this value with appropriate reasoning and/or experimentation. References can help here.
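If you do take the clustering route, the choice of k can be justified experimentally, for example with the silhouette score. The sketch below uses synthetic two-cluster data purely to illustrate the procedure:

```python
# Sketch of choosing k for K-means via the silhouette score, one common
# justification method. The data here is synthetic, not the real dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Two well-separated synthetic blobs, so k=2 should score highest
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # higher = better-separated clusters

best_k = max(scores, key=scores.get)
print(best_k)
```

Plotting `scores` against k gives exactly the kind of justification figure this section rewards.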

You should include at least one plot or table showing the results of the thresholding or grouping to give a better idea of how balanced the resulting dataset is. If using thresholding/grouping other than 1500, provide justifications as to why it is superior.

4.2  Methodology (10 Marks)

The task for this section is to create a classification model methodology to predict the output class of a metal part using machine learning methods.

Firstly, once your output labels are created, choose two model types appropriate for this task, for example Logistic Regression and Artificial Neural Network. Both chosen models should be supported with a brief justification explaining why each is an appropriate choice for the problem.

Tip: The models chosen do not have to be one of the models taught in the lectures (though there is no penalty for only using taught models).

Tip: The justification for choosing a model could include (but is not limited to): Patterns/relationships found in Part 2, reference to other sources, some basic experimentation, logical reasoning.

Secondly, describe and implement an appropriate pre-processing routine to be used in all following experimentations and evaluations of this section only (Part 4). This can include, but is not limited to: categorical feature encoding, feature scaling and splitting the data into training and test sets, data balancing, etc.

Tip: Remember to exclude the lifespan feature from both train and test set inputs for this task, since we are trying to predict its threshold/groups from the other variables using our newly created output labels (which is trivial if we keep lifespan as a feature).
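This exclusion step might look like the following sketch (the column names are stand-ins for those in the provided dataset):

```python
# Dropping the lifespan column (and the label itself) from the model
# inputs, as the tip above requires. Column names are assumptions.
import pandas as pd

df = pd.DataFrame({"coolingRate": [12, 25],
                   "Lifespan": [2100, 1200],
                   "1500_labels": [1, 0]})
X = df.drop(columns=["Lifespan", "1500_labels"])  # inputs: everything else
y = df["1500_labels"]                             # target: the crafted label
print(list(X.columns))
```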

Thirdly, provide and justify the hyper-parameter tuning framework you will use to obtain your final models in the experiments in section 4.3. Describe which hyper-parameters you will tune for each model type chosen, how they work and why they are essential to the optimization of that architecture.

Reminder: All choices of models must be capable of and configured to perform the same task (e.g. binary or multi-class) to ensure parity. High grades are attainable using only binary classification. Implementing a multi-class classification model is for those seeking a challenge for potentially top marks. Hence, if you are constrained for time, or are otherwise having difficulty with this section, it is recommended to use the 1500 thresholding rule for a binary classification problem and focus on providing the best discussion and evaluation possible.

4.3  Evaluation (15 Marks)

For this sub-section you should perform and describe the experiments done to obtain your final classification model version for each chosen model type, by adhering to the hyper-parameter tuning framework outlined above. These experiments should be a rigorous model optimization process including comparisons between the different versions via a table or an otherwise detailed description. All choices should be justified using theory, experiments and/or references.

The final model versions of both chosen types should be described in all detail – summarizing all relevant final hyperparameters chosen.

You should evaluate your final model versions using a test portion of the dataset not used in the prior training stages, using appropriate classification performance metrics. You should explain how these chosen metrics work, why they are appropriate to the task and provide a written interpretation of how well your model is performing at the given task according to these metrics.
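A sketch of computing the usual classification metrics on a held-out test portion follows; the labels here are hypothetical, chosen only to illustrate the calls:

```python
# Computing standard classification metrics on held-out predictions
# (illustrative labels: 1 = lifespan threshold met).
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_test = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical true labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model predictions

print(accuracy_score(y_test, y_pred))    # overall fraction correct
print(precision_score(y_test, y_pred))   # of parts predicted usable, how many are
print(recall_score(y_test, y_pred))      # of truly usable parts, how many were found
print(f1_score(y_test, y_pred))          # harmonic mean of precision and recall
print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted
```

In this business context the confusion matrix is particularly informative, since shipping a defective part (a false positive for "usable") is likely costlier than scrapping a good one.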

Finally, compare the best performing version of both model types using your chosen performance metrics. Provide your final recommendation of which model is superior to deploy for this classification task, supporting this choice with a brief discussion of your results.

Tip: Remember to use the same train/test split ratio and seed for all model versions and types for a fair comparison. Also remember to use the same performance metrics across all models for parity.

Tip: A stronger evaluation section would generally include a table showing the hyper-parameter tuning progression to arrive at a best model version for both chosen model types (at least two such tables), followed by a direct comparison between the best versions of each model type (an additional table).

4.4  Critical Review (5 Marks)

For this sub-section critically review your overall methodology used for this task only (Part 4) considering the results obtained in your experiments. Cover areas of strengths and areas where improvement might be needed. Offer an alternative approach for future investigations to explore from the choices not utilized in your experimentations. These can include untrialled model architectures, alternate pre-processing routines, different hyper-parameter tuning schemes, etc.

Part 5 – Conclusions (5 Marks)

Summarize your findings in light of your experiments' results and compare them to your initial assessment (from the Data Exploration section, Part 2). Attempt to provide a final recommendation to your manager for which single model version should be deployed between the Regression (Part 3) or Binary Classification (Part 4) implementations for the task of helping the company predict if a part is usable or not. Provide a reasoned justification for this recommendation considering aspects such as (but not limited to) model accuracy, feature crafting and the outside business context of the task. If you are not comfortable recommending either model be deployed, explain why.

Part 6 – Report Presentation, Language and References (10 Marks)

The report should be presented in a professional manner with a neat and clear layout and all writing in proper English using good grammar.

The report must be written using one of the following templates (make your own copy to work on):

  • Word – https://moodlecurrent.gre.ac.uk/mod/resource/view.php?id=3056521
  • Latex – https://moodlecurrent.gre.ac.uk/mod/url/view.php?id=3056520

To obtain full marks your report should adhere to the following:

  • Follow the template page layout (in either LaTeX or Word):
    • No changes to font type or size
    • No changing the page layout (e.g. margin, orientation, size)
    • Text should be split into sections with titles and proper paragraphs
  • All images and tables should have a label and will be ignored unless specifically referred to using this label in the body
  • Images should be properly generated, labelled, and rendered at an appropriate scale and resolution.
    • Images are expected to primarily be graphs, plots and other visualizations that can be used to aid in describing and understanding your data and the model selection and evaluation processes
    • If you wish to include images of the loaded data, this should only be an indicative sample showing up to 10 rows
    • You should not include screenshots of your code in the report
  • Total word count should not go over 3000. There is no minimum word count, but it is recommended that the report have at least 2000 words.
  • The individual tasks do not have word counts assigned. It is down to your discretion how much space to devote to a section given the brief and the number of marks provided.
  • Correct spelling, punctuation and grammar should be used throughout
  • Marks may also be deducted at the markers' discretion for other presentation issues not listed here
  • You need to include a reference list containing any external references used, such as academic papers, books, blogs, and code. While explicit marks are not awarded for the reference list, to achieve good marks you would be expected to use multiple references throughout the report, in each section, to support your decision making and interpretations of the results. In-text citations should be used, and all references should follow a consistent referencing style. Note that there should not be any references in the executive summary.

Python Code

You should implement a solution to the above task within a single Python Notebook using Google Colab. Any data in the text/tables or images used in your report should be generated using this Python Notebook.

This source code must also be provided as both a downloaded notebook (.ipynb) and exported as a PDF from Google Colab.

Markers should be able to run your provided code without any errors requiring debugging. If they are unable to reproduce any results in your report, you may be penalized for the inconsistency.

However, the markers will not be marking your source code and will only look at it in detail if they feel they need to test or otherwise check something you present in the report. Your marks are based on the report and not the code, so ensure that everything required to complete the given tasks is shown in the report. Even if your code provides a good solution, anything not shown/explained in the report will not be marked.

Assessment Criteria

Marks Breakdown

The mark breakdown is given below. Other than the report presentation, the marks given for each section would usually be somewhat proportional to the amount of space you should devote to a section in the report. Generally, both the quality of the solution implemented and the explanation of your reasoning and interpretation in the report will be considered.

  • Executive Summary (5 Marks)
  • Data Exploration (10 Marks)
  • Regression Implementation (30 Marks)
    • Methodology (10 Marks)
    • Evaluation (15 Marks)
    • Critical Review (5 Marks)
  • Classification Implementation (40 Marks)
    • Feature Crafting (10 Marks)
    • Methodology (10 Marks)
    • Evaluation (15 Marks)
    • Critical Review (5 Marks)
  • Conclusions and Recommendations (5 Marks)
  • Presentation and language (10 Marks)

Rubric

Some general additional guidelines you should consider on how marks are awarded:

  • Model accuracy: The final accuracy/loss of your ML models does not have a direct impact on your marks. While you are expected to experiment and obtain the best accuracy that you can, it is recognised that this is not always an easy task. Rather, the mark will mostly depend on your overall approach to the problem, the soundness of your reasoning, and how you evaluate and analyse your results.
  • Model complexity: While we are not explicitly marking you higher for implementing more complicated models in your solutions, higher scores will often correlate with more complex implementations, as there is generally more scope to experiment with and discuss their hyperparameter tuning. For example, a simple multivariable linear regression model may provide a valid solution with reasonable accuracy, but there are no hyperparameters to tune in this case and hence nothing to experiment with for improving the model. This would inevitably lead to some marks not being awarded in the methodology and evaluation sections for this model.
  • Explanation detail: Demonstration of a critical understanding of the relevant concepts, appropriate explanation of all the steps taken, and recommendations/discussions that are supported by your results.
  • Report quality: Are all sections included and properly completed? Does it read well? Does each section have a logical structure? Are design decisions appropriately discussed? Is your evaluation realistic and thorough?
