Acta Oeconomica Pragensia 2019, 27(1):3-20 | DOI: 10.18267/j.aop.613

Early Defect Detection Using Clustering Algorithms

Blanka Bártová, Vladislav Bína
University of Economics, Prague, Faculty of Management (blanka.bartova@vse.cz; vladislav.bina@vse.cz).

Product quality is a crucial issue for manufacturing companies, so emerging product defects must be detected early. In contrast to traditional methods, constantly evolving data mining methods are now used more and more frequently for this purpose. The main objective of this paper is to identify the potential cause of product defects, or the area of the production process where the majority of them arise. A dataset from a semiconductor manufacturing process was used for this purpose. First, it was necessary to address the quality of the dataset. Significant multicollinearity was found in the data; correlations and variance inflation factors (VIF) were used to detect and remove the collinear variables. Because the original dataset contained more than 5% of randomly missing values, the MICE-CART method was used for imputation. In the subsequent analysis, K-means clustering was used to separate the failed products from the flawless ones. Hierarchical clustering was then applied to the failed products to form groups of product defects with similar properties. The optimal number of clusters was determined using the Bayesian information criterion (BIC). Five clusters of products were formed, although only three can be classed as important for further analysis. These groups of products should be examined directly in the production process, which can help to identify the source of the defects.
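The K-means step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kmeans`, the random initialisation, and the six toy observations are assumptions made here for illustration only, and the real dataset has many more variables. The sketch shows plain Lloyd-style K-means with k = 2 splitting observations into two groups, in the spirit of separating failed products from flawless ones.

```python
import math
import random


def kmeans(points, k=2, n_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initial centroids: k observations drawn at random (an assumption;
    # other initialisation schemes exist).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        # (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical two-variable measurements: three observations near (1, 1)
# and three near (5, 5). These values are invented for the example.
points = [
    (1.0, 1.1), (0.9, 1.0), (1.3, 0.8),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The cluster numbering (which group is labelled 0 and which 1) depends on the random initialisation, so only the grouping itself is meaningful. In practice the variables would typically be standardised first, since Euclidean distance is sensitive to scale.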

Keywords: manufacturing, data mining, clustering, product quality, quality management, MICE-CART, VIF
JEL classification: C38, C44, D24, L15

Published: May 1, 2019

Bártová, B., & Bína, V. (2019). Early Defect Detection Using Clustering Algorithms. Acta Oeconomica Pragensia, 27(1), 3-20. doi: 10.18267/j.aop.613

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.