# Effects of Anticipation in Manufacturing Processes: Towards Visual Search Modeling in Human Factors

Models of human information processing commonly represent the cycle from perception to response. This study focuses on what happens in the intervals between cycles of work processes in the manufacturing industry, which has received little attention, and visualizes the state of visual search in cyclic processes using eye tracking. The results show that anticipation is performed as preparation for deciding on the action in the next process, and thus contributes to the speed from perception to response. Based on this result, we discuss the modeling of visual search in human factors.

CCS Concepts: Empirical studies in interaction design

ACM Reference Format:
Jun Nakamura, Sanetake Nagayoshi and Nozomi Komiya. 2021. Effects of Anticipation in Manufacturing Processes: Towards Visual Search Modeling in Human Factors. In The 8th Multidisciplinary International Social Networks Conference (MISNC2021), November 15-17, 2021, Bergen, Norway. ACM, New York, NY, USA, 10 Pages. https://doi.org/10.1145/3504006.3504009

## 1 INTRODUCTION

There has been a lot of discussion about digital transformation in the last few years [21]. This can be attributed to the improvement of sensor technology and the expansion of network traffic. In particular, the use of robotics in manufacturing is increasing, especially in combination with Artificial Intelligence [22]. However, many small and medium-sized enterprises (SMEs) in the manufacturing industry still use processes that require human intervention. The authors have been conducting research on human factors to contribute to the improvement of productivity at the manufacturing sites of these SMEs. Fig. 1 shows the model of Human Information Processing (HIP) according to [18].

As shown in Fig. 1, in the HIP model an object is first sensed, a decision is made based on Perception and Memory, a response is then selected, and finally the response is executed. This paper focuses on the attention resources, which influence multiple stages of HIP. The scope of this paper is one such process in manufacturing, specifically the "behind-the-scenes" time between the completion of one process and the start of the next. Since the sequence of information processing may start anywhere [18], the authors believe that the attention resources must have some effect even in the intervals between cyclic work processes. In this paper, we focus on the gaze of workers and explore the aspect of attention resources in the intervals between tasks by comparing a skilled worker and an apprentice.

The authors have so far focused on heat maps for their analysis [14]. In this paper, we instead focus on the order of gazing and examine whether the gazed objects can be modeled from the aspect of anticipation.

## 2 PREVIOUS RESEARCH

Human factors is a multidisciplinary field of science and technology that relates decision-making and cognitive processes to human behavior. Its scope of research has widened in recent decades, and many studies border on ergonomics, engineering psychology, cognitive engineering, and so forth [18].

SHEL is an acronym for Software, Hardware, Environment, and Liveware [6]. Software and Hardware go without saying; Environment refers to noise, brightness, temperature, humidity, and so on; and Liveware has two meanings: forms and organizations that belong to people, such as teamwork, and people as users of systems. In the SHEL model, for example, the boundary area between software and humans is discussed with a focus on Liveware as users of systems. Using the relationship between the cockpit and the pilot in the field of aviation, human factors were extracted and analyzed to find out what kind of animal humans are [8]. In this paper, we focus on Liveware as the user of the system, Hardware as the machine in the factory, and Software as the non-physical aspect of the SHEL model.

It is easy to imagine that the perception and cognition of humans (Liveware) as users of the system, at the center of the SHEL model, affect work efficiency. In particular, behind the cyclic process described at the beginning of this paper, it is said that even image training, in which one pictures oneself in the next task, has this effect [2]. Here, so-called metacognition, the recognition of oneself doing one's own work, is a central theme of this paper.

Suwa [16] states that it is important to use external observational measurement techniques to accelerate the proficiency cycle more efficiently through this metacognition and metacognitive verbalization. In addition, metacognition has the role of tempering overconfidence and modulating between the choices that one should make oneself (knowledge in the head) and the choices to seek additional information (knowledge in the world). In particular, the latter, the anticipated effort to seek additional information in selective attention, is said to be a form of metacognition [7, 19].

This idea of knowledge in the head and knowledge in the world corresponds to the top-down approach, in which the brain drives perception and understanding, and the bottom-up approach, which relies on lower-level stimulus processing. This concept is discussed in the Visual Search Process [15]. Rather than seeking information based on knowledge in the world, this paper explores the aspect of knowledge in the head, where anticipatory effort between work processes might contribute to the next work process.

We now turn to research on gaze. There is a long history of research on gaze, including a study that found cognitive patterns in images while observing various scenes and objects in nature [23]. Holmqvist et al. [11] describe in detail the various methods of gaze measurement, including the features of each software package and methods of data analysis. There are three types of gaze states: Fixation, in which the gaze is maintained on the object of interest; Saccade, in which the gaze moves rapidly during a short period of time; and Pursuit Movement, in which the gaze attempts to track a dynamic target.

Fixation is a state in which the orientation of the gaze is maintained; if the fixation time is short, it is only a sensual immediacy of perception. If the fixation time is long, it is considered to be a state of exposed aesthetic exploration to determine the correct value [3]. Fixation is further classified into three types [13]: Tremors (fine reciprocal eye movements that are difficult to observe), Drifts (original fixations that occur at the same time as the Tremor or while gazing at the object during the Microsaccade), and Microsaccades (short, fast eye movements that correct the Drift) [4]. In addition, Nystagmus is an unconscious, regular, pendulum-like back-and-forth movement of the gaze, such as when looking at the scenery outside from a train [5].

In the application area, there are many papers such as a survey paper related to advertising [10], a paper evaluating usability practices for web pages [1], a study comparing the gaze from a car at the same intersection day and night [9], and so on. These are studies on still images. As for research on eye tracking for moving images, for example, the Melbourne-based Eye Tracking the Moving Image Research Group has published a research paper on eye tracking for moving films.

In this paper, gaze measurements of a skilled worker and a new worker were conducted in the manufacturing process of one of the piping parts used in automobiles, and their differences were analyzed. Such a method of analyzing the differences between skilled workers and new workers is often used in the field of Human Factors and Ergonomics in manufacturing [12].

## 3 METHOD

This study was conducted with the cooperation of a manufacturer of piping for automobiles and buildings, and the experiments were carried out at the company's manufacturing site. The procedure of the experiment is as follows:

1. Record the work process.
2. Play back the recording and measure the gaze of the skilled worker and the newcomer while each views it.
3. Analyze the measurement results.

As for the definition of skill, President Tano of the company explained that it cannot be measured simply by years of experience, and that, since intuition varies from person to person, skill is evaluated based on the difference in manufacturing quantity per hour. In general, there is a difference of 50 pieces/hour between a skilled worker and an apprentice. Since compression and cutting are basically done by machines, there is no quality standard for evaluating the workers themselves.

The sample consists of one skilled worker and one apprentice. The reason is not only that the number of workers in charge of this process is very limited, but also the idea that "knowledge" is inherent within the individual and can only be understood in its essence when it is discussed in that context [17].

The entire procedure consists of the following four Steps:

1. Set the delivered parts in the compressor and inspect the length after compression (Main Task 1 as shown in Fig. 2).
2. Cut burrs off the inspected parts with a cutting machine and cut off remaining burrs by hand (Main Task 2 as shown in Fig. 3).
4. Pick up the incoming parts, up to just before the compressor of Step 1 is started.

The hardware and software used in the experiment are as follows. Video camera: Ricoh 360° THETA V; eye tracking sensor device: HTC VIVE Pro Eye VR Headset; software: Tobii Technology's Tobii Pro Lab; PC to run the device: HP Desktop PC Omen HP870-280jp. A fixation is defined as gaze moving at less than 30°/sec.
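The fixation criterion above can be illustrated with a minimal velocity-threshold sketch. This is not the filter actually used by the eye tracking software in the experiment; the function name `classify_fixations`, the sample format (time in seconds, horizontal and vertical gaze angle in degrees), and the small-angle velocity approximation are assumptions made for illustration only.

```python
from math import hypot

# Threshold stated in the text: gaze slower than 30 deg/s counts as fixation.
VELOCITY_THRESHOLD_DEG_PER_S = 30.0


def classify_fixations(samples, threshold=VELOCITY_THRESHOLD_DEG_PER_S):
    """Group gaze samples into fixation intervals.

    samples: list of (t_seconds, x_deg, y_deg), sorted by time (hypothetical format).
    Returns a list of (start_time, end_time) tuples, one per fixation.
    """
    fixations = []
    start = None
    end = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        # Angular velocity, approximated by the planar distance in degrees.
        velocity = hypot(x1 - x0, y1 - y0) / dt
        if velocity < threshold:
            if start is None:
                start = t0  # a new fixation begins
            end = t1
        elif start is not None:
            fixations.append((start, end))  # a fast movement ends the fixation
            start = None
    if start is not None:
        fixations.append((start, end))
    return fixations


if __name__ == "__main__":
    demo = [
        (0.00, 0.0, 0.0), (0.01, 0.1, 0.0), (0.02, 0.2, 0.0),
        (0.03, 5.0, 0.0), (0.04, 5.1, 0.0), (0.05, 5.2, 0.0),
    ]
    print(classify_fixations(demo))
```

In the demo data, the jump between 0.02 s and 0.03 s is far faster than the threshold, so the samples split into two fixation intervals separated by one rapid movement.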

When measuring the gaze of the skilled worker and the newcomer, note that the video viewed during eye tracking is not taken from the subject's own eye level or height, but is an overhead video taken by a 360° camera. This means that even though the motion of the machine is the same, the motion of the person in the video is not exactly what the subject would perceive while working.

However, this method was adopted because this paper focuses not on human motion but on which part of the target compressor or cutting machine is gazed at, and at what timing, in the entire work process. In addition, the targets of this paper are Steps 3 and 4, which are not processes involving motion.

## 4 RESULTS

The results of the gaze measurements for Steps 3 and 4 are described below.

With regard to Step 3, the skilled worker took 44.063 seconds to 49.370 seconds (6.307 seconds in total) from the start of the Steps, while the apprentice took 45.859 seconds to 50.797 seconds (4.938 seconds in total). The gaze during this period is shown in Figs. 4 and 5, where the numbers in the circles indicate the order of gaze. It can be seen that the skilled worker gazes more than the apprentice, both in terms of the number of gazes and the range (location) of gazes. The apprentice, on the other hand, gazes at only one place, and his gaze is hardly fixed.

For Step 4, the skilled worker's time was 49.370 to 52.308 seconds (2.938 seconds in total) and the apprentice's time was 50.797 to 53.801 seconds (2.004 seconds in total). Both the skilled worker and the apprentice looked at the compressor used in Step 1 rather than Step 3, since Step 4 comes just before Step 1. Even so, the number of gazes and the range (location) of gazes were greater for the skilled worker than for the apprentice. The skilled worker's gaze is entirely focused on the compressor, while the newcomer's is not.

## 5 DISCUSSION

In general, in both Steps 3 and 4, we found that the viewpoints of the skilled worker and the apprentice differed greatly. In Step 3, the apprentice was not looking at anything in particular, perhaps because he was relieved to have finished the work, while the skilled worker was looking at the area around the compressor used in Step 1. Particularly in Step 3, we can see that the skilled worker's eye gradually moves to the area around the work below the compressor. In Step 4, the gaze of both the skilled worker and the apprentice was concentrated on the lower part of the compressor. The apprentice's degree of concentration, however, was not comparable to that of the skilled worker.

### 5.1 Step 3 in Detail

The details of Step 3 imply the following. Since there is a large difference in gaze between the skilled worker and the apprentice in Step 3, and since the apprentice must have something to learn, the flow of gaze of the skilled worker, taken as the benchmark, is re-expressed in Fig. 8. The numbers in the figure indicate the order of gaze.

The line of gaze passes through three locations, i.e., the Switch, the Compressor, and the Measuring Equipment, which is the work process itself. This may have been a way of recollecting the memory of his workflow by following it with his gaze (when we interviewed him, he seemed to have been in a somewhat unconscious state, as it was only a moment for him). Also, by looking at each of the three locations, their relative positions are stored in his working memory, which speeds up the process. This is more evident in Step 4.

The time of ① to ④ in Fig. 8 came after the work process using the burr cutting machine, so it was considered to be a warm-up for imagining the next Step 1 in one's mind; for this purpose, it was a process of gradually focusing the eyes on the three objects mentioned above.

In particular, ⑥ and ⑦ are the location where the length of the part is checked for less than one second within Step 1, which is a very important sub-task that influences Step 2 [14]. Through this experiment, it was found that in Step 3 the skilled worker already attended to the important part of Step 1 without missing it. In Step 4, as shown in Fig. 6, it can be seen that the participant pays attention to the main processes (④ to ⑧ in Fig. 8) among ① to ⑧, which indicates that learning through past experience is linked to one's working memory. It can be considered that awareness, which was initially a bottom-up approach, has shifted to a top-down approach in this subtask.

Turning to the time for Steps 3 and 4, i.e., the remaining time before Step 1 begins, it can be inferred that the time required by the skilled worker is only about 10 seconds, and that during this time (i.e., in time for the next process) he anticipates the next task based on his memory of the work process and of the locations of the important subtasks. In other words, it is important to anticipate the following three things before starting the Steps: 1) the work Steps, 2) the key positions, and 3) the lead-time until the next task.

Reorganizing Fig. 1, we found three important factors. The first is one's memory of the work process and of the location of the work object, which is based on the length of experience. The second is one's perception of the work process and object, drawn from memory during the remaining time. The third is decision making and selection about where to look, which is based on one's memory, the remaining time in the work process, and the placement of machines, in addition to general environmental factors such as brightness and the layout of the work area.

### 5.2 Visual Search Model

We now attempt to model Visual Search with an ordinal logistic model. Let $Y_i$ denote the point gazed at in Steps 3 and 4 for observation $i$, so that $Y_i \in \{1, 2, 3, \ldots, j, \ldots, J\}$. The latent variable $y_i^{*}$ underlying $Y_i$ can be expressed by the following equation (1), where $U_i$ is the error.

$$y_i^{*} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_j x_j + \ldots + \beta_q x_q + U_i, \quad i = 1, 2, 3, \ldots, n$$
(1)

Now, if observation $i$ chooses $j$, i.e., $Y_i = j$, then, considering a threshold mechanism that divides the choices of gaze points into $J$ intervals, we have (2).

$$K_{j-1} < y_i^{*} < K_j \quad (K_0 = -\infty,\ K_J = \infty)$$
(2)

The following expressions are based on the assumption that the partial regression coefficient β is the same for all models, and that only the intercept and residuals differ between models.

$$Y_i = 1:\quad K_0 - \left(\beta_0 + \beta_1 x_1 + \ldots + \beta_j x_j + \ldots + \beta_q x_q\right) < U_1 < K_1 - \left(\beta_0 + \beta_1 x_1 + \ldots + \beta_j x_j + \ldots + \beta_q x_q\right)$$
(3)

$$Y_i = 2:\quad K_1 - \left(\beta_0 + \beta_1 x_1 + \ldots + \beta_j x_j + \ldots + \beta_q x_q\right) < U_2 < K_2 - \left(\beta_0 + \beta_1 x_1 + \ldots + \beta_j x_j + \ldots + \beta_q x_q\right)$$
(4)

$$Y_i = 3:\quad K_2 - \left(\beta_0 + \beta_1 x_1 + \ldots + \beta_j x_j + \ldots + \beta_q x_q\right) < U_3 < K_3 - \left(\beta_0 + \beta_1 x_1 + \ldots + \beta_j x_j + \ldots + \beta_q x_q\right)$$
(5)

Consider the case where only $Y_i \in \{1, 2, 3\}$ can be selected, i.e., there are three types of gazing targets as shown in Fig. 8: the warm-up part from ① to ④, the machines, and others. The image in Fig. 9 for the skilled worker and Fig. 10 for the apprentice would be the same.

The probability $Q_{ij}$ (between 0 and 1) that $Y_i$ takes the value $j$ is given by equation (6).

$$Q_{ij} = Q\left(Y_i = j \mid x_i\right) = Q\left(Y_i \le j \mid x_i\right) - Q\left(Y_i \le j-1 \mid x_i\right)$$
(6)
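As a minimal sketch of how the category probabilities of an ordinal logistic model can be evaluated, the snippet below takes the error term to follow a standard logistic distribution, so that the cumulative probability of falling at or below category j is the logistic function of the threshold K_j minus the linear predictor. The function names and the numerical values in the demo are hypothetical and are not estimates from the experiment.

```python
from math import exp, inf


def logistic_cdf(z):
    """Standard logistic CDF, written to stay stable for large |z| and +/-inf."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def category_probabilities(x, beta0, beta, thresholds):
    """Return [Q_i1, ..., Q_iJ] for one observation.

    x          : list of explanatory variables x_1..x_q
    beta0, beta: intercept and coefficients beta_1..beta_q
    thresholds : interior cut points K_1 < ... < K_{J-1};
                 K_0 = -inf and K_J = +inf are added here.
    """
    eta = beta0 + sum(b * v for b, v in zip(beta, x))  # linear predictor
    cuts = [-inf] + list(thresholds) + [inf]
    cumulative = [logistic_cdf(k - eta) for k in cuts]  # Q(Y_i <= j | x_i)
    # Difference of adjacent cumulative probabilities gives Q(Y_i = j | x_i).
    return [cumulative[j] - cumulative[j - 1] for j in range(1, len(cuts))]


if __name__ == "__main__":
    # Hypothetical values: one covariate, three gaze targets (J = 3).
    probs = category_probabilities(x=[1.0], beta0=0.0, beta=[0.5],
                                   thresholds=[-1.0, 1.0])
    print(probs, sum(probs))
```

Because the outermost thresholds are infinite, the J probabilities always sum to one, whatever coefficient values are supplied.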

The generalized ordered-choice probability density function is then given by equation (7), where $d_{ij} = 1$ if $Y_i = j$ and $d_{ij} = 0$ otherwise.

$$f\left(Y_i \mid x_i;\ \beta, K_1, K_2, K_3, \ldots, K_{J-1}\right) = (Q_{i1})^{d_{i1}} (Q_{i2})^{d_{i2}} \cdots (Q_{iJ})^{d_{iJ}}, \quad d_{ij} \in \{0, 1\}$$
(7)
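A hedged sketch of how a function of the form (7) could be used for estimation: taking the logarithm of the product over categories and summing over observations gives a log-likelihood in which only the probability of the observed category contributes, since the indicator is zero elsewhere. The logistic error distribution, the function names, and the toy data below are assumptions for illustration; no data from the experiment are used.

```python
from math import exp, inf, log


def logistic_cdf(z):
    """Standard logistic CDF, stable for large |z| and +/-inf."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def category_probabilities(x, beta0, beta, thresholds):
    """Probabilities of categories 1..J for one observation."""
    eta = beta0 + sum(b * v for b, v in zip(beta, x))
    cuts = [-inf] + list(thresholds) + [inf]
    cumulative = [logistic_cdf(k - eta) for k in cuts]
    return [cumulative[j] - cumulative[j - 1] for j in range(1, len(cuts))]


def log_likelihood(X, y, beta0, beta, thresholds):
    """Sum over observations of log Q_ij for the observed category j = y_i.

    X: list of covariate lists, y: list of observed categories (1-based).
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        q = category_probabilities(x_i, beta0, beta, thresholds)
        total += log(q[y_i - 1])  # d_ij selects the observed category only
    return total


if __name__ == "__main__":
    # Toy data (hypothetical): two observations, three gaze targets.
    X = [[1.0], [0.0]]
    y = [2, 1]
    print(log_likelihood(X, y, beta0=0.0, beta=[0.5], thresholds=[-1.0, 1.0]))
```

Maximizing this quantity over the coefficients and thresholds, for example with a general-purpose numerical optimizer, is the usual way such a model is fitted once gaze data are available.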

Van der Lans and Wedel [20] explain that the choice of gaze location depends on individual characteristics and on location information, which represents features such as the size, color, and brightness of the AOI (Area of Interest). These serve as independent variables and can be applied to the above equation. However, based on the previous discussion, the following composite function (8) is considered in order to predict the decision of selective attention, which depends on the memory of the work process as an anticipation (m=1), the memory of the position of the buttons of the compressor (m=2), and the remaining time γ before the start of Step 1, in addition to the individual characteristics.

$$f\left(Y_i\right) = \prod_{j=1}^{J} \left(Q_{ij}\right)^{d_{ij}} \prod_{m=1}^{2} Q\left(Y_i \mid r_m, \sum\nolimits_m\right)$$
(8)

The above model will be experimentally verified by collecting data in future cases. Here we discuss the independent variables in the visual search model. The choice of location depends on individual characteristics and location information. The former correspond to ${x_i}$ and include, for instance, one's intuition, the length of past experience, and the knowledge that the start button of a cutting machine must be pressed with both hands for safety. The latter refers to where the functional positions are concretely located on both the compressor and the cutting machine, as well as where the baskets for parts before and after machining are. These sorts of knowledge are included in (7) as general knowledge for doing the work. However, (7) does not include what we should think about in our minds during the preparation phase in Steps 3 and 4. That is why we try to visualize Steps 3 and 4 in this paper.

The area that should be the AOI depends on the first term on the right-hand side of (8), whereas the worker's speed depends on the second term. In other words, if both the memory of the work process (m=1) and the memory of the buttons' positions (m=2) are missing, the work speed will decrease because the button must first be found. This work speed is an important indicator for managers because it influences their assessment of the work efficiency of both skilled workers and apprentices. Prediction should be done within the remaining time γ until the next cycle, and it can be called a preparatory process in the mind. Since it is easy to imagine that this preparatory process affects work efficiency, the relationship between γ and m is illustrated in Fig. 11.

The purpose of showing (a) and (b) in Fig. 11 is to illustrate a certain range of variation among workers due to the many independent variables in equation (8). The graph rises to the right. The reason for its upward convexity is that m is considered to reflect learning over time, and when γ is defined as the Probability of Targeted Detection by time, the graph becomes convex according to [3] (Fig. 4.8 on page 80). It can be inferred that the probability of gaze selection to the appropriate location should increase, as if moving from Fig. 10 to Fig. 9, as learning progresses.

## 6 CONCLUSION

In this paper, we have explored and analyzed the areas to which selective attention is paid behind the cyclic manufacturing process, between the end of one work cycle and the start of the next. As a result, it was visualized that skilled workers make careful preparations with attention effort even in the preparation stage.

In terms of skill transfer, we found that it is important not only to roughly think about the next Steps, but also to look at, prepare for, and predict the next work position and following Steps.

The limitation of this paper is that it covers only one manufacturing process and that the number of subjects is limited. However, it is difficult to obtain a large sample because there are few people who can replace these workers. Rather, it is precisely under such a difficult environment that ensuring quality and improving work efficiency become management issues.

## 7 HISTORY DATES

Received November 2019; revised August 2020; accepted December 2020

## ACKNOWLEDGMENTS

We would like to express our gratitude to all who co-operated with us. We particularly thank Mr. Tano, president of Japan Pipe System Corp. for supporting our experiments. This research was supported by Chuo University.

Grant Sponsor: Japan Society for the Promotion of Science (JSPS). Grant no: 17K03872 and 19K03062.

## REFERENCES

1. Agnieszka Bojko. 2006. Using Eye Tracking to Compare Web Page Designs: A Case Study. Journal of Usability Studies, 3(1), 112-120.
2. Masao Asaoka. 2005. Ugoki no Mohoo to Image Training (in Japanese). Journal of Biomechanism of Japan, 29(1), 31-35.
3. Paul Atkinson. 2018. Invisible Rhythms: Tracking Aesthetic Perception in Films and the Visual Arts. In Dwyer, T., Perkins, C., Redmond, S., and Sita, J. (eds.), Seeing into Screens – Eye Tracking and the Moving Image. Bloomsbury Academic, 28-43.
4. Tom N. Cornsweet. 1956. Determination of the stimuli for involuntary drifts and saccadic eye movements. Journal of the Optical Society of America, 46(11), 987-993.
5. Andrew T. Duchowski. 2017. Eye Tracking Methodology – Theory and Practice (3rd ed.). Springer.
6. Elwyn Edwards. 1972. Man and Machine: Systems for Safety. In Proceedings of British Airline Pilots Association Technical Symposium, British Airline Pilots Association, London, 21-36.
7. Martin Gene Fennema and Don N. Kleinmuntz. 1995. Anticipation of effort and accuracy in multiattribute choice. Organizational Behavior & Human Decision Processes, 63, 21-32.
8. Frank H. Hawkins. 1987. Human Factors in Flight. Aldershot, UK: Gower Publishing Company Ltd.
9. Geoffrey Ho, Charles T. Scialfa, Jeff K. Caird, and Trevor Graw. 2001. Visual search for traffic signs: The effects of clutter, luminance, and aging. Human Factors, 43(3), 194-207.
10. Emily Higgins, Mallorie Leinenger, and Keith Rayner. 2014. Eye movements when viewing advertisements. Frontiers in Psychology, 5, 1-15.
11. Kenneth Holmqvist, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka Jarodzka, and Joost van de Weijer. 2011. Eye Tracking – A Comprehensive Guide to Methods and Measures. Oxford: Oxford University Press.
12. Akiko Kimura, Makiko Tada, Tadashi Uozumi, and Akihito Goto. 2017. Teaching Method of Technique to Make the Braiding. In Stephan Trzcielinski (ed.), Advances in Ergonomics of Manufacturing: Managing the Enterprise of the Future, Advances in Intelligent Systems and Computing 606, Springer, 310-321.
13. Susana Martinez-Conde, Stephen L. Macknik, and David H. Hubel. 2004. The role of fixational eye movements in visual perception. Nature Reviews Neuroscience, 5(3), 229-240.
14. Jun Nakamura, Sanetake Nagayoshi, and Nozomi Komiya. 2021. Cognitive Biases as Clues to Skill Acquisition in Manufacturing Industry. Procedia Computer Science, 192, 1705-1712.
15. Peter Stüttgen, Peter Boatwright, and Robert T. Monroe. 2012. A Satisficing Choice Model. Marketing Science, 31(6), 878-899.
16. Masaki Suwa. 2005. Metacognitive Verbalization as a Tool for Acquiring Embodied Expertise. Journal of the Japanese Society for Artificial Intelligence, 20(5), 525-532.
17. Masaki Suwa and Koichi Hori. 2015. Ichininshou kenkyu no susume: Chinou kenkyu no atarasii choryuu (in Japanese). Kindai Kagaku sha Co., Ltd.
18. Christopher D. Wickens, John D. Lee, Yili Liu, and Sallie E. Gordon Becker. 2004. An Introduction to Human Factors Engineering (2nd ed.). NJ: Pearson Education, Inc.
19. Pat Wright, Ann Lickorish, and Robert Milroy. 2000. Route choices, anticipated forgetting, and interface design for on-line reference documents. Journal of Experimental Psychology: Applied, 6(2), 158-167.
20. Ralf van der Lans and Michel Wedel. 2017. Eye Movements During Search and Choice. In Berend Wierenga and Ralf van der Lans (eds.), Handbook of Marketing Decision Models, International Series in Operations Research & Management Science 254, Springer, 331-359.
21. Gregory Vial. 2019. Understanding digital transformation: A review and a research agenda. Journal of Strategic Information Systems, 28(2), 118-144.
22. Hong Xiao, BalaAnand Muthu, and Seifedine Nimer Kadry. 2020. Artificial Intelligence with robotics for advanced manufacturing industry using robot-assisted mixed integer programming model. Intelligent Service Robotics, https://doi.org/10.1007/s11370-020-00326-7.
23. Alfred L. Yarbus. 1967. Eye Movements and Vision (Translated into English by Haigh). New York: Plenum Press.


MISNC2021, November 15–17, 2021, Bergen, Norway

© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-9601-1/21/11…\$15.00.
DOI: https://doi.org/10.1145/3504006.3504009