A Gaze-Based Web Browser with Multiple Methods for Link Selection

Matteo Casarini, University of Pavia, matteo.casarini02@universitadipavia.it
Marco Porta, University of Pavia, Italy, marco.porta@unipv.it
Piercarlo Dondi, University of Pavia, piercarlo.dondi@unipv.it

This paper presents a gaze-based web browser that allows hands-free navigation through five different link selection methods (namely, Menu, Discrete Cursor, Progressive Zoom, Quick Zoom, and Free Pointing) and two page scrolling techniques. For link selection, the purpose of this multi-approach solution is two-fold. On the one hand, we want users to be able to choose either their preferred methods or those that, in each specific case, are the most suitable (e.g., depending on the kind of link to activate). On the other hand, we want to assess the performance and appreciation level of the different approaches through formal tests, to identify their strengths and weaknesses. The browser, which is conceived as an assistive technology tool, also includes a built-in on-screen keyboard and the possibility to save and retrieve bookmarks.

CCS Concepts: • Human-centered computing → Accessibility systems and tools; Interaction devices; • Human-centered computing → Interactive systems and tools; User studies; • Social and professional topics → People with disabilities; • Human-centered computing → User interface design;

Keywords: eye tracking, gaze interaction, web browsing

ACM Reference Format:
Matteo Casarini, Marco Porta, and Piercarlo Dondi. 2020. A Gaze-Based Web Browser with Multiple Methods for Link Selection. In Symposium on Eye Tracking Research and Applications (ETRA '20 Adjunct), June 02-05, 2020, Stuttgart, Germany. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3379157.3388929

1 INTRODUCTION

Eye tracking devices have greatly improved recently, becoming more precise and, especially, less expensive than in the past. This has opened the door to a range of new application possibilities for gaze-based interaction, in different contexts [Duchowski 2018]. However, it is still in the field of assistive technologies that eye tracking communication can find its best expression [Majaranta et al. 2011], allowing people with severe motor impairments to use the computer and connect with the world. The current availability of cheap eye trackers is an opportunity not to be missed for gaze-based assistive interfaces to become more widespread, at the service of as many people as possible.

Web browsing is undoubtedly one of the most common activities we perform every day, for both leisure and professional purposes. By virtue of the idea of “universality” that has characterized the World Wide Web since its birth, it is fundamental that everybody can easily access the online world, overcoming physical limitations. Gaze-based web browsing goes in this direction, in the wake of accessibility as defined by the World Wide Web Consortium.

In eye-controlled web browsing, link selection is one of the most critical problems [Ashmore et al. 2005; Kumar et al. 2007; Porta and Ravelli 2009]. Pointing at the right link may be a challenging task, especially when links are small and/or are located close to each other. Therefore, solutions are needed that allow the motor-impaired user to comfortably perform this simple but essential operation.

In this paper, we present a gaze-driven web browser¹ that, besides providing basic web surfing functionalities, offers five different methods for link selection, namely Menu, Discrete Cursor, Progressive Zoom, Quick Zoom, and Free Pointing. While these methods are not completely new, as they have already been used either for web surfing or for general gaze interaction, they have never been integrated into a single browser. This multi-approach solution for link selection not only allows users to choose the methods they prefer, but it also allows them to select the most suitable approach depending on the specific kind of link (e.g., textual link, image link, link in a list, etc.). However, this paper also has another main purpose: to evaluate the performance of the five link selection methods, both by means of formal tests and through an assessment of their appreciation by users.

The article is structured as follows. Section 2 briefly presents the main works related to web browsing through eye tracking and gaze-based selection of small interface elements. Section 3 describes the developed web browser and the five link selection methods. Section 4 illustrates the experiments carried out and the results obtained. Section 5, lastly, draws some conclusions and provides hints for future research.

2 RELATED WORK

One of the first attempts to implement a gaze-based web browser used an ordinary camera to track basic eye movements [Abe et al. 2008]. WeyeB [Porta and Ravelli 2009] and GazeTheWeb [Kumar et al. 2017; Menges et al. 2017] are instead examples of pure gaze-driven solutions exploiting eye tracking devices. Other authors preferred a hybrid approach, where web surfing is performed through a combination of gaze and voice input [Hakkani-Tür et al. 2014; Sengupta et al. 2018]. From another perspective, Biedert et al. [2010] developed a framework to create web applications that are natively gaze-responsive.

Regarding link selection, different approaches have been considered. For example, Porta and Ravelli [2009] used ocular gestures to trigger the display of a menu listing the links contained in the observed page area. Penkar et al. [2013] analyzed four methods for link selection, namely “simple fixation”, “single confirm” through a confirmation button, “multiple confirm” through a list of confirmation buttons, and “radial confirm”, exploiting confirmation buttons arranged radially. The same authors [Lutteroth et al. 2015] also developed an improvement of the “multiple confirm” technique where color is used as visual feedback. Vazquez-Li et al. [2016] employed a “predictive link” approach. Kumar et al. [2017] used screen magnification to continuously increase the size of the page, centered on the user's gaze position.

The Menu link selection method we have implemented in our browser (Section 3.1.1) has been mainly inspired by the works by Porta and Ravelli [2009] and Penkar et al. [2013]. However, in our solution the menu is not triggered by eye gestures and links are displayed in a big list rather than being presented as buttons. The Progressive Zoom method (Section 3.1.3) is a re-implementation of the screen magnification approach by Kumar et al. [2017], while the Quick Zoom method (Section 3.1.4) is a variant with immediate enlargement of the observed area.

Considering more general selection methods, not specifically designed for web links but also suitable for this purpose, a notable solution is the well-known MAGIC (Manual And Gaze Input Cascaded) approach, in which the mouse cursor is automatically moved to the closest selectable element with respect to the gaze point [Zhai et al. 1999]. Ashmore et al. [2005] proposed the use of local magnification, where only the area around the gaze point is enlarged, leaving the rest of the screen unchanged. ceCursor [Porta et al. 2010] is an eye-driven cursor whose movement in the four directions is managed through directional buttons. Combining gaze-based local zoom with manual input, the approach by Kumar et al. [2007] requires the user to hold down a keyboard key while looking at a screen element, which causes the observed area to be enlarged.

The Discrete Cursor link selection method we have implemented (Section 3.1.2) has some similarities with the cursor described by Porta et al. [2010], which in our browser has been specialized to select links only.

Various methods for page scrolling have been developed [Kumar and Winograd 2007], both for web browsing and for general interaction with electronic documents. A common approach consists in using buttons that, when fixated for a dwell time, generate downward or upward scrolls. Buttons may be hidden and appear only when the user looks at the top or bottom parts of the screen [Porta and Ravelli 2009]. Slow scrolling can also be performed automatically, allowing the user to easily read the entire content of the displayed page [Kumar et al. 2017; Kumar and Winograd 2007; Porta and Ravelli 2009].

3 THE DEVELOPED WEB BROWSER

The developed gaze-driven browser (Figure 1) allows the user to perform the main web surfing operations, also providing ‘Back’, ‘Forward’, ‘Refresh’, and ‘Home’ buttons. Window minimization and closing are obtained through a popup window displayed by looking at the ‘_X’ button in the upper right corner.

Given the specific needs of the graphical user interface of a gaze-based browser (big buttons and big elements in general), we opted for a “new” tool instead of developing a plug-in for an existing web browser. On the other hand, as will be explained in Section 3.2, the browser exploits the open source Gecko rendering engine, which provides practically the same rendering power as Mozilla Firefox.

Figure 1: The developed web browser.

The browser includes five different link selection methods, which will be illustrated in Section 3.1. By “link selection”, in this paper we mean both the pointing at the link and the virtual “click” on it to load the associated web resource.

Two scrolling alternatives are available: one allows the user to quickly scroll big portions of the page (the entire height or width of the currently visible area) through four directional buttons placed on the right side of the interface, and the other smoothly moves the page up or down when the user looks at its lower or upper edge. Moreover, an on-screen keyboard is available (Figure 2) that can be used to type URLs or to provide text input to websites. A list of favorite pages can be managed too, with the possibility to add and remove preferred links.

Figure 2: On-screen keyboard.

As can be seen from Figure 1, the structure of the browser is simple. A central area, where web pages are displayed, is surrounded by buttons that are triggered when fixated for a certain dwell time (1.7 seconds in our experiments, which is also the default value of all dwell times). As visual feedback, the color of a button changes slightly to yellow when the user's gaze is detected on it. The buttons associated with the five link selection methods are on the left, while those on the right open the on-screen keyboard and perform vertical and horizontal scrolls.
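Dwell-based button activation of this kind can be illustrated with a minimal Python sketch. It is not the browser's actual C# code: the `DwellDetector` class, its `update` method, and the button names are hypothetical, and only the 1.7-second dwell time comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

DWELL_TIME_S = 1.7  # dwell time reported in the paper


@dataclass
class DwellDetector:
    """Fires once when the gaze stays on the same target for `dwell_s` seconds."""

    dwell_s: float = DWELL_TIME_S
    _target: Optional[str] = None
    _since: float = 0.0
    _fired: bool = False

    def update(self, target: Optional[str], t: float) -> Optional[str]:
        """Feed the target under the gaze (or None) at time t, in seconds.

        Returns the target name at the moment its dwell completes, else None.
        """
        if target != self._target:
            # Gaze moved to another target (or off all targets): restart the timer.
            self._target, self._since, self._fired = target, t, False
            return None
        if target is not None and not self._fired and t - self._since >= self.dwell_s:
            self._fired = True  # no retrigger until the gaze leaves and comes back
            return target
        return None
```

In a real system, `update` would be called for every gaze sample delivered by the eye tracker, with the target determined by hit-testing the gaze point against the button rectangles.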

3.1 Methods for link selection

The link selection methods are activated by looking at the corresponding buttons on the left. After the selection of a link has occurred, the user has to explicitly activate a method again to select another link. This way, the page can be comfortably explored without the risk of being disturbed by potential involuntary selections.

3.1.1 Menu. With the Menu selection method, the user has to fixate the desired link for a certain time. Then, a sort of big “menu” is shown containing the five links that are currently visible in the displayed portion of the page and that are the closest to the fixated point (Figure 3). The user can thus choose the right link by looking at it in the menu. This way, even if the initial fixation on the link was not correctly perceived by the eye tracker, the correct link can still be selected in most cases. Both text and image links are considered (Figure 4). If the desired link is not in the list, looking at a point outside the menu will make it disappear.
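The choice of the menu entries can be sketched as follows in Python. The paper does not specify how the distance between the fixated point and a link is measured; this sketch assumes the Euclidean distance to the link's container box, and the `Link` type and `menu_candidates` function are hypothetical names.

```python
import math
from typing import List, NamedTuple, Tuple


class Link(NamedTuple):
    label: str
    x: float       # left edge of the container box, in viewport pixels
    y: float       # top edge of the container box
    width: float
    height: float


def distance_to_box(px: float, py: float, link: Link) -> float:
    """Euclidean distance from a point to a link's box (0 if the point is inside)."""
    dx = max(link.x - px, 0.0, px - (link.x + link.width))
    dy = max(link.y - py, 0.0, py - (link.y + link.height))
    return math.hypot(dx, dy)


def menu_candidates(
    gaze: Tuple[float, float],
    links: List[Link],
    viewport: Tuple[float, float],
    max_items: int = 5,  # the menu shows five links
) -> List[Link]:
    """Return the visible links closest to the fixated point, nearest first."""
    gx, gy = gaze
    vw, vh = viewport
    visible = [
        link for link in links
        if link.x + link.width > 0 and link.x < vw
        and link.y + link.height > 0 and link.y < vh
    ]
    visible.sort(key=lambda link: distance_to_box(gx, gy, link))
    return visible[:max_items]
```

Because the five nearest links are offered rather than only the nearest one, a fixation that lands slightly off the intended link can still lead to the correct choice.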

Figure 3: Menu method: menu displayed when looking at the “Earth” link in the list on the left.

This selection method is somewhat similar to the technique exploited by Porta and Ravelli [2009], although in that case eye gestures were used to trigger the display of the menu and graphic links were not allowed. A connection can also be recognized with the “multiple confirm” technique by Penkar et al. [2013].

Figure 4: Menu method: menu displayed when looking at the “Earth” link in the figure caption (highlighted with a red ellipse).

3.1.2 Discrete Cursor. Also with the Discrete Cursor method, the user has to initially fixate the desired link. Then, after a dwell time, a big “cursor” appears centered on the link that is the closest to the fixated point. As can be seen from Figure 5, a central transparent rectangle is surrounded by four directional buttons.

Figure 5: Discrete Cursor method: a link pointed and highlighted.

If the link pointed at by the cursor is correct, the user simply has to fixate it to trigger its selection. If the link is wrong, the user can look at one of the four buttons to move the cursor in the corresponding direction. The cursor will then “jump” to the closest link that can be found in that direction (i.e., its movement is “discrete”). If there is no link in the chosen direction, the cursor does not move. If the user looks anywhere in the page in a position outside the cursor, it is moved to the link that is the closest to the observed point. The position and sizes of the central rectangle and directional buttons are automatically adapted to the location and size of the pointed link (Figure 6 shows a case with an image link). The cursor of this method is to some extent similar to the general ceCursor developed by Porta et al. [2010], although here it is specifically contextualized into web browsing.
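The “jump” of the cursor can be sketched in Python as below. The rule used to decide whether a link lies “in” a direction is not given in the paper; this sketch assumes a 45-degree cone around the chosen direction and uses link centers, and `next_link` is a hypothetical name.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# Unit vectors in screen coordinates (y grows downward).
DIRECTIONS: Dict[str, Point] = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "up": (0.0, -1.0),
    "down": (0.0, 1.0),
}


def next_link(current: str, direction: str, centers: Dict[str, Point]) -> str:
    """Return the closest link in the given direction, or `current` if none exists."""
    dx, dy = DIRECTIONS[direction]
    cx, cy = centers[current]
    best, best_dist = current, math.inf
    for name, (x, y) in centers.items():
        if name == current:
            continue
        along = (x - cx) * dx + (y - cy) * dy        # progress in the chosen direction
        across = abs((x - cx) * dy - (y - cy) * dx)  # sideways offset
        if along <= 0 or across > along:
            continue  # assumption: only links within a 45-degree cone qualify
        dist = math.hypot(x - cx, y - cy)
        if dist < best_dist:
            best, best_dist = name, dist
    return best
```

Returning the current link when no candidate is found mirrors the behavior described above, where the cursor does not move if there is no link in the chosen direction.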

Figure 6: Discrete Cursor method: pointing of an image link.

The Discrete Cursor method is particularly suitable for selecting links placed close to each other (for example, within a navigation menu).

3.1.3 Progressive Zoom. The Progressive Zoom selection method gradually increases the size of the page to facilitate link pointing (similarly to the approach adopted by Kumar et al. [2017]).

To make the zoom animation start, the user has to initially fixate the desired link for a certain time (1.7 seconds in our experiments) and then continue looking at it during the progressive enlargement (Figure 7). The observed spot always remains at the center of the expanding area. Once a maximum zoom is reached, a virtual “click” is generated at the point the user is observing and the page returns to its initial size. If the click occurs on a link, the associated page is loaded.
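A gaze-centered progressive zoom can be sketched in Python as follows. The maximum zoom factor and the number of zoom steps are not reported for this method, so the values used here are assumptions, as are all function names; the sketch only shows how re-centering on the observed spot at each step yields the final click position.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def zoom_steps(max_zoom: float, steps: int) -> List[float]:
    """Evenly spaced zoom factors ending at max_zoom (assumed schedule)."""
    return [1.0 + (max_zoom - 1.0) * i / steps for i in range(1, steps + 1)]


def viewport_origin(focus_page: Point, zoom: float, viewport: Point) -> Point:
    """Top-left corner, in zoomed-page pixels, that centers focus_page on screen."""
    fx, fy = focus_page
    vw, vh = viewport
    return (fx * zoom - vw / 2, fy * zoom - vh / 2)


def screen_to_page(screen_pt: Point, origin: Point, zoom: float) -> Point:
    """Map a screen point back to unzoomed page coordinates."""
    sx, sy = screen_pt
    ox, oy = origin
    return ((sx + ox) / zoom, (sy + oy) / zoom)


def progressive_zoom(
    initial_focus: Point,
    gaze_samples: Sequence[Point],  # one screen-space gaze point per zoom step
    viewport: Point,
    max_zoom: float = 4.0,          # assumed value, not taken from the paper
) -> Point:
    """Return the page position of the final virtual click."""
    focus = initial_focus
    for zoom, gaze in zip(zoom_steps(max_zoom, len(gaze_samples)), gaze_samples):
        origin = viewport_origin(focus, zoom, viewport)
        focus = screen_to_page(gaze, origin, zoom)  # re-center on the observed spot
    return focus
```

At zoom factor 2, for instance, a gaze sample 40 screen pixels to the right of the viewport center shifts the focus by only 20 page pixels, which is why enlargement makes small links easier to hit.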

Figure 7: Progressive Zoom method: some steps of the page enlargement process while trying to select the “Visual search” link.

This method can also be used to interact with any element of a web page, for example to select an input field in a form.

3.1.4 Quick Zoom. When the user looks at the desired link for a certain time (1.7 seconds in our tests), the Quick Zoom method immediately performs a high page magnification (seven times the original size, in our experiments), with the observed point being at the center of the visible area (Figure 8). Then, the user has to fixate the link again for a dwell time to select it. The system performs a “click” on this second detected point and brings the page to its original size. If a click was detected on a link, the related page is loaded.
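The benefit of the enlargement can be made explicit with a short relation, given here as a sketch under the assumption that the page is scaled uniformly around the first fixation; the symbols are introduced only for this illustration.

```latex
% m   : magnification factor (7 in the experiments)
% p_0 : first fixation, in page coordinates
% c   : center of the visible area, in screen coordinates
% s   : second fixation, in screen coordinates
% p   : position of the resulting click, in page coordinates
p = p_0 + \frac{s - c}{m}
```

Under this assumption, any offset in the second fixation is divided by m when mapped back to the page: with m = 7, a gaze offset of 35 screen pixels corresponds to only 5 pixels on the original page.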

Figure 8: Quick Zoom method: page enlargement while trying to select the “Visual search” link.

Like Progressive Zoom, this method is also suitable for interacting with any page element.

3.1.5 Free Pointing. With the Free Pointing method, when the user's gaze is detected on a specific spot for a certain dwell time (1.7 seconds in our experiments), the mouse cursor is moved there. If the user is satisfied with the position the cursor has reached, then he or she can perform a “click” through the Click! button placed at the right of the interface (Figure 9). Otherwise, the operation is repeated until the right position is attained.

Figure 9: Free Pointing method: mouse cursor and “Click!” button.

Like the Progressive and Quick Zoom techniques, this method can be used with any page element, and is particularly suitable for selecting sufficiently large interface components (such as image links, buttons, or some form input fields).

3.2 Technical details of the system

The system has been implemented in C#. The open source Gecko rendering engine [Gecko 2020], developed by Mozilla and used in the Firefox web browser, has been exploited for page display and analysis. The API of Gecko provides access to the HTML and CSS code of a page and allows it to be managed. The Tobii Core SDK has been used for the interaction with the eye tracker (an EyeX device in our experiments).

The interface automatically adapts itself to the size of the screen since, to calculate the position of both textual and graphic links, the width and height of their container boxes are considered.

4 EXPERIMENTS

4.1 Test design and procedure

Before implementing formal tests, while developing the browser, we ran several informal trials to verify the correct functioning of the system. These pilot experiments involved three testers with some prior experience with eye tracking interaction (who, subsequently, also participated in the structured experiments). Through this initial phase, we were able to identify a few issues connected with the graphical structure of the interface (e.g., the position of some buttons) and with the working of the link selection methods (e.g., best timings for the different interaction stages).

At the end of the implementation of the browser, we carried out more formal tests. The main purpose of this experimental activity was not only to evaluate the browser per se, but, also, to compare the five link selection methods.

Thirty-three able-bodied subjects participated in the tests, 17 males and 16 females, aged between 21 and 70 years (34.8 on average). Fifteen of these had already tried an eye tracking system before (although for other purposes).

The Tobii EyeX eye tracker was used (70 Hz frequency). The working principle of the system was first presented to each tester, explaining how to use the five methods for link selection. After a short calibration procedure, the participant could try the five methods for about one minute each, starting from the Wikipedia page about eye tracking (en.wikipedia.org/wiki/Eye_tracking).

A first formal test (Test 1) then started, consisting of five trials, each requiring the selection of three links. The five pages from which gaze navigation started had a similar structure and the links to be selected were in similar positions. The standard visualization was used, with no page enlargement. Some links were positioned within compact lists, while others were surrounded by neighboring links. To avoid possible learning effects, each of the five link selection methods was pseudo-randomly assigned to a trial/page (overall, all pages were tested with all the methods in a balanced way).

As an example, for the Solar System page (en.wikipedia.org/wiki/Solar_System) the tester had to:

  1. Select the “gravitational collapse” link in the second paragraph (Figure 10);
  2. Select the “What links here” link in the left column of the page (Figure 11);
  3. Select the “Ring system” link in the main list of links (Figure 12).

Figure 10: Test 1: selection of the “gravitational collapse” link.

Figure 11: Test 1: selection of the “What links here” link.

Figure 12: Test 1: selection of the “Ring system” link.

The other initial Wikipedia pages were Butterfly (en.wikipedia.org/wiki/Butterfly), Volcano (.../Volcano), Weather (.../Weather), and Global Warming (.../Global_warming).
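A balanced assignment of the five methods to the five Test 1 pages can be obtained, for example, with a cyclic Latin square. The paper only states that the assignment was pseudo-random and balanced, so the scheme below is one possible sketch in Python, not the procedure actually used; `assign_methods` and the `seed` parameter are hypothetical.

```python
import random
from typing import Dict

METHODS = ["Menu", "Discrete Cursor", "Progressive Zoom", "Quick Zoom", "Free Pointing"]
PAGES = ["Solar System", "Butterfly", "Volcano", "Weather", "Global Warming"]


def assign_methods(participant_index: int, seed: int = 0) -> Dict[str, str]:
    """Return a {page: method} mapping for one participant.

    A shuffled base order is rotated by the participant index, so that over any
    five consecutive participants every page is paired with every method once.
    """
    rng = random.Random(seed)  # same seed, hence same base order for everybody
    base = METHODS[:]
    rng.shuffle(base)
    shift = participant_index % len(base)
    rotated = base[shift:] + base[:shift]
    return dict(zip(PAGES, rotated))
```

With 33 participants, which is not a multiple of five, such a scheme is only approximately balanced: three of the five rotations are used one more time than the others.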

A second test (Test 2) was performed after Test 1, consisting of the selection of five predefined links, in sequence, each using a specific method. This time, the tester used all five methods in a single web surfing session. Starting from the Wikipedia Water page (en.wikipedia.org/wiki/Water), the tester had to:

  1. Select the “seas” link in the second paragraph of the page (similar to the pages of step 1 in Test 1) using the Menu method;
  2. Select the “What links here” link in the left column of the page (similar to the pages of step 2 in Test 1) using the Discrete Cursor method;
  3. Select the “Norwegian Sea” link in the main list of links in the page (similar to the pages of step 3 in Test 1) using the Progressive Zoom method;
  4. Select the “Vestfjorden” link in the caption of the image on the right using the Quick Zoom method (Figure 13);
  5. Select the “Fjords of Nordland” link from the “Categories” at the bottom of the page using the Free Pointing method (Figure 14).

Figure 13: Test 2: selection of the “Vestfjorden” link in the figure caption using the Quick Zoom method.

Figure 14: Test 2: selection of the “Fjords of Nordland” link using the Free Pointing method.

In both Test 1 and Test 2, errors were defined as follows. For the Menu and Discrete Cursor methods, an error occurred when the tester selected a link that was not the correct one. For the Progressive Zoom and Quick Zoom methods, an error was detected if, at the end of the zoom procedure, the “click” was performed outside the (correct) link. For the Free Pointing method, lastly, an error happened if the tester succeeded in activating the requested link with more than ten cursor movements (from six to ten movements it was instead considered a “partial success”). During the tests, we also informally verified that participants could perform page scrolling, freely choosing either of the two available approaches.
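These outcome rules can be summarized in a small Python sketch. The function and enumeration names are hypothetical, and two points are assumptions rather than statements from the text: that up to five cursor movements count as a full success for Free Pointing, and that never activating the correct link counts as an error.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial success"
    ERROR = "error"


def classify_trial(
    method: str,
    correct_link_activated: bool,
    cursor_movements: int = 0,  # only meaningful for Free Pointing
) -> Outcome:
    """Classify one link selection according to the rules of Tests 1 and 2."""
    if not correct_link_activated:
        # Wrong link (Menu, Discrete Cursor) or click outside the link (zoom methods).
        return Outcome.ERROR
    if method != "Free Pointing":
        return Outcome.SUCCESS
    if cursor_movements > 10:
        return Outcome.ERROR
    if cursor_movements >= 6:
        return Outcome.PARTIAL_SUCCESS
    return Outcome.SUCCESS  # assumption: five or fewer movements is a full success
```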

The time necessary to activate a link was measured from the moment a selection method was picked (by pressing its button through the gaze) to when the correct link selection occurred. At the end of the tests, participants were asked to rank the five methods with values from 1 (the worst) to 5 (the best). Finally, participants were asked for an overall rating of the gaze-based web surfing experience, with values from 1 (not appreciated at all) to 10 (very appreciated).

4.2 Results

In the analysis, the data of Test 1 and Test 2 have been considered together.

Figure 15 shows means and standard deviations of the times required to select a link (for Free Pointing, also partial successes are considered). A repeated measures ANOVA with Greenhouse-Geisser correction showed that there was a significant effect of the selection method: F(1.9, 59.33) = 85.1, p < .001 (the normality assumption for data distributions was assessed through the Kolmogorov-Smirnov test).

Figure 15: Mean times required to perform a correct link selection with the five methods.

Using the Bonferroni technique to discover which pairs of methods were actually characterized by significantly dissimilar times, we found that the time of Quick Zoom cannot be considered significantly different from the times of Menu and Discrete Cursor; all other differences are instead statistically significant.

Menu and Quick Zoom are the fastest methods. Discrete Cursor has a variability correlated to the number of “shifts” of the cursor needed to reach the correct link. Progressive Zoom requires a fixed time to achieve a full enlargement. Free Pointing is characterized by the highest time, due to the several attempts that are generally necessary to aim at the correct link. As stressed by the high standard deviation, there is also a high variability in user performance (depending on how precisely the specific user's gaze is detected).

Figure 16 shows the success rates with the five link selection methods. For Free Pointing, we distinguish between actual successes (lower part of the bar in the histogram) and partial successes (upper part).

Figure 16: Mean selection success rates for the five methods (for Free Pointing, the lower part of the bar indicates actual successes, while the upper part identifies partial successes).

Given the few error cases (especially for the first four methods), the distributions of success rates were not normal. Using non-parametric statistics, a Friedman test showed that there was a significant effect of the method (χ²(4) = 27.28, p < .001), with the only significant difference being between Free Pointing and the other four methods. The median scores for the five methods were, respectively, 100%, 100%, 100%, 100%, and 75%.

The mean ranks in Figure 17 clearly show that Menu is the most appreciated method by testers, while Free Pointing is the least valued approach. Since data distributions for the Menu and Free Pointing methods were not normal, we could not apply a global analysis based on parametric statistics. A Friedman test showed that there was a significant effect of the method (χ²(4) = 48.32, p < .001). The median scores for the five methods were, respectively, 5, 3, 3, 3, and 1. Considering only Discrete Cursor, Progressive Zoom, and Quick Zoom, a repeated measures ANOVA showed that there was not a significant effect of the selection method: F(2, 64) = 1.98, p > .05.
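For rankings without ties, such as the 1-to-5 preference ranks collected here, the Friedman statistic has a simple closed form. The Python sketch below computes it from scratch; the function name is hypothetical, any data passed to it in this illustration would be made up rather than taken from the study, and the values reported above were presumably obtained with a statistics package.

```python
from typing import Sequence


def friedman_chi_square(rankings: Sequence[Sequence[int]]) -> float:
    """Friedman chi-square for n participants who each rank k methods 1..k.

    Each row must be a permutation of 1..k (no ties), which holds when every
    participant assigns a distinct rank to each method.
    """
    n = len(rankings)
    k = len(rankings[0])
    for row in rankings:
        if sorted(row) != list(range(1, k + 1)):
            raise ValueError("each row must be a permutation of 1..k")
    # Sum of the ranks received by each method across participants.
    rank_sums = [sum(row[j] for row in rankings) for j in range(k)]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

The statistic is compared with a chi-square distribution with k − 1 degrees of freedom, which gives the 4 degrees of freedom reported for five methods. This untied form would not apply directly to the success rates, where many participants share the same score.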

Figure 17: Mean ranks based on the preferences of testers.

The mean global evaluation of the gaze-based browsing experience, in a 1-10 range, was 8.67, which is good considering that more than half of the testers had no prior experience with eye tracking interaction.

5 CONCLUSIONS

In this paper we have presented a gaze-controlled web browser characterized by five different link selection methods. The aim of this solution is to allow the user to choose, each time, the preferred or most suitable approach to select a specific link.

The performed experimental tests, carried out on typical web pages containing small textual links, have shown that Menu, Discrete Cursor, and Quick Zoom are the fastest selection methods, also characterized, together with Progressive Zoom, by the highest success rates. Free Pointing, as could be expected, is the most problematic approach, rather slow and error-prone—in general, it is not easy to precisely select small elements with the gaze. Nonetheless, this simple solution can be suitable for big links, such as image links. Looking at the testers’ preferences (ranks), we can state that Menu and Free Pointing are decidedly the most and least appreciated methods, respectively, while Discrete Cursor, Progressive Zoom, and Quick Zoom are practically at the same level (despite some small differences).

The positive evaluation of the general gaze-based browsing experience provided by testers, most of whom had never used an eye tracking system before, is a satisfying result. However, the outcomes of our experiments are limited to the able-bodied participants in our tests. In future investigations, we will also try to involve motor-impaired testers, who are the actual target of the developed browser. Future studies will also consider tests with different parameter values (e.g., dwell times, number of menu items, zoom speeds, sizes of graphical elements, etc.).

ACKNOWLEDGMENTS

We thank the volunteer testers who participated in the user study.

REFERENCES

FOOTNOTE

¹ The browser is freely available for download at https://vision.unipv.it/research/gbbrowser/

© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7135-3/20/06…$15.00.
DOI: https://doi.org/10.1145/3379157.3388929