US20170262051A1 - Method for refining control by combining eye tracking and voice recognition - Google Patents

Method for refining control by combining eye tracking and voice recognition Download PDF

Info

Publication number
US20170262051A1
Authority
US
United States
Prior art keywords
screen
area
objects
user
eye tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/066,387
Inventor
Martin Henrik Tall
Jonas Priesum
Javier San Agustin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meta Platforms Technologies LLC
Original Assignee
Eye Tribe ApS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eye Tribe ApS filed Critical Eye Tribe ApS
Priority to US15/066,387
Assigned to THE EYE TRIBE reassignment THE EYE TRIBE ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: PRIESUM, JONAS, SAN AGUSTIN, Javier, TALL, MARTIN HENRIK
Assigned to THE EYE TRIBE APS reassignment THE EYE TRIBE APS AFFIDAVIT OF ASSIGNEE TO CORRECT THE ASSIGNEE'S NAME RECORDED AT REEL/FRAME: 037948/0234 Assignors: THE EYE TRIBE
Assigned to FACEBOOK, INC. reassignment FACEBOOK, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: THE EYE TRIBE APS
Assigned to THE EYE TRIBE APS reassignment THE EYE TRIBE APS CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNORS SIGNATURES AND DATES PREVIOUSLY RECORDED AT REEL: 037948 FRAME: 0234. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT. Assignors: PRIESUM, JONAS PHILIP, SAN AGUSTIN, Javier, TALL, MARTIN HENRIK
Publication of US20170262051A1
Assigned to THE EYE TRIBE reassignment THE EYE TRIBE CORRECTIVE ASSIGNMENT TO CORRECT THE SPELLING OF THE THIRD ASSIGNOR'S NAME PREVIOUSLY RECORDED ON REEL 042793 FRAME 0300. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT. Assignors: PRIESUM, JONAS, SAN AGUSTIN LOPEZ, JAVIER, TALL, MARTIN HENRIK
Assigned to FACEBOOK TECHNOLOGIES, LLC reassignment FACEBOOK TECHNOLOGIES, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: FACEBOOK, INC.
Assigned to META PLATFORMS TECHNOLOGIES, LLC reassignment META PLATFORMS TECHNOLOGIES, LLC CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: FACEBOOK TECHNOLOGIES, LLC
Current legal status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842Selection of displayed objects or displayed text elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/038Indexing scheme relating to G06F3/038
    • G06F2203/0381Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention is a method for combining eye tracking and voice-recognition control technologies to increase the speed and/or accuracy of locating and selecting objects displayed on a display screen for subsequent control and operations.

Description

    TECHNICAL FIELD
  • The present invention relates to system control using eye tracking and voice recognition.
  • BACKGROUND OF THE INVENTION
  • Computing devices, such as personal computers, smartphones, and tablets, make use of graphical user interfaces (GUIs) to facilitate control by their users. Objects, which may include images, words, and alphanumeric characters, can be displayed on screens; and users employ cursor-control devices (e.g. mouse or touch pad) and switches to indicate choice and selection of interactive screen elements. In other cases, rather than cursor and switch, systems may use a touch-sensitive screen whereby a user identifies and selects something by touching its screen location with a finger or stylus. In this way, for example, one could select a control icon, such as “print,” or select a hyperlink. One could also select a sequence of alphanumeric characters or words for text editing and/or copy-and-paste interactions. Cursor control and touch-control panels are designed such that users physically manipulate a control device to locate and select screen items. There are alternative means for such control, however, that do not involve physically moving or touching a control subsystem. One such alternative makes use of eye tracking, where a user's gaze at a screen can be employed to identify a screen area of interest and a screen item for interactive selection. Another alternative makes use of voice recognition and associates recognized words with related items displayed on a screen. Neither eye tracking nor voice-recognition control, on its own, is as precise with regard to locating and selecting screen objects as, say, cursor control or touch control. In the case of eye tracking, one is often limited in resolution to a screen area rather than a point or small cluster of points. If there is more than one screen object within or near that screen area, then selection may be ambiguous. Similarly, with a screen full of text and object choices, a voice recognition subsystem could also suffer ambiguity when trying to resolve a recognized word to a single related screen object or word. As a result, such control methodologies may employ zooming so as to limit the number of screen objects and increase the distance between them, as in eye tracking control; or require iterative spoken commands in order to increase the probability of correct control or selection interpretation.
  • BRIEF SUMMARY OF THE INVENTION
  • By combining eye tracking and voice-recognition controls, one can effectively increase the accuracy of location and selection and thereby reduce the iterative zooming or spoken commands that are currently required when using one or the other control technology.
  • The method herein disclosed and claimed enables independently implemented eye tracking and voice recognition controls to co-operate so as to make overall control faster and/or more accurate.
  • The method herein disclosed and claimed could be employed in an integrated control system that combines eye tracking with voice recognition control.
  • The method herein disclosed and claimed is applicable to locating and selecting screen objects that may result from booting up a system in preparation for running an application, or interacting with a server-based HTML page aggregate using a client user system (e.g. interacting with a website via the Internet). In essence, this method in conjunction with eye tracking and voice recognition control subsystems would provide enhanced control over the interaction of screen-displayed objects irrespective of the underlying platform specifics.
  • The method herein disclosed and claimed uses attributes of eye tracking to reduce the ambiguities of voice-recognition control, and uses voice recognition to reduce the ambiguities of eye tracking control. The result is control synergy; that is, control speed and accuracy that exceed those of either eye tracking or voice-recognition control on its own.
  • BRIEF DESCRIPTIONS OF THE DRAWINGS
  • FIG. 1 depicts a display screen displaying non-text and textual objects. The screen, for example, could be any system display and control screen, such as a computer monitor, smartphone screen, tablet screen, or the like.
  • FIG. 2 depicts the screen of FIG. 1 where eye tracking control determines that the user's gaze is essentially on a non-textual object.
  • FIG. 3 depicts the screen of FIG. 1 where eye tracking control determines that the user's gaze is essentially on a screen area comprising text objects.
  • FIG. 4 depicts an exemplary flow chart illustrating how combining eye tracking and voice recognition would increase the confidence level of determining a location and selection, and, therefore, the accuracy.
  • FIG. 5 depicts an exemplary flow chart illustrating how combining eye tracking and voice recognition would increase the probability level of determining a location and selection, and, therefore, the accuracy.
  • FIG. 6 depicts an exemplary flow chart illustrating how combining eye tracking and voice recognition would increase the probability level of determining the selected word in a group of words by associating the interpreted word with its occurrence in a smaller screen area determined as the user's gaze screen area.
  • DETAILED DESCRIPTION OF THE INVENTION
  • As interactive computing systems of all kinds have evolved, GUIs have become the primary interaction mechanism between systems and users. With displayed objects on a screen, which could be images, alphanumeric characters, text, icons, and the like, the user makes use of a portion of the GUI that enables locating and selecting a screen object. The two most common GUI subsystems employ cursor control devices (e.g. mouse or touch pad) and selection switches to locate and select screen objects. The screen object could be a control icon, like a print button, so locating and selecting it may cause a displayed document file to be printed. If the screen object is a letter, word, or highlighted text portion, the selection would make it available for editing, deletion, copy-and-paste, or similar operations. Today many devices use a touch-panel screen which enables a finger or stylus touch to locate and/or select a screen object. In both cases, the control relies on the user physically engaging with a control device in order to locate and select a screen object.
  • With cursor control, one is usually able to precisely locate and select a screen object. Sometimes one has to enlarge a portion of the screen to make objects larger and move them farther apart from one another in order to precisely locate and select an intended screen object. This zooming function is more typical of finger-touch controls where a finger touch on an area with several small screen objects is imprecise until zooming is applied.
  • A GUI could also serve to enable location and selection of screen objects without requiring physical engagement. For example, a GUI that makes use of eye tracking control would determine where on a screen the user is gazing (e.g. location) and use some method for selection control (e.g. gaze dwell time). This would be analogous to using a mouse to move a cursor over a screen object and then pressing a button to signify selection intent.
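As an illustration of the dwell-time analogy, the following minimal sketch (not taken from the patent) treats an object as selected once the tracker's gaze samples remain on it beyond a threshold; the `hit_test` callable and the 0.8-second threshold are assumptions for illustration only.

```python
DWELL_TIME_S = 0.8  # assumed dwell threshold signifying selection intent

def dwell_select(gaze_samples, hit_test):
    """Return the first object gazed at continuously for DWELL_TIME_S seconds.

    gaze_samples: iterable of (timestamp_seconds, x, y) tuples from an eye tracker.
    hit_test:     callable mapping (x, y) -> screen object or None.
    """
    current_obj = None
    dwell_start = None
    for t, x, y in gaze_samples:
        obj = hit_test(x, y)
        if obj is not None and obj == current_obj:
            if t - dwell_start >= DWELL_TIME_S:
                return obj                      # gaze held long enough: treat as a "click"
        else:
            current_obj, dwell_start = obj, t   # gaze moved to a different object (or off-target)
    return None
```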
  • Voice-recognition-based control could also serve as a control technology where physical engagement would not be required. A screen of objects would have a vocabulary of spoken words associated with the objects, and when a user says a word or phrase, the control system recognizes the word and associates it with a particular screen object. So, for example, a screen with an object that is a circle with a letter A in its center could be located and selected by a user who says “circle A,” which may cause the GUI system to highlight it, and then saying “select,” which would cause the GUI system to select the object and perhaps remove the highlighting. Clearly, if there were many objects on a screen, some having the same description, saying “circle” where there are five circles of various size and color would be ambiguous. The system could prompt the user for further delineation in order to have a higher confidence level or higher probability estimation.
  • Thus, the tradeoff in using eye tracking or voice-recognition control is eliminating the need for physical engagement with a pointing/selecting device or the screen, but accepting less precise location and selection resolution. Often, as a result of the lower resolution, more steps may be performed before the system can determine the location and selection of an object with a probability commensurate with more precise controls, such as cursor, touch pad, or touch screen.
  • Typically, a type-selecting cursor is smaller than an alphanumeric character standing alone or immersed in a word. So, if one is fixing a typographical error, one can select a single letter and delete or change it. Using touch control, the area of finger or stylus touch is typically larger than a cursor pointer. It would be difficult to select a letter immersed in a word for similar typographical error correction. One may have to make several pointing attempts to select the correct letter, or expand (i.e. zoom) the word to larger proportions so that the touch point can be resolved to the single, intended letter target.
  • Regardless of which GUI location and selection technology one uses, font sizes and non-textual object dimensions will affect the control resolution, but in general, technologies that do not require physical engagement cannot accommodate dense text having small characters and non-text objects having small dimensions without iterative zooming steps.
  • The method herein disclosed and claimed makes use of eye tracking and voice-recognition control technologies in conjunction to, in effect, improve the accuracy of locating and selecting screen objects relative to using either control technology on its own. The method applies to any system having displayed objects whereby users interact with said system by locating and selecting screen objects and directing the system to carry out some operation or operations on one or a plurality of screen objects. Such systems can comprise combinations of hardware, firmware and software that, in concert, support displaying, locating, selecting and operating on displayed objects. The method may comprise interacting with system hardware and/or software as part of an integrated control subsystem incorporating eye tracking and voice-recognition controls, or as part of a system in which separate eye tracking and voice-recognition control subsystems can interact. The method herein disclosed and claimed should therefore not be limited in scope to any particular system architecture or parsing of hardware and software.
  • Eye tracking technology or subsystem refers to any such technology or subsystem, regardless of architecture or implementation, which is capable of determining approximately where a user's eye or eyes are gazing at some area of a display screen. The eye tracking technology or subsystem may also be capable of determining that a user has selected one or more objects in the gazed area so located. An object could be an icon or link that initiates an operation if so selected.
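By way of illustration only, a gazed area of the kind described above could be approximated from recent gaze samples as sketched below; the circle-plus-margin model and the `(label, x, y)` object tuples are assumptions, not the patent's method.

```python
from statistics import fmean

def gazed_area(recent_points, margin_px=40.0):
    """Approximate a gazed area G as a circle enclosing recent gaze samples.

    recent_points: list of (x, y) gaze coordinates from the most recent time window.
    margin_px:     padding that absorbs eye tracker noise (assumed value).
    Returns (center_x, center_y, radius) in screen pixels.
    """
    cx = fmean(x for x, _ in recent_points)
    cy = fmean(y for _, y in recent_points)
    radius = max(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in recent_points)
    return cx, cy, radius + margin_px

def objects_in_area(objects, area):
    """Keep only the (label, x, y) objects whose position lies inside the gazed area."""
    cx, cy, r = area
    return [(label, x, y) for label, x, y in objects
            if ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 <= r]
```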
  • Voice-recognition technology or subsystem refers to any such technology or subsystem, regardless of architecture or implementation, which is capable of recognizing a user's spoken word or phrase of words and associating that recognized word or phrase with a displayed object and/or an operational command.
  • FIG. 1 depicts a display of objects on a screen. Objects consist of text objects, such as alphanumeric characters, words, sentences and paragraphs; and non-text objects which comprise images, line art, icons, and the like. This drawing is exemplary and should not be read as limiting the layout and content of objects on a screen.
  • With eye tracking control technology one can determine an area where a user's eye or eyes are gazing at the screen of FIG. 1. For example, in FIG. 2, an eye tracking control subsystem has determined that a user's eye is gazing at a portion of a non-text object and the gazed area is defined by the area circled by 201.
  • FIG. 3 depicts the screen of FIG. 1 where an eye tracking control subsystem has determined that a user's eye is gazing at a portion of text objects, the area of which is circled by 301.
  • In FIG. 2, if the non-text object were smaller than 201, and more than one such object were located in area 201, the eye tracking subsystem could not, at that time, resolve which object in area 201 is a user's object of interest. By engaging in a subsequent step, the screen objects could be enlarged such that only one object would be located in area 201. But the subsequent step adds time for the sake of accuracy. It may also be the case that a first zooming attempt results in two or more objects still within area 201. Hence, a second zoom operation may have to be done in order to determine the object of interest. Here, again, more time is used.
  • In FIG. 3, the gazed area, 301, covers a plurality of alphanumeric characters and words. Here, again, the eye tracking control subsystem would be unable to determine specifically which character or word is the object of interest. Again, iterative zoom operations may have to be done in order to resolve which letter or word is the object of interest. As with the non-text object case, each time a zoom operation is applied, more time is required.
  • Using a voice-recognition technology in association with FIG. 1, the entire visible screen and any of its objects could be a user's object of choice. For example, if the user said “delete word ‘here’”, the voice-recognition subsystem would first have to recognize the word “here,” then associate it with any instances of it among the screen objects. As shown in FIG. 1, there are three instances of the word “here.” Thus, the voice-recognition subsystem would be unable to resolve the command to a singular object choice. It may have to engage in a repetitive sequence of highlighting each instance of “here” in turn until the user says “yes,” for example. This would take more time.
  • In one embodiment of the invention herein disclosed and claimed, FIG. 4 shows an exemplary task flow. The flow shown in FIG. 4 should not be read as limiting. The flow begins at 401, where a system loads and parses the elements that will comprise the screen objects. Although not shown in the flow chart, this operation may be done repeatedly. In 402, the eye tracking subsystem computes repeated screen gaze coordinates and passes them to the system. From 402, a gazed area, G, is determined (403). In 404 and 405, once area G is determined, the system builds a dictionary of links, D, and a vocabulary, V, for the links found in area G. Depending on the capabilities of the computing device and/or the voice recognition subsystem, vocabulary V may be updated for every gaze coordinate, for every fixation, every N gaze coordinates, every T milliseconds, and so on. Steps 402 through 405 continue to refresh until a voice command is received (406). The system then recognizes the voice command based on vocabulary V (407) and determines link L along with a confidence level of accuracy, C (408). With voice recognition, extraneous sounds coupled with a voice command can also introduce audio artifacts that may reduce recognition accuracy. In order to avoid incorrect selections due to extraneous sounds, the confidence level C may be compared to a threshold value, th, and if it is greater (409), then the system activates link L (410); otherwise it returns to operation (402). The threshold th may take a fixed value, or it may be computed on a per-case basis depending on different factors, for example, noise in the gaze coordinates, on-screen accuracy reported by the eye tracking system, confidence level in the gaze coordinates, location of the link L on the screen, or any combination of these. Here is a case where eye tracking technology is used to reduce the whole screen of possible objects to just those within the gazed area, G. Rather than having to iterate with repeated zoom steps, by using the eye tracking gazed area G as a delineator, the system can activate the link, L, with a sufficient level of confidence using fewer steps and in less time.
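A compact sketch of this loop follows. It is illustrative only: the `eye_tracker` and `recognizer` interfaces and the 0.7 threshold are assumptions rather than anything specified in the patent, links are assumed to be `(label, x, y)` tuples, and `gazed_area`/`objects_in_area` refer to the helper sketches above.

```python
def gaze_restricted_activation(eye_tracker, recognizer, screen_links, threshold=0.7):
    """Sketch of the FIG. 4 flow: restrict the recognition vocabulary to the links
    inside the gazed area G, then gate link activation on the confidence level C."""
    while True:
        points = eye_tracker.latest_gaze_coordinates()               # 402: recent gaze samples
        area = gazed_area(points)                                    # 403: gazed area G
        links_in_g = objects_in_area(screen_links, area)             # 404: dictionary of links D
        vocabulary = [label for label, _, _ in links_in_g]           # 405: vocabulary V
        audio = recognizer.wait_for_command()                        # 406: voice command received
        label, confidence = recognizer.recognize(audio, vocabulary)  # 407, 408: link L and confidence C
        if label in vocabulary and confidence > threshold:           # 409: C compared to threshold th
            return label                                             # 410: activate link L
        # otherwise (extraneous sound, low confidence): refresh G and keep listening (back to 402)
```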
  • In another embodiment, FIG. 5 shows an exemplary task flow. The flow in FIG. 5 should not be read as limiting. The flow begins with 501, where a system loads and parses the elements that will comprise the screen objects. Although not shown in the flow chart, this operation may be done repeatedly. The eye tracking control subsystem repeatedly refreshes the gazed area coordinates and feeds that data to the system (502). When a voice command is received (503), a gazed area G is determined from the eye tracking coordinates received during a time window that may extend from the time the command is received to some predetermined number of seconds before that (504). A dictionary of links, D, present in area G is built (505) and a vocabulary, V, of links in area G is built (506). The voice command is recognized based on V (507) with probability P. In case multiple links are recognized, the accuracy probability P for each link may be computed (508) based on different factors, for example, the confidence level of the voice recognition, C, the distance from the gaze point or a fixation to the link, the duration of said fixation, the time elapsed between the link being gazed upon and the emission of the voice command, and the like; and the link with the highest probability P may be selected. If P is larger than a threshold value, th (509), then the link, L, is activated (510); otherwise the system returns to operation (502) and waits for a new voice command. The threshold value th may take a fixed value, or it may be computed on a per-case basis as explained above for operation (409). Note that in both FIGS. 4 and 5 a link is activated. These operations are not limited to links, however; they could be applied to any interactive screen object.
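One way to combine the cues listed for operation 508 into a single probability P is sketched below; every weight, decay constant, and field name is an illustrative assumption, not a value taken from the patent.

```python
import math

def link_probability(link_xy, voice_confidence, fixation, command_time):
    """Score one candidate link from the cues named for operation 508.

    link_xy:          (x, y) position of the link on screen.
    voice_confidence: confidence level C reported by the voice recognizer, in 0..1.
    fixation:         dict with 'x', 'y', 'duration_s', 'end_time_s' (assumed layout).
    command_time:     timestamp (seconds) at which the voice command was emitted.
    """
    distance = math.hypot(link_xy[0] - fixation["x"], link_xy[1] - fixation["y"])
    proximity = math.exp(-distance / 150.0)                 # closer to the fixation -> higher
    dwell = min(fixation["duration_s"] / 0.5, 1.0)          # longer fixation -> higher, capped at 1
    recency = math.exp(-max(command_time - fixation["end_time_s"], 0.0) / 2.0)  # command soon after gaze -> higher
    return voice_confidence * proximity * dwell * recency

def pick_link(candidates, voice_confidence, fixation, command_time, threshold=0.5):
    """Steps 508-510: keep the link with the highest P and activate it only if P > th."""
    best = max(candidates, key=lambda link: link_probability(
        link["xy"], voice_confidence, fixation, command_time))
    p = link_probability(best["xy"], voice_confidence, fixation, command_time)
    return best if p > threshold else None  # below th: wait for a new voice command (502)
```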
  • In another embodiment, FIG. 6 shows an exemplary task flow. The flow in FIG. 6 should not be read as limiting. The flow begins with the system loading and parsing the elements that will comprise the screen objects. Although not shown in the flow chart, this operation may be done repeatedly. Then, the system awaits a voice command. Here, for example, the command is “select” (603). A gazed area, G, is determined (604) by using the eye tracking coordinates received during a time window that may extend from the time the command is received to some predetermined number of seconds before that. Here, the gazed area is, as in FIG. 3, over text objects. The text, T, in area G is parsed and a vocabulary, V, is built (605). Based on vocabulary V, the text object of the voice command is recognized (606). A word, W, is evaluated as to probability, P (607), and compared to a threshold value, th (608). If P exceeds th, word W is selected (609). Probability P and threshold value th may be computed as explained previously.
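A corresponding sketch for the text case of FIG. 6 follows; the `recognize` callable, which is assumed to return a recognized word together with a probability P, is a hypothetical interface rather than part of the disclosure.

```python
def select_word(recognize, audio, words_in_area, threshold=0.6):
    """Sketch of FIG. 6, steps 605-609: build the vocabulary V from the words
    parsed inside the gazed area G and select the best-matching word W."""
    vocabulary = {w.lower(): w for w in words_in_area}         # 605: vocabulary V from parsed text T
    recognized, p = recognize(audio, list(vocabulary.keys()))  # 606, 607: recognized word and probability P
    if recognized in vocabulary and p > threshold:             # 608: compare P to threshold th
        return vocabulary[recognized]                          # 609: word W selected
    return None                                                # below th: await a new command
```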
  • The flows shown in FIGS. 4, 5 and 6 are exemplary. In each example, the entire screen of objects is reduced to those objects within a gazed area, increasing the confidence or probability level without resorting to zooming operations. It is of course possible that a gazed area will still have some object-of-interest ambiguity, but the likelihood is far lower than when using only voice-recognition control. Often the spoken word in combination with the gazed area is sufficient to resolve the object of interest without any zooming operations. Clearly, the combination of eye tracking and voice-recognition technologies will resolve the object of interest faster than either eye tracking or voice-recognition control applied exclusively.

Claims (7)

What is claimed is:
1. A method comprising:
determining an area on a display screen at which a user is gazing;
recognizing a spoken word or plurality of spoken words;
associating said spoken word or plurality of spoken words with objects displayed on said display screen;
limiting said objects displayed on said display screen to said area on said screen at which a user is gazing;
associating said objects displayed on said display screen in said area on a screen at which said user is gazing with said spoken word or plurality of spoken words.
2. A method as in claim 1 further comprising:
determining a level of confidence in said associating said objects displayed on said display screen in said area on a screen at which said user is gazing with said spoken word or plurality of spoken words;
comparing said level of confidence with a predetermined level of confidence value and if greater than said predetermined level of confidence value, accepting the association of said spoken word or plurality of spoken words with said objects displayed on said display screen in said area on a screen which said user is gazing.
3. A method as in claim 1 further comprising:
determining said level of confidence value based on the accuracy of the gaze coordinates, the noise of the gaze coordinates, the confidence level in the gaze coordinates, the location of the objects on the screen, or any combination thereof.
4. A method as in claim 1 further comprising:
determining a level of probability in said associating said objects displayed on said display screen in said area on a screen at which said user is gazing with recognizing said spoken word or plurality of spoken words;
comparing said level of probability with a predetermined level of probability value and if greater than said predetermined level of probability value, accepting the association of said spoken word or plurality of spoken words with said objects displayed on said display screen in said area on a screen at which said user is gazing.
5. A method as in claim 4 further comprising:
determining said level of probability based on the confidence level of the voice recognition, the distance from the gaze fixation to each object, the duration of the gaze fixation, the time elapsed between the gaze fixation and the emission of the voice command, or any combination thereof.
6. A method comprising:
determining the objects present in an area on a display screen at which said user is gazing,
building a vocabulary of a voice recognition engine based on said objects,
recognizing a spoken word or plurality of spoken words using said vocabulary;
associating said objects present in the gazed area with said spoken word or plurality of spoken words.
7. A method as in claim 6 further comprising
updating said vocabulary of said voice recognition engine on every fixation of said user.
US15/066,387 2015-03-20 2016-03-10 Method for refining control by combining eye tracking and voice recognition Abandoned US20170262051A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/066,387 US20170262051A1 (en) 2015-03-20 2016-03-10 Method for refining control by combining eye tracking and voice recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562135904P 2015-03-20 2015-03-20
US15/066,387 US20170262051A1 (en) 2015-03-20 2016-03-10 Method for refining control by combining eye tracking and voice recognition

Publications (1)

Publication Number Publication Date
US20170262051A1 true US20170262051A1 (en) 2017-09-14

Family

ID=59787861

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/066,387 Abandoned US20170262051A1 (en) 2015-03-20 2016-03-10 Method for refining control by combining eye tracking and voice recognition

Country Status (5)

Country Link
US (1) US20170262051A1 (en)
EP (1) EP3271803A1 (en)
JP (1) JP2018515817A (en)
KR (1) KR20170129165A (en)
CN (1) CN107567611A (en)

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180286397A1 (en) * 2017-03-29 2018-10-04 Honda Motor Co., Ltd. Object authentication device and object authentication method
US20190124388A1 (en) * 2017-10-24 2019-04-25 Comcast Cable Communications, Llc Determining context to initiate interactivity
US11227599B2 (en) * 2019-06-01 2022-01-18 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11276402B2 (en) * 2017-05-08 2022-03-15 Cloudminds Robotics Co., Ltd. Method for waking up robot and robot thereof
US11335342B2 (en) * 2020-02-21 2022-05-17 International Business Machines Corporation Voice assistance system
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11487364B2 (en) 2018-05-07 2022-11-01 Apple Inc. Raise to speak
US11538469B2 (en) 2017-05-12 2022-12-27 Apple Inc. Low-latency intelligent automated assistant
US11557310B2 (en) 2013-02-07 2023-01-17 Apple Inc. Voice trigger for a digital assistant
US11630525B2 (en) 2018-06-01 2023-04-18 Apple Inc. Attention aware virtual assistant dismissal
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US11699448B2 (en) 2014-05-30 2023-07-11 Apple Inc. Intelligent assistant for home automation
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11749275B2 (en) 2016-06-11 2023-09-05 Apple Inc. Application integration with a digital assistant
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US11810562B2 (en) 2014-05-30 2023-11-07 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11809886B2 (en) 2015-11-06 2023-11-07 Apple Inc. Intelligent automated assistant in a messaging environment
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11837237B2 (en) 2017-05-12 2023-12-05 Apple Inc. User-specific acoustic models
US11838579B2 (en) 2014-06-30 2023-12-05 Apple Inc. Intelligent automated assistant for TV user interactions
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US11900923B2 (en) 2018-05-07 2024-02-13 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11954405B2 (en) 2015-09-08 2024-04-09 Apple Inc. Zero latency digital assistant
US11979836B2 (en) 2007-04-03 2024-05-07 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US12001933B2 (en) 2015-05-15 2024-06-04 Apple Inc. Virtual assistant in a communication session
US12014118B2 (en) 2017-05-15 2024-06-18 Apple Inc. Multi-modal interfaces having selection disambiguation and text modification capability
US12021806B1 (en) 2021-09-21 2024-06-25 Apple Inc. Intelligent message delivery
US12026197B2 (en) 2017-05-16 2024-07-02 Apple Inc. Intelligent automated assistant for media exploration
US12051413B2 (en) 2015-09-30 2024-07-30 Apple Inc. Intelligent device identification
US12067985B2 (en) 2018-06-01 2024-08-20 Apple Inc. Virtual assistant operations in multi-device environments
US12165635B2 (en) 2010-01-18 2024-12-10 Apple Inc. Intelligent automated assistant
US12175977B2 (en) 2016-06-10 2024-12-24 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US12197817B2 (en) 2016-06-11 2025-01-14 Apple Inc. Intelligent device arbitration and control
US12204932B2 (en) 2015-09-08 2025-01-21 Apple Inc. Distributed personal assistant
US12211502B2 (en) 2018-03-26 2025-01-28 Apple Inc. Natural assistant interaction
US12223282B2 (en) 2016-06-09 2025-02-11 Apple Inc. Intelligent automated assistant in a home environment
US12236952B2 (en) 2015-03-08 2025-02-25 Apple Inc. Virtual assistant activation
US12254887B2 (en) 2017-05-16 2025-03-18 Apple Inc. Far-field extension of digital assistant services for providing a notification of an event to a user
US12260234B2 (en) 2017-01-09 2025-03-25 Apple Inc. Application integration with a digital assistant
US12301635B2 (en) 2020-05-11 2025-05-13 Apple Inc. Digital assistant hardware abstraction

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108803866A (en) * 2018-03-27 2018-11-13 北京七鑫易维信息技术有限公司 The methods, devices and systems of output information
CN108874127A (en) * 2018-05-30 2018-11-23 北京小度信息科技有限公司 Information interacting method, device, electronic equipment and computer readable storage medium
WO2020116001A1 (en) * 2018-12-03 2020-06-11 ソニー株式会社 Information processing device and information processing method
US11978448B2 (en) 2019-02-26 2024-05-07 Lg Electronics Inc. Display device and method of operating the same
US20250085774A1 (en) * 2023-09-08 2025-03-13 Roeland Petrus Hubertus Vertegaal Gaze assisted input for an electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1320848A1 (en) * 2000-09-20 2003-06-25 International Business Machines Corporation Eye gaze for contextual speech recognition
US6718304B1 (en) * 1999-06-30 2004-04-06 Kabushiki Kaisha Toshiba Speech recognition support method and apparatus
US8744645B1 (en) * 2013-02-26 2014-06-03 Honda Motor Co., Ltd. System and method for incorporating gesture and voice recognition into a single system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0651901A (en) * 1992-06-29 1994-02-25 Nri & Ncc Co Ltd Communication device by gaze recognition
JPH08314493A (en) * 1995-05-22 1996-11-29 Sanyo Electric Co Ltd Voice recognition method, numeral line voice recognition device and video recorder system
JP2008058409A (en) * 2006-08-29 2008-03-13 Aisin Aw Co Ltd Speech recognizing method and speech recognizing device
CN103885743A (en) * 2012-12-24 2014-06-25 大陆汽车投资(上海)有限公司 Voice text input method and system combining with gaze tracking technology
KR20140132246A (en) * 2013-05-07 2014-11-17 삼성전자주식회사 Object selection method and object selection apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6718304B1 (en) * 1999-06-30 2004-04-06 Kabushiki Kaisha Toshiba Speech recognition support method and apparatus
EP1320848A1 (en) * 2000-09-20 2003-06-25 International Business Machines Corporation Eye gaze for contextual speech recognition
US8744645B1 (en) * 2013-02-26 2014-06-03 Honda Motor Co., Ltd. System and method for incorporating gesture and voice recognition into a single system

Cited By (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11979836B2 (en) 2007-04-03 2024-05-07 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US12165635B2 (en) 2010-01-18 2024-12-10 Apple Inc. Intelligent automated assistant
US11557310B2 (en) 2013-02-07 2023-01-17 Apple Inc. Voice trigger for a digital assistant
US12009007B2 (en) 2013-02-07 2024-06-11 Apple Inc. Voice trigger for a digital assistant
US12277954B2 (en) 2013-02-07 2025-04-15 Apple Inc. Voice trigger for a digital assistant
US11862186B2 (en) 2013-02-07 2024-01-02 Apple Inc. Voice trigger for a digital assistant
US11810562B2 (en) 2014-05-30 2023-11-07 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US12118999B2 (en) 2014-05-30 2024-10-15 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US12067990B2 (en) 2014-05-30 2024-08-20 Apple Inc. Intelligent assistant for home automation
US11699448B2 (en) 2014-05-30 2023-07-11 Apple Inc. Intelligent assistant for home automation
US12200297B2 (en) 2014-06-30 2025-01-14 Apple Inc. Intelligent automated assistant for TV user interactions
US11838579B2 (en) 2014-06-30 2023-12-05 Apple Inc. Intelligent automated assistant for TV user interactions
US12236952B2 (en) 2015-03-08 2025-02-25 Apple Inc. Virtual assistant activation
US12154016B2 (en) 2015-05-15 2024-11-26 Apple Inc. Virtual assistant in a communication session
US12001933B2 (en) 2015-05-15 2024-06-04 Apple Inc. Virtual assistant in a communication session
US11954405B2 (en) 2015-09-08 2024-04-09 Apple Inc. Zero latency digital assistant
US12204932B2 (en) 2015-09-08 2025-01-21 Apple Inc. Distributed personal assistant
US12051413B2 (en) 2015-09-30 2024-07-30 Apple Inc. Intelligent device identification
US11809886B2 (en) 2015-11-06 2023-11-07 Apple Inc. Intelligent automated assistant in a messaging environment
US12223282B2 (en) 2016-06-09 2025-02-11 Apple Inc. Intelligent automated assistant in a home environment
US12175977B2 (en) 2016-06-10 2024-12-24 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US12197817B2 (en) 2016-06-11 2025-01-14 Apple Inc. Intelligent device arbitration and control
US12293763B2 (en) 2016-06-11 2025-05-06 Apple Inc. Application integration with a digital assistant
US11749275B2 (en) 2016-06-11 2023-09-05 Apple Inc. Application integration with a digital assistant
US12260234B2 (en) 2017-01-09 2025-03-25 Apple Inc. Application integration with a digital assistant
US20180286397A1 (en) * 2017-03-29 2018-10-04 Honda Motor Co., Ltd. Object authentication device and object authentication method
US10861452B2 (en) * 2017-03-29 2020-12-08 Honda Motor Co., Ltd. Object authentication device and object authentication method
US11276402B2 (en) * 2017-05-08 2022-03-15 Cloudminds Robotics Co., Ltd. Method for waking up robot and robot thereof
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11538469B2 (en) 2017-05-12 2022-12-27 Apple Inc. Low-latency intelligent automated assistant
US11862151B2 (en) 2017-05-12 2024-01-02 Apple Inc. Low-latency intelligent automated assistant
US11837237B2 (en) 2017-05-12 2023-12-05 Apple Inc. User-specific acoustic models
US12014118B2 (en) 2017-05-15 2024-06-18 Apple Inc. Multi-modal interfaces having selection disambiguation and text modification capability
US12026197B2 (en) 2017-05-16 2024-07-02 Apple Inc. Intelligent automated assistant for media exploration
US12254887B2 (en) 2017-05-16 2025-03-18 Apple Inc. Far-field extension of digital assistant services for providing a notification of an event to a user
US20190124388A1 (en) * 2017-10-24 2019-04-25 Comcast Cable Communications, Llc Determining context to initiate interactivity
US11445235B2 (en) * 2017-10-24 2022-09-13 Comcast Cable Communications, Llc Determining context to initiate interactivity
US11792464B2 (en) 2017-10-24 2023-10-17 Comcast Cable Communications, Llc Determining context to initiate interactivity
US12211502B2 (en) 2018-03-26 2025-01-28 Apple Inc. Natural assistant interaction
US11907436B2 (en) 2018-05-07 2024-02-20 Apple Inc. Raise to speak
US11900923B2 (en) 2018-05-07 2024-02-13 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11487364B2 (en) 2018-05-07 2022-11-01 Apple Inc. Raise to speak
US11630525B2 (en) 2018-06-01 2023-04-18 Apple Inc. Attention aware virtual assistant dismissal
US12061752B2 (en) 2018-06-01 2024-08-13 Apple Inc. Attention aware virtual assistant dismissal
US12067985B2 (en) 2018-06-01 2024-08-20 Apple Inc. Virtual assistant operations in multi-device environments
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US12136419B2 (en) 2019-03-18 2024-11-05 Apple Inc. Multimodality in digital assistant systems
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US12154571B2 (en) 2019-05-06 2024-11-26 Apple Inc. Spoken notifications
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US12216894B2 (en) 2019-05-06 2025-02-04 Apple Inc. User configurable task triggers
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11227599B2 (en) * 2019-06-01 2022-01-18 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11468890B2 (en) 2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11335342B2 (en) * 2020-02-21 2022-05-17 International Business Machines Corporation Voice assistance system
US12197712B2 (en) 2020-05-11 2025-01-14 Apple Inc. Providing relevant data items based on context
US12301635B2 (en) 2020-05-11 2025-05-13 Apple Inc. Digital assistant hardware abstraction
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11750962B2 (en) 2020-07-21 2023-09-05 Apple Inc. User identification using headphones
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US12219314B2 (en) 2020-07-21 2025-02-04 Apple Inc. User identification using headphones
US12021806B1 (en) 2021-09-21 2024-06-25 Apple Inc. Intelligent message delivery

Also Published As

Publication number Publication date
CN107567611A (en) 2018-01-09
JP2018515817A (en) 2018-06-14
KR20170129165A (en) 2017-11-24
EP3271803A1 (en) 2018-01-24

Similar Documents

Publication Title
US20170262051A1 (en) Method for refining control by combining eye tracking and voice recognition
US10838513B2 (en) Responding to selection of a displayed character string
US9223590B2 (en) System and method for issuing commands to applications based on contextual information
US8922489B2 (en) Text input using key and gesture information
JP4527731B2 (en) Virtual keyboard system with automatic correction function
US8327282B2 (en) Extended keyboard user interface
US10275152B2 (en) Advanced methods and systems for text input error correction
US9753906B2 (en) Character string replacement
US9043300B2 (en) Input method editor integration
US20140078065A1 (en) Predictive Keyboard With Suppressed Keys
US20130007606A1 (en) Text deletion
US20100287486A1 (en) Correction of typographical errors on touch displays
EP2713255A1 (en) Method and electronic device for prompting character input
US20150186347A1 (en) Information processing device, information processing method, and computer program product
US9910589B2 (en) Virtual keyboard with adaptive character recognition zones
KR20180101723A (en) Semantic zoom animations
WO2014058948A1 (en) A split virtual keyboard on a mobile computing device
US20140068509A1 (en) Managing a Selection Mode for Presented Content
US20140304640A1 (en) Techniques for input of a multi-character compound consonant or vowel and transliteration to another language using a touch computing device
EP2306287A2 (en) Apparatus and method for displaying input character indicator
US11112965B2 (en) Advanced methods and systems for text input error correction
US20150205781A1 (en) Systems and methods for using tone indicator in text recognition
WO2016151396A1 (en) Method for refining control by combining eye tracking and voice recognition
US9778839B2 (en) Motion-based input method and system for electronic device
US20180300021A1 (en) Text input system with correction facility

Legal Events

Date Code Title Description
AS Assignment

Owner name: THE EYE TRIBE, DENMARK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TALL, MARTIN HENRIK;PRIESUM, JONAS;SAN AGUSTIN, JAVIER;REEL/FRAME:037948/0234

Effective date: 20160310

AS Assignment

Owner name: THE EYE TRIBE APS, DENMARK

Free format text: AFFIDAVIT OF ASSIGNEE TO CORRECT THE ASSIGNEE'S NAME RECORDED AT REEL/FRAME: 037948/0234;ASSIGNOR:THE EYE TRIBE;REEL/FRAME:040912/0950

Effective date: 20161214

AS Assignment

Owner name: FACEBOOK, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THE EYE TRIBE APS;REEL/FRAME:041291/0471

Effective date: 20170216

AS Assignment

Owner name: THE EYE TRIBE APS, DENMARK

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNORS SIGNATURES AND DATES PREVIOUSLY RECORDED AT REEL: 037948 FRAME: 0234. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:TALL, MARTIN HENRIK;PRIESUM, JONAS PHILIP;SAN AGUSTIN, JAVIER;SIGNING DATES FROM 20161213 TO 20161214;REEL/FRAME:042793/0300

AS Assignment

Owner name: THE EYE TRIBE, DENMARK

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE SPELLING OF THE THIRD ASSIGNOR'S NAME PREVIOUSLY RECORDED ON REEL 042793 FRAME 0300. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:TALL, MARTIN HENRIK;PRIESUM, JONAS;SAN AGUSTIN LOPEZ, JAVIER;SIGNING DATES FROM 20161213 TO 20161214;REEL/FRAME:046426/0545

AS Assignment

Owner name: FACEBOOK TECHNOLOGIES, LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FACEBOOK, INC.;REEL/FRAME:047687/0942

Effective date: 20181024

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: META PLATFORMS TECHNOLOGIES, LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:FACEBOOK TECHNOLOGIES, LLC;REEL/FRAME:062749/0697

Effective date: 20220318