US20170262051A1 - Method for refining control by combining eye tracking and voice recognition - Google Patents
- Publication number
- US20170262051A1 (Application US15/066,387)
- Authority
- US
- United States
- Prior art keywords
- screen
- area
- objects
- user
- eye tracking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
- G06F3/04842—Selection of displayed objects or displayed text elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0381—Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- User Interface Of Digital Computer (AREA)
- Eye Examination Apparatus (AREA)
Abstract
Description
- The present invention relates to system control using eye tracking and voice recognition.
- Computing devices, such as personal computers, smartphones, tablets, and others, make use of graphical user interfaces (GUIs) to facilitate control by their users. Objects, which may include images, words, and alphanumeric characters, can be displayed on screens; and users employ cursor-control devices (e.g. mouse or touch pad) and switches to indicate choice and selection of interactive screen elements. In other cases, rather than cursor and switch, systems may use a touch-sensitive screen whereby a user identifies and selects something by touching its screen location with a finger or stylus. In this way, for example, one could select a control icon, such as “print,” or select a hyperlink. One could also select a sequence of alphanumeric characters or words for text editing and/or copy-and-paste interactions. Cursor control and touch-control panels are designed such that users physically manipulate a control device to locate and select screen items.
- There are alternative means for such control, however, that do not involve physically moving or touching a control subsystem. One such alternative makes use of eye tracking, where a user's gaze at a screen can be employed to identify a screen area of interest and a screen item for interactive selection. Another alternative makes use of voice recognition and associates recognized words with related items displayed on a screen. Neither eye tracking nor voice recognition control, on its own, is as precise with regard to locating and selecting screen objects as, say, cursor control or touch control. In the case of eye tracking, one is often limited in resolution to a screen area rather than a point or small cluster of points. If there is more than one screen object within or near that screen area, then selection may be ambiguous. Similarly, with a screen full of text and object choices, a voice recognition subsystem could also suffer ambiguity when trying to resolve a recognized word with a singularly related screen object or word. As a result, such control methodologies may employ zooming so as to limit the number of screen objects and increase the distance between them, as in eye tracking control; or require iterative spoken commands in order to increase the probability of correct control or selection interpretation.
- By combining eye tracking and voice recognition controls one can effectively increase the accuracy of location and selection and thereby reduce iterative zooming or spoken commands that are currently required when using one or the other control technology.
- The method herein disclosed and claimed enables independently implemented eye tracking and voice recognition controls to co-operate so as to make overall control faster and/or more accurate.
- The method herein disclosed and claimed could be employed in an integrated control system that combines eye tracking with voice recognition control.
- The method herein disclosed and claimed is applicable to locating and selecting screen objects that may result from booting up a system in preparation for running an application, or interacting with a server-based HTML page aggregate using a client user system (e.g. interacting with a website via the Internet). In essence, this method in conjunction with eye tracking and voice recognition control subsystems would provide enhanced control over the interaction of screen-displayed objects irrespective of the underlying platform specifics.
- The method herein disclosed and claimed uses attributes of eye tracking to reduce the ambiguities of voice-recognition control, and uses voice recognition to reduce the ambiguities of eye tracking control. The result is control synergy; that is, control speed and accuracy that exceed those of eye tracking or voice recognition control used on its own.
- FIG. 1 depicts a display screen displaying non-text and textual objects. The screen, for example, could be any system display and control screen, such as a computer monitor, smartphone screen, tablet screen, or the like.
- FIG. 2 depicts the screen of FIG. 1 where eye tracking control determines that the user's gaze is essentially on a non-textual object.
- FIG. 3 depicts the screen of FIG. 1 where eye tracking control determines that the user's gaze is essentially on a screen area comprising text objects.
- FIG. 4 depicts an exemplary flow chart illustrating how combining eye tracking and voice recognition would increase the confidence level of determining a location and selection, and, therefore, the accuracy.
- FIG. 5 depicts an exemplary flow chart illustrating how combining eye tracking and voice recognition would increase the probability level of determining a location and selection, and, therefore, the accuracy.
- FIG. 6 depicts an exemplary flow chart illustrating how combining eye tracking and voice recognition would increase the probability level of determining the selected word in a group of words by associating the interpreted word with its occurrence in a smaller screen area determined as the user's gaze screen area.
- As interactive computing systems of all kinds have evolved, GUIs have become the primary interaction mechanism between systems and users. With displayed objects on a screen, which could be images, alphanumeric characters, text, icons, and the like, the user makes use of a portion of the GUI that enables the user to locate and select a screen object. The two most common GUI subsystems employ cursor control devices (e.g. mouse or touch pad) and selection switches to locate and select screen objects. The screen object could be a control icon, like a print button, so locating and selecting it may cause a displayed document file to be printed. If the screen object is a letter, word, or highlighted text portion, the selection would make it available for editing, deletion, copy-and-paste, or similar operations. Today many devices use a touch-panel screen which enables a finger or stylus touch to locate and/or select a screen object. In both cases, the control relies on the user to physically engage with a control device in order to locate and select a screen object.
- With cursor control, one is usually able to precisely locate and select a screen object. Sometimes one has to enlarge a portion of the screen to make objects larger and move them farther apart from one another in order to precisely locate and select an intended screen object. This zooming function is more typical of finger-touch controls where a finger touch on an area with several small screen objects is imprecise until zooming is applied.
- A GUI could also serve to enable location and selection of screen objects without requiring physical engagement. For example, a GUI that makes use of eye tracking control would determine where on a screen the user is gazing (e.g. location) and use some method for selection control (e.g. gaze dwell time). This would be analogous to using a mouse to move a cursor over a screen object and then pressing a button to signify selection intent.
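- As a purely illustrative sketch (not part of the claimed method), the short Python fragment below shows how a gaze-dwell selection rule of this kind might be expressed. The GazeSample type, the hit_test callback, and the 0.8-second dwell value are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    x: float          # horizontal screen coordinate, pixels (hypothetical units)
    y: float          # vertical screen coordinate, pixels
    timestamp: float  # seconds

def dwell_select(samples, hit_test, dwell_seconds=0.8):
    """Return the first object gazed at continuously for at least dwell_seconds.

    hit_test(x, y) maps a gaze coordinate to a screen object (or None); it stands
    in for whatever hit-testing the GUI layer actually provides.
    """
    current_obj, dwell_start = None, None
    for s in samples:
        obj = hit_test(s.x, s.y)
        if obj is not None and obj == current_obj:
            if s.timestamp - dwell_start >= dwell_seconds:
                return obj  # dwell threshold reached: treated like a mouse click
        else:
            current_obj, dwell_start = obj, s.timestamp
    return None

# Example: a screen with one "print" button occupying x in [0, 100), y in [0, 50).
samples = [GazeSample(40, 20, t * 0.1) for t in range(12)]
button = lambda x, y: "print" if 0 <= x < 100 and 0 <= y < 50 else None
print(dwell_select(samples, button))  # -> "print"
```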
- Voice-recognition-based control could also serve as a control technology where physical engagement would not be required. A screen of objects would have a vocabulary of spoken words associated with the objects, and when a user says a word or phrase, the control system recognizes the word and associates it with a particular screen object. So, for example, a screen with an object that is a circle with a letter A in its center could be located and selected by a user who says “circle A,” which may cause the GUI system to highlight it, and then saying “select,” which would cause the GUI system to select the object and perhaps remove the highlighting. Clearly, if there were many objects on a screen, some having the same description, saying “circle” where there are five circles of various size and color would be ambiguous. The system could prompt the user for further delineation in order to have a higher confidence level or higher probability estimation.
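- Again for illustration only, the following sketch shows the word-to-object association described above, including the ambiguous case in which one spoken word matches several objects; the object names and labels are invented for the example.

```python
def objects_matching(recognized_word, labelled_objects):
    """Return the screen objects whose spoken labels include the recognized word."""
    word = recognized_word.strip().lower()
    return [name for name, labels in labelled_objects.items()
            if any(word == label.lower() for label in labels)]

# Hypothetical screen: several circles plus a print icon.
labelled_objects = {
    "circle_A":   ["circle", "circle a"],
    "circle_B":   ["circle", "circle b"],
    "print_icon": ["print"],
}

for spoken in ("circle a", "circle"):
    matches = objects_matching(spoken, labelled_objects)
    if len(matches) == 1:
        print(f"'{spoken}' -> select {matches[0]}")
    else:
        # Ambiguous: the system would prompt the user for further delineation.
        print(f"'{spoken}' -> ambiguous among {matches}")
```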
- Thus, the tradeoff in using eye tracking or voice-recognition control is eliminating the need for physical engagement with a pointing/selecting device or the screen, but accepting less precise location and selection resolution. Often, as a result of the lower resolution, there may be more steps performed before the system can determine the location and selection of an object with a probability commensurate with more resolute controls, such as cursor, touch pad, or touch screen.
- Typically, a type-selecting cursor is smaller than an alphanumeric character standing alone or immersed in a word. So, if one is fixing a typographical error, one can select a single letter and delete or change it. Using touch control, the area of finger or stylus touch is typically larger than a cursor pointer. It would be difficult to select a letter immersed in a word for similar typographical error correction. One may have to make several pointing attempts to select the correct letter, or expand (i.e. zoom) the word to larger proportions so that the touch point can be resolved to the single, intended letter target.
- Regardless of which GUI location and selection technology one uses, font sizes and non-textual object dimensions will affect the control resolution, but in general, technologies that do not require physical engagement cannot accommodate dense text having small characters and non-text objects having small dimensions without iterative zooming steps.
- The method herein disclosed and claimed makes use of eye tracking and voice-recognition control technologies in conjunction to, in effect, improve the accuracy of locating and selecting screen objects relative to using either control technology on its own. The method applies to any system having displayed objects whereby users interact with said system by locating and selecting screen objects and directing the system to carry out some operation or operations on one or a plurality of screen objects. Such systems can comprise combinations of hardware, firmware and software that, in concert, support displaying, locating, selecting and operating on displayed objects. The method may comprise interacting with system hardware and/or software as part of an integrated control subsystem incorporating eye tracking and voice-recognition controls, or as part of a system in which separate eye tracking and voice-recognition control subsystems can interact. The method invention herein disclosed and claimed should therefore not be limited in scope to any particular system architecture or parsing of hardware and software.
- Eye tracking technology or subsystem refers to any such technology or subsystem, regardless of architecture or implementation, which is capable of determining approximately where a user's eye or eyes are gazing at some area of a display screen. The eye tracking technology or subsystem may also be capable of determining that a user has selected one or more objects in the gazed area so located. An object could be an icon or link that initiates an operation if so selected.
- Voice-recognition technology or subsystem refers to any such technology or subsystem, regardless of architecture or implementation, which is capable of recognizing a user's spoken word or phrase of words and associating that recognized word or phrase with a displayed object and/or an operational command.
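- For concreteness in the sketches that follow, the two subsystems can be modelled as narrow interfaces. The Python below is illustrative only; the names GazePoint, EyeTracker, and VoiceRecognizer, and their method signatures, are assumptions made for these examples and are not defined by this disclosure.

```python
from typing import List, NamedTuple, Optional, Protocol, Tuple

class GazePoint(NamedTuple):
    x: float           # screen coordinates of the gaze estimate
    y: float
    timestamp: float   # seconds
    confidence: float  # per-sample confidence, if the tracker reports one

class EyeTracker(Protocol):
    def recent_gaze_points(self, window_seconds: float) -> List[GazePoint]:
        """Gaze coordinates reported during the last window_seconds seconds."""
        ...

class VoiceRecognizer(Protocol):
    def recognize(self, vocabulary: List[str]) -> Tuple[Optional[str], float]:
        """Return (recognized word or phrase, confidence C), constrained to vocabulary."""
        ...
```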
- FIG. 1 depicts a display of objects on a screen. Objects consist of text objects, such as alphanumeric characters, words, sentences, and paragraphs; and non-text objects, which comprise images, line art, icons, and the like. This drawing is exemplary and should not be read as limiting the layout and content of objects on a screen.
- With eye tracking control technology one can determine an area where a user's eye or eyes are gazing at the screen of FIG. 1. For example, in FIG. 2, an eye tracking control subsystem has determined that a user's eye is gazing at a portion of a non-text object and the gazed area is defined by the area circled by 201.
- FIG. 3 depicts the screen of FIG. 1 where an eye tracking control subsystem has determined that a user's eye is gazing at a portion of text objects, the area of which is circled by 301.
- In FIG. 2, if the non-text object were smaller than 201, and more than one such object were located in area 201, the eye tracking subsystem could not, at that time, resolve which object in area 201 is a user's object of interest. By engaging in a subsequent step, the screen objects could be enlarged such that only one object would be located in area 201. But the subsequent step adds time for the sake of accuracy. It may also be the case that a first zooming attempt results in two or more objects still within area 201. Hence, a second zoom operation may have to be done in order to determine the object of interest. Here, again, more time is used.
- In FIG. 3, the gazed area 301 covers a plurality of alphanumeric characters and words. Here, again, the eye tracking control subsystem would be unable to determine specifically which character or word is the object of interest. Again, iterative zoom operations may have to be done in order to resolve which letter or word is the object of interest. As with the non-text object case, each time a zoom operation is applied, more time is required.
- Using a voice-recognition technology in association with FIG. 1, the entire visible screen and any of its objects could be a user's object of choice. For example, if the user said "delete word 'here'", the voice-recognition subsystem would first have to recognize the word "here," then associate it with any instances of it among the screen objects. As shown in FIG. 1, there are three instances of the word "here." Thus, the voice-recognition subsystem would be unable to resolve the command to a singular object choice. It may have to engage in a repetitive sequence of highlighting each instance of "here" in turn until the user says "yes," for example. This would take more time.
- In one embodiment of the invention herein disclosed and claimed, FIG. 4 shows an exemplary task flow. The flow shown in FIG. 4 should not be read as limiting. The flow begins at 401, where a system loads and parses the elements that will comprise the screen objects. Although not shown in the flow chart, this operation may be done repeatedly. In 402, the eye tracking subsystem computes repeated screen gaze coordinates and passes them to the system. From 402, a gazed area, G, is determined (403). In 404 and 405, once area G is determined, the system builds a dictionary of links, D, and a vocabulary, V, for the links found in area G. Depending on the capabilities of the computing device and/or the voice recognition subsystem, vocabulary V may be updated for every gaze coordinate, for every fixation, every N gaze coordinates, every T milliseconds, and so on. Steps 402 through 405 continue to refresh until a voice command is received (406). The system then recognizes the voice command based on vocabulary V (407) and determines link L along with a confidence level of accuracy, C (408). With voice recognition, extraneous sounds coupled with a voice command can also introduce audio artifacts that may reduce recognition accuracy. In order to avoid incorrect selections due to extraneous sounds, the confidence level C may be compared to a threshold value, th, and if it is greater (409), then the system activates link L (410); otherwise it returns to operation (402). The threshold th may take a fixed value, or it may be computed on a per-case basis depending on different factors, for example, noise in the gaze coordinates, on-screen accuracy reported by the eye tracking system, confidence level in the gaze coordinates, location of the link L on the screen, or any combination of these. Here is a case where eye tracking technology is used to reduce the whole screen of possible objects to just those within the gazed area, G. Rather than having to iterate with repeated zoom steps, by using the eye tracking gazed area G as a delineator, the system can activate the link, L, with a sufficient level of confidence using fewer steps and in less time.
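- A minimal sketch of a FIG. 4 style pass is given below, assuming the hypothetical gaze-point and recognizer shapes sketched earlier. The helper functions, the circular approximation of area G, and the 0.7 threshold are illustrative choices, not the claimed implementation.

```python
import math
from collections import namedtuple
from dataclasses import dataclass

@dataclass
class Link:
    label: str   # the spoken label associated with the link
    x: float     # on-screen position of the link
    y: float

    def activate(self):
        print(f"activating link: {self.label}")

def gazed_area(points, radius=80.0):
    """Approximate the gazed area G as a circle centred on the mean of recent gaze points."""
    cx = sum(p.x for p in points) / len(points)
    cy = sum(p.y for p in points) / len(points)
    return cx, cy, radius

def links_in_area(links, area):
    """Dictionary D of links lying inside area G, keyed by spoken label."""
    cx, cy, r = area
    return {l.label: l for l in links if math.hypot(l.x - cx, l.y - cy) <= r}

def confidence_flow_step(gaze_points, links, recognize, threshold=0.7):
    """One pass of the FIG. 4 style flow."""
    area_g = gazed_area(gaze_points)            # 403: gazed area G
    links_d = links_in_area(links, area_g)      # 404: dictionary D restricted to G
    vocab_v = list(links_d.keys())              # 405: vocabulary V restricted to G
    word, confidence_c = recognize(vocab_v)     # 406-408: recognized word and confidence C
    link_l = links_d.get(word)
    if link_l is not None and confidence_c > threshold:  # 409: compare C with th
        link_l.activate()                                 # 410: activate link L
        return link_l
    return None  # below threshold: the outer loop would return to refreshing gaze data (402)

# Tiny demo with fabricated gaze samples and a stubbed recognizer.
P = namedtuple("P", "x y")
links = [Link("here", 100, 100), Link("print", 500, 400)]
gaze = [P(104, 96), P(98, 103), P(101, 99)]
stub_recognizer = lambda vocab: ("here", 0.92) if "here" in vocab else (None, 0.0)
confidence_flow_step(gaze, links, stub_recognizer)  # prints "activating link: here"
```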
- In another embodiment, FIG. 5 shows an exemplary task flow. The flow in FIG. 5 should not be read as limiting. The flow begins with 501, where a system loads and parses the elements that will comprise the screen objects. Although not shown in the flow chart, this operation may be done repeatedly. The eye control subsystem repeatedly refreshes the gazed area coordinates and feeds that data to the system (502). When a voice command is received (503), a gazed area G is determined by the eye tracking coordinates received during a time window that may extend from the time the command is received to some predetermined number of seconds before that (504). A dictionary of links, D, present in area G is built (505) and a vocabulary, V, of links in the area G is built (506). The voice command is recognized based on V (507) with probability P. In case multiple links are recognized, the accuracy probability P for each link may be computed (508) based on different factors, for example, the confidence level of the voice recognition C, the distance from the gaze point or a fixation to the link, the duration of said fixation, the time elapsed between the link being gazed upon and the emission of the voice command, and the like; and the link with the highest probability P may be selected. If P is larger than a threshold value, th (509), then the link, L, is activated (510); otherwise the system returns to operation (502) and waits for a new voice command. The threshold value th may take a fixed value, or it may be computed on a per-case basis as explained above for operation (409). Note that in both FIGS. 4 and 5 a link is activated. In fact, these operations are not limited to links, but rather could be applied to any interactive screen object.
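- The following sketch illustrates one way the per-link probability P of FIG. 5 might be combined from the factors listed above. The weights, the distance and recency scoring, and the 0.6 threshold are invented for the example; it reuses the hypothetical gaze-point (with timestamp) and Link shapes from the earlier sketches.

```python
import math

def link_probability(link, gaze_points, voice_confidence, command_time,
                     w_conf=0.5, w_dist=0.3, w_recency=0.2, near_px=100.0):
    """Blend (with invented weights) the factors mentioned for FIG. 5: voice-recognition
    confidence C, distance from gaze points to the link, and how recently it was gazed at."""
    nearest = min(math.hypot(p.x - link.x, p.y - link.y) for p in gaze_points)
    dist_score = 1.0 / (1.0 + nearest / near_px)  # closer gaze -> higher score
    near_times = [p.timestamp for p in gaze_points
                  if math.hypot(p.x - link.x, p.y - link.y) <= near_px]
    recency_score = 0.0
    if near_times:
        # Decays over roughly one second between gazing at the link and speaking.
        recency_score = max(0.0, 1.0 - (command_time - max(near_times)))
    return w_conf * voice_confidence + w_dist * dist_score + w_recency * recency_score

def probability_flow(gaze_points, candidate_links, voice_confidence, command_time, th=0.6):
    """FIG. 5 style selection: score every candidate link, keep the highest P,
    and activate it only if P exceeds the threshold th."""
    scored = [(link_probability(l, gaze_points, voice_confidence, command_time), l)
              for l in candidate_links]          # 508: per-link probability P
    best_p, best_link = max(scored, key=lambda s: s[0])
    if best_p > th:                              # 509: compare P with th
        best_link.activate()                     # 510: activate link L
        return best_link
    return None  # otherwise wait for a new voice command (return to 502)
```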
- In another embodiment, FIG. 6 shows an exemplary task flow. The flow in FIG. 6 should not be read as limiting. The flow begins with the system loading and parsing the elements that will comprise the screen objects. Although not shown in the flow chart, this operation may be done repeatedly. Then, the system awaits a voice command. Here, for example, the command is "select" (603). A gazed area, G, is determined (604) by using the eye tracking coordinates received during a time window that may extend from the time the command is received to some predetermined number of seconds before that. Here, the gazed area is as in FIG. 3, over text objects. So, the text, T, in area G is parsed and a vocabulary, V, is built (605). Based on vocabulary V, the text object of the voice command is recognized (606). A word, W, is evaluated as to probability, P (607), and compared to a threshold value, th (608). If P exceeds th, word W is selected (609). Probability P and threshold value th may be computed as explained previously.
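- A corresponding sketch for the FIG. 6 word-selection flow is shown below; the (word, x, y) text layout, the circular area G, and the threshold value are again assumptions made for illustration, and recognize(vocabulary) stands in for a recognizer returning (word, probability P).

```python
def select_word_in_gaze_area(text_items, gaze_points, recognize, th=0.6, radius=80.0):
    """FIG. 6 style selection: restrict candidate words to those displayed inside the
    gazed area G, recognize the spoken word against that vocabulary V, and select the
    matching occurrence(s) if the recognition probability P exceeds th.

    text_items is a hypothetical list of (word, x, y) tuples for words laid out on screen.
    """
    cx = sum(p.x for p in gaze_points) / len(gaze_points)
    cy = sum(p.y for p in gaze_points) / len(gaze_points)

    # 604-605: parse the text T inside area G and build vocabulary V from it.
    in_area = [(w, x, y) for (w, x, y) in text_items
               if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2]
    vocab_v = sorted({w.lower() for (w, _, _) in in_area})

    # 606-608: recognize the command word against V and compare P with th.
    word_w, prob_p = recognize(vocab_v)
    if word_w is None or prob_p <= th:
        return []
    # 609: even if "here" appears three times on the full screen, only the occurrence(s)
    # inside G survive, so the selection is usually unambiguous without zooming.
    return [(w, x, y) for (w, x, y) in in_area if w.lower() == word_w.lower()]
```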
- The flows shown in FIGS. 4, 5 and 6 are exemplary. In each example, the entire screen of objects is reduced to those objects within a gazed area, increasing the confidence or probability level without resorting to zooming operations. It is of course possible that a gazed area will still contain some object-of-interest ambiguity, but the likelihood is far lower than when using only voice-recognition control. Often the spoken word in combination with the gazed area is sufficient to resolve the object of interest without any zooming operations. Clearly, the combination of eye tracking and voice-recognition technologies will resolve the object of interest faster than either eye tracking or voice-recognition controls applied exclusively.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/066,387 US20170262051A1 (en) | 2015-03-20 | 2016-03-10 | Method for refining control by combining eye tracking and voice recognition |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562135904P | 2015-03-20 | 2015-03-20 | |
US15/066,387 US20170262051A1 (en) | 2015-03-20 | 2016-03-10 | Method for refining control by combining eye tracking and voice recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
US20170262051A1 true US20170262051A1 (en) | 2017-09-14 |
Family
ID=59787861
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/066,387 Abandoned US20170262051A1 (en) | 2015-03-20 | 2016-03-10 | Method for refining control by combining eye tracking and voice recognition |
Country Status (5)
Country | Link |
---|---|
US (1) | US20170262051A1 (en) |
EP (1) | EP3271803A1 (en) |
JP (1) | JP2018515817A (en) |
KR (1) | KR20170129165A (en) |
CN (1) | CN107567611A (en) |
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180286397A1 (en) * | 2017-03-29 | 2018-10-04 | Honda Motor Co., Ltd. | Object authentication device and object authentication method |
US20190124388A1 (en) * | 2017-10-24 | 2019-04-25 | Comcast Cable Communications, Llc | Determining context to initiate interactivity |
US11227599B2 (en) * | 2019-06-01 | 2022-01-18 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11276402B2 (en) * | 2017-05-08 | 2022-03-15 | Cloudminds Robotics Co., Ltd. | Method for waking up robot and robot thereof |
US11335342B2 (en) * | 2020-02-21 | 2022-05-17 | International Business Machines Corporation | Voice assistance system |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US11979836B2 (en) | 2007-04-03 | 2024-05-07 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US12001933B2 (en) | 2015-05-15 | 2024-06-04 | Apple Inc. | Virtual assistant in a communication session |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US12021806B1 (en) | 2021-09-21 | 2024-06-25 | Apple Inc. | Intelligent message delivery |
US12026197B2 (en) | 2017-05-16 | 2024-07-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US12165635B2 (en) | 2010-01-18 | 2024-12-10 | Apple Inc. | Intelligent automated assistant |
US12175977B2 (en) | 2016-06-10 | 2024-12-24 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US12204932B2 (en) | 2015-09-08 | 2025-01-21 | Apple Inc. | Distributed personal assistant |
US12211502B2 (en) | 2018-03-26 | 2025-01-28 | Apple Inc. | Natural assistant interaction |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US12236952B2 (en) | 2015-03-08 | 2025-02-25 | Apple Inc. | Virtual assistant activation |
US12254887B2 (en) | 2017-05-16 | 2025-03-18 | Apple Inc. | Far-field extension of digital assistant services for providing a notification of an event to a user |
US12260234B2 (en) | 2017-01-09 | 2025-03-25 | Apple Inc. | Application integration with a digital assistant |
US12301635B2 (en) | 2020-05-11 | 2025-05-13 | Apple Inc. | Digital assistant hardware abstraction |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108803866A (en) * | 2018-03-27 | 2018-11-13 | 北京七鑫易维信息技术有限公司 | The methods, devices and systems of output information |
CN108874127A (en) * | 2018-05-30 | 2018-11-23 | 北京小度信息科技有限公司 | Information interacting method, device, electronic equipment and computer readable storage medium |
WO2020116001A1 (en) * | 2018-12-03 | 2020-06-11 | ソニー株式会社 | Information processing device and information processing method |
US11978448B2 (en) | 2019-02-26 | 2024-05-07 | Lg Electronics Inc. | Display device and method of operating the same |
US20250085774A1 (en) * | 2023-09-08 | 2025-03-13 | Roeland Petrus Hubertus Vertegaal | Gaze assisted input for an electronic device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1320848A1 (en) * | 2000-09-20 | 2003-06-25 | International Business Machines Corporation | Eye gaze for contextual speech recognition |
US6718304B1 (en) * | 1999-06-30 | 2004-04-06 | Kabushiki Kaisha Toshiba | Speech recognition support method and apparatus |
US8744645B1 (en) * | 2013-02-26 | 2014-06-03 | Honda Motor Co., Ltd. | System and method for incorporating gesture and voice recognition into a single system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0651901A (en) * | 1992-06-29 | 1994-02-25 | Nri & Ncc Co Ltd | Communication device by gaze recognition |
JPH08314493A (en) * | 1995-05-22 | 1996-11-29 | Sanyo Electric Co Ltd | Voice recognition method, numeral line voice recognition device and video recorder system |
JP2008058409A (en) * | 2006-08-29 | 2008-03-13 | Aisin Aw Co Ltd | Speech recognizing method and speech recognizing device |
CN103885743A (en) * | 2012-12-24 | 2014-06-25 | 大陆汽车投资(上海)有限公司 | Voice text input method and system combining with gaze tracking technology |
KR20140132246A (en) * | 2013-05-07 | 2014-11-17 | 삼성전자주식회사 | Object selection method and object selection apparatus |
-
2016
- 2016-03-10 US US15/066,387 patent/US20170262051A1/en not_active Abandoned
- 2016-03-15 KR KR1020177027275A patent/KR20170129165A/en not_active Ceased
- 2016-03-15 EP EP16720164.9A patent/EP3271803A1/en not_active Withdrawn
- 2016-03-15 JP JP2017567559A patent/JP2018515817A/en active Pending
- 2016-03-15 CN CN201680025224.5A patent/CN107567611A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6718304B1 (en) * | 1999-06-30 | 2004-04-06 | Kabushiki Kaisha Toshiba | Speech recognition support method and apparatus |
EP1320848A1 (en) * | 2000-09-20 | 2003-06-25 | International Business Machines Corporation | Eye gaze for contextual speech recognition |
US8744645B1 (en) * | 2013-02-26 | 2014-06-03 | Honda Motor Co., Ltd. | System and method for incorporating gesture and voice recognition into a single system |
Cited By (66)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11979836B2 (en) | 2007-04-03 | 2024-05-07 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US12165635B2 (en) | 2010-01-18 | 2024-12-10 | Apple Inc. | Intelligent automated assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US12009007B2 (en) | 2013-02-07 | 2024-06-11 | Apple Inc. | Voice trigger for a digital assistant |
US12277954B2 (en) | 2013-02-07 | 2025-04-15 | Apple Inc. | Voice trigger for a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US12118999B2 (en) | 2014-05-30 | 2024-10-15 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US12067990B2 (en) | 2014-05-30 | 2024-08-20 | Apple Inc. | Intelligent assistant for home automation |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US12200297B2 (en) | 2014-06-30 | 2025-01-14 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US12236952B2 (en) | 2015-03-08 | 2025-02-25 | Apple Inc. | Virtual assistant activation |
US12154016B2 (en) | 2015-05-15 | 2024-11-26 | Apple Inc. | Virtual assistant in a communication session |
US12001933B2 (en) | 2015-05-15 | 2024-06-04 | Apple Inc. | Virtual assistant in a communication session |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US12204932B2 (en) | 2015-09-08 | 2025-01-21 | Apple Inc. | Distributed personal assistant |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US12175977B2 (en) | 2016-06-10 | 2024-12-24 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US12293763B2 (en) | 2016-06-11 | 2025-05-06 | Apple Inc. | Application integration with a digital assistant |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US12260234B2 (en) | 2017-01-09 | 2025-03-25 | Apple Inc. | Application integration with a digital assistant |
US20180286397A1 (en) * | 2017-03-29 | 2018-10-04 | Honda Motor Co., Ltd. | Object authentication device and object authentication method |
US10861452B2 (en) * | 2017-03-29 | 2020-12-08 | Honda Motor Co., Ltd. | Object authentication device and object authentication method |
US11276402B2 (en) * | 2017-05-08 | 2022-03-15 | Cloudminds Robotics Co., Ltd. | Method for waking up robot and robot thereof |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US12026197B2 (en) | 2017-05-16 | 2024-07-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US12254887B2 (en) | 2017-05-16 | 2025-03-18 | Apple Inc. | Far-field extension of digital assistant services for providing a notification of an event to a user |
US20190124388A1 (en) * | 2017-10-24 | 2019-04-25 | Comcast Cable Communications, Llc | Determining context to initiate interactivity |
US11445235B2 (en) * | 2017-10-24 | 2022-09-13 | Comcast Cable Communications, Llc | Determining context to initiate interactivity |
US11792464B2 (en) | 2017-10-24 | 2023-10-17 | Comcast Cable Communications, Llc | Determining context to initiate interactivity |
US12211502B2 (en) | 2018-03-26 | 2025-01-28 | Apple Inc. | Natural assistant interaction |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US12061752B2 (en) | 2018-06-01 | 2024-08-13 | Apple Inc. | Attention aware virtual assistant dismissal |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US12136419B2 (en) | 2019-03-18 | 2024-11-05 | Apple Inc. | Multimodality in digital assistant systems |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US12154571B2 (en) | 2019-05-06 | 2024-11-26 | Apple Inc. | Spoken notifications |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US12216894B2 (en) | 2019-05-06 | 2025-02-04 | Apple Inc. | User configurable task triggers |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11227599B2 (en) * | 2019-06-01 | 2022-01-18 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11468890B2 (en) | 2019-06-01 | 2022-10-11 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11335342B2 (en) * | 2020-02-21 | 2022-05-17 | International Business Machines Corporation | Voice assistance system |
US12197712B2 (en) | 2020-05-11 | 2025-01-14 | Apple Inc. | Providing relevant data items based on context |
US12301635B2 (en) | 2020-05-11 | 2025-05-13 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US12219314B2 (en) | 2020-07-21 | 2025-02-04 | Apple Inc. | User identification using headphones |
US12021806B1 (en) | 2021-09-21 | 2024-06-25 | Apple Inc. | Intelligent message delivery |
Also Published As
Publication number | Publication date |
---|---|
CN107567611A (en) | 2018-01-09 |
JP2018515817A (en) | 2018-06-14 |
KR20170129165A (en) | 2017-11-24 |
EP3271803A1 (en) | 2018-01-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20170262051A1 (en) | Method for refining control by combining eye tracking and voice recognition | |
US10838513B2 (en) | Responding to selection of a displayed character string | |
US9223590B2 (en) | System and method for issuing commands to applications based on contextual information | |
US8922489B2 (en) | Text input using key and gesture information | |
JP4527731B2 (en) | Virtual keyboard system with automatic correction function | |
US8327282B2 (en) | Extended keyboard user interface | |
US10275152B2 (en) | Advanced methods and systems for text input error correction | |
US9753906B2 (en) | Character string replacement | |
US9043300B2 (en) | Input method editor integration | |
US20140078065A1 (en) | Predictive Keyboard With Suppressed Keys | |
US20130007606A1 (en) | Text deletion | |
US20100287486A1 (en) | Correction of typographical errors on touch displays | |
EP2713255A1 (en) | Method and electronic device for prompting character input | |
US20150186347A1 (en) | Information processing device, information processing method, and computer program product | |
US9910589B2 (en) | Virtual keyboard with adaptive character recognition zones | |
KR20180101723A (en) | Semantic zoom animations | |
WO2014058948A1 (en) | A split virtual keyboard on a mobile computing device | |
US20140068509A1 (en) | Managing a Selection Mode for Presented Content | |
US20140304640A1 (en) | Techniques for input of a multi-character compound consonant or vowel and transliteration to another language using a touch computing device | |
EP2306287A2 (en) | Apparatus and method for displaying input character indicator | |
US11112965B2 (en) | Advanced methods and systems for text input error correction | |
US20150205781A1 (en) | Systems and methods for using tone indicator in text recognition | |
WO2016151396A1 (en) | Method for refining control by combining eye tracking and voice recognition | |
US9778839B2 (en) | Motion-based input method and system for electronic device | |
US20180300021A1 (en) | Text input system with correction facility |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: THE EYE TRIBE, DENMARK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TALL, MARTIN HENRIK;PRIESUM, JONAS;SAN AGUSTIN, JAVIER;REEL/FRAME:037948/0234 Effective date: 20160310 |
|
AS | Assignment |
Owner name: THE EYE TRIBE APS, DENMARK Free format text: AFFIDAVIT OF ASSIGNEE TO CORRECT THE ASSIGNEE'S NAME RECORDED AT REEL/FRAME: 037948/0234;ASSIGNOR:THE EYE TRIBE;REEL/FRAME:040912/0950 Effective date: 20161214 |
|
AS | Assignment |
Owner name: FACEBOOK, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THE EYE TRIBE APS;REEL/FRAME:041291/0471 Effective date: 20170216 |
|
AS | Assignment |
Owner name: THE EYE TRIBE APS, DENMARK Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNORS SIGNATURES AND DATES PREVIOUSLY RECORDED AT REEL: 037948 FRAME: 0234. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:TALL, MARTIN HENRIK;PRIESUM, JONAS PHILIP;SAN AGUSTIN, JAVIER;SIGNING DATES FROM 20161213 TO 20161214;REEL/FRAME:042793/0300 |
|
AS | Assignment |
Owner name: THE EYE TRIBE, DENMARK Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE SPELLING OF THE THIRD ASSIGNOR'S NAME PREVIOUSLY RECORDED ON REEL 042793 FRAME 0300. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:TALL, MARTIN HENRIK;PRIESUM, JONAS;SAN AGUSTIN LOPEZ, JAVIER;SIGNING DATES FROM 20161213 TO 20161214;REEL/FRAME:046426/0545 |
|
AS | Assignment |
Owner name: FACEBOOK TECHNOLOGIES, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FACEBOOK, INC.;REEL/FRAME:047687/0942 Effective date: 20181024 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: META PLATFORMS TECHNOLOGIES, LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:FACEBOOK TECHNOLOGIES, LLC;REEL/FRAME:062749/0697 Effective date: 20220318 |