A system evaluates modifications to components of an autonomous vehicle (AV) stack. The system receives driving recommendations for traffic scenarios based on user annotations of video frames showing each traffic scenario. For each traffic scenario, the system predicts driving recommendations based on the AV stack. The system determines a measure of driving recommendation quality by comparing the driving recommendations predicted by the AV stack with the driving recommendations received for the traffic scenario. The measure of driving recommendation quality is used for evaluating components of the AV stack. The system determines a driving recommendation for an AV corresponding to ranges of SOMAI (state of mind) scores and sends signals to controls of the autonomous vehicle to navigate it according to the driving recommendation. The system identifies additional training data for training a machine learning model based on the measure of driving recommendation quality.
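As a rough illustration of the comparison and score-to-action steps described above, the following Python sketch computes an agreement-based quality measure and maps a SOMAI score to a driving recommendation. The function names, the agreement metric, and the score ranges are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical sketch: score an AV stack's recommendations against
# human-annotated ones, then map a SOMAI (state of mind) score to an action.
# All thresholds and action names below are illustrative assumptions.

def recommendation_quality(predicted: list[str], annotated: list[str]) -> float:
    """Fraction of traffic scenarios where the AV stack's recommendation
    matches the recommendation derived from user annotations."""
    assert len(predicted) == len(annotated)
    matches = sum(p == a for p, a in zip(predicted, annotated))
    return matches / len(predicted)

# Illustrative SOMAI score ranges; real thresholds would be calibrated.
SOMAI_ACTIONS = [
    (0.0, 0.3, "proceed"),
    (0.3, 0.7, "slow_down"),
    (0.7, 1.0, "stop"),
]

def recommendation_for_somai(score: float) -> str:
    """Look up the driving recommendation for a SOMAI score in [0, 1]."""
    for low, high, action in SOMAI_ACTIONS:
        if low <= score < high or (high == 1.0 and score == 1.0):
            return action
    raise ValueError(f"SOMAI score {score} outside [0, 1]")

print(recommendation_quality(["stop", "proceed"], ["stop", "slow_down"]))  # 0.5
print(recommendation_for_somai(0.85))  # stop
```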
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 20/40 - Scenes; Scene-specific elements in video content
2.
SCENARIO BASED MONITORING AND CONTROL OF AUTONOMOUS VEHICLES
A system evaluates modifications to components of an autonomous vehicle (AV) stack. The system receives driving recommendations for traffic scenarios based on user annotations of video frames showing each traffic scenario. For each traffic scenario, the system predicts driving recommendations based on the AV stack. The system determines a measure of driving recommendation quality by comparing the driving recommendations predicted by the AV stack with the driving recommendations received for the traffic scenario. The measure of driving recommendation quality is used for evaluating components of the AV stack. The system determines a driving recommendation for an AV corresponding to ranges of SOMAI (state of mind) scores and sends signals to controls of the autonomous vehicle to navigate it according to the driving recommendation. The system identifies additional training data for training a machine learning model based on the measure of driving recommendation quality.
A system evaluates modifications to components of an autonomous vehicle (AV) stack. The system receives driving recommendations for traffic scenarios based on user annotations of video frames showing each traffic scenario. For each traffic scenario, the system predicts driving recommendations based on the AV stack. The system determines a measure of driving recommendation quality by comparing the driving recommendations predicted by the AV stack with the driving recommendations received for the traffic scenario. The measure of driving recommendation quality is used for evaluating components of the AV stack. The system determines a driving recommendation for an AV corresponding to ranges of SOMAI (state of mind) scores and sends signals to controls of the autonomous vehicle to navigate it according to the driving recommendation. The system identifies additional training data for training a machine learning model based on the measure of driving recommendation quality.
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
G08G 1/01 - Detecting movement of traffic to be counted or controlled
B60W 40/12 - Estimation or calculation of driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit related to parameters of the vehicle itself
4.
Generating training data for machine learning based models for autonomous vehicles
A system receives information describing paths traversed by vehicles of a vehicle type, for example, bicycles or motorcycles. The system determines locations along the paths and, for each location, determines a measure of likelihood of encountering vehicles of the vehicle type in traffic at that location. The system selects a subset of the locations based on the measure of likelihood and obtains sensor data captured at the subset of locations. The system uses the sensor data as a training dataset for a machine learning based model configured to receive input sensor data describing traffic and output a score used for navigation of autonomous vehicles. The trained model is provided to a vehicle, for example an autonomous vehicle, for navigation.
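A minimal sketch of the location-selection step, assuming per-location likelihoods have already been estimated; the names and data shapes are hypothetical.

```python
# Hypothetical sketch: rank candidate locations by the estimated likelihood
# of encountering the vehicle type, and keep the top ones for data collection.

def select_training_locations(likelihood_by_location: dict[str, float],
                              top_k: int) -> list[str]:
    """Return the top_k locations ranked by likelihood of encounter."""
    ranked = sorted(likelihood_by_location.items(),
                    key=lambda item: item[1], reverse=True)
    return [location for location, _ in ranked[:top_k]]

likelihoods = {"5th_and_main": 0.82, "oak_bridge": 0.15, "bike_path_jct": 0.91}
print(select_training_locations(likelihoods, top_k=2))
# ['bike_path_jct', '5th_and_main']
```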
Systems and methods for predicting user interaction with vehicles. A computing device receives an image and a video segment of a road scene, at least one of which is taken from the perspective of a participant in the road scene, and generates stimulus data based on the image and the video segment. The stimulus data is transmitted to a user interface, and response data is received that includes at least one of an action and a likelihood of the action corresponding to another participant in the road scene. The computing device aggregates a subset of the response data to form statistical data, and a model is created based on the statistical data. The model is applied to another image or video segment to generate a prediction of user behavior in that image or video segment.
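One way the aggregation step could look in code; the response format (per-user likelihood judgments) and the summary statistics are assumptions.

```python
# Hypothetical sketch: aggregate per-user responses about a road-scene
# participant into summary statistics that a model can be fit to.
from statistics import mean, pstdev

def aggregate_responses(responses: list[float]) -> dict[str, float]:
    """Summarize user judgments, e.g. 'how likely is this pedestrian to cross?'."""
    return {
        "mean": mean(responses),
        "stdev": pstdev(responses),
        "count": float(len(responses)),
    }

print(aggregate_responses([0.9, 0.7, 0.8, 1.0]))
# {'mean': 0.85, 'stdev': 0.1118..., 'count': 4.0}
```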
G08G 1/04 - Detecting movement of traffic to be counted or controlled using optical or ultrasonic detectors
G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
B60W 30/00 - Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
G06N 3/084 - Backpropagation, e.g. using gradient descent
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
G06F 18/40 - Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
G06N 20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]
An autonomous vehicle collects sensor data of an environment surrounding the autonomous vehicle, including traffic entities such as pedestrians, bicyclists, or other vehicles. The sensor data is provided to a machine learning based model along with an expected turn direction of the autonomous vehicle to determine a hidden context attribute of a traffic entity given that turn direction. The hidden context attribute represents factors that affect the behavior of the traffic entity and is used to predict the entity's future behavior. Instructions to control the autonomous vehicle are generated based on the hidden context attribute.
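The abstract implies the turn direction is a model input alongside the sensor data. Below is a minimal PyTorch sketch of one way to condition on it, by concatenating a one-hot turn direction with image features; the architecture and sizes are entirely assumptions.

```python
# Hypothetical sketch (PyTorch): condition a hidden-context prediction on the
# AV's expected turn direction by concatenating a one-hot direction vector
# with image features. The architecture below is illustrative only.
import torch
import torch.nn as nn

class HiddenContextModel(nn.Module):
    def __init__(self, feature_dim: int = 128, num_directions: int = 3):
        super().__init__()
        # Stand-in backbone; a real system would use a CNN over sensor data.
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feature_dim), nn.ReLU())
        self.head = nn.Linear(feature_dim + num_directions, 1)  # hidden-context attribute

    def forward(self, image: torch.Tensor, turn_direction: torch.Tensor) -> torch.Tensor:
        features = self.backbone(image)
        return self.head(torch.cat([features, turn_direction], dim=-1))

model = HiddenContextModel()
image = torch.randn(1, 3, 64, 64)       # dummy camera frame
turn = torch.tensor([[0.0, 1.0, 0.0]])  # one-hot: left, straight, right
print(model(image, turn).shape)         # torch.Size([1, 1])
```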
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
7.
Ground truth based metrics for evaluation of machine learning based models for predicting attributes of traffic entities for navigating autonomous vehicles
A system uses a machine learning based model to determine attributes describing states of mind and behavior of traffic entities in video frames captured by an autonomous vehicle. The system classifies video frames according to traffic scenarios depicted, where each scenario is associated with a filter based on vehicle attributes, traffic attributes, and road attributes. The system identifies a set of video frames associated with ground truth scenarios for validating the accuracy of the machine learning based model and predicts attributes of traffic entities in the video frames. The system analyzes video frames captured after the set of video frames to determine actual attributes of the traffic entities. Based on a comparison of the predicted attributes and actual attributes, the system determines a likelihood of the machine learning based model making accurate predictions and uses the likelihood to generate a navigation action table for controlling the autonomous vehicle.
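A sketch of the validation step described above: estimate the model's accuracy on ground-truth scenarios, then map that accuracy to an entry of a navigation action table. The thresholds and action names are illustrative assumptions.

```python
# Hypothetical sketch: compare predicted and actual attributes on
# ground-truth scenarios, then gate navigation actions on the estimate.

def prediction_accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of traffic entities whose predicted attribute matched the
    attribute observed in later video frames."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(predicted)

def navigation_action(accuracy: float) -> str:
    """Lower confidence in the model -> more conservative action."""
    if accuracy >= 0.95:
        return "proceed"
    if accuracy >= 0.80:
        return "slow_down"
    return "yield"

acc = prediction_accuracy(["crossing", "waiting", "crossing"],
                          ["crossing", "waiting", "waiting"])
print(acc, navigation_action(acc))  # ~0.667 yield
```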
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
G06F 18/40 - Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
G06F 18/2113 - Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
8.
Scenario identification for validation and training of machine learning based models for autonomous vehicles
A system uses a machine learning based model to determine attributes describing states of mind and behavior of traffic entities in video frames captured by an autonomous vehicle. The system classifies video frames according to traffic scenarios depicted, where each scenario is associated with a filter based on vehicle attributes, traffic attributes, and road attributes. The system identifies a set of video frames associated with ground truth scenarios for validating the accuracy of the machine learning based model and predicts attributes of traffic entities in the video frames. The system analyzes video frames captured after the set of video frames to determine actual attributes of the traffic entities. Based on a comparison of the predicted attributes and actual attributes, the system determines a likelihood of the machine learning based model making accurate predictions and uses the likelihood to generate a navigation action table for controlling the autonomous vehicle.
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
G05D 1/02 - Control of position or course in two dimensions
G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
G06F 18/2113 - Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06F 18/40 - Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
9.
Display panel of a programmed computer system with a graphical user interface
Systems and methods for predicting user interaction with vehicles. A computing device receives an image and a video segment of a road scene, at least one of which is taken from the perspective of a participant in the road scene, and generates stimulus data based on the image and the video segment. The stimulus data is transmitted to a user interface, and response data is received that includes at least one of an action and a likelihood of the action corresponding to another participant in the road scene. The computing device aggregates a subset of the response data to form statistical data, and a model is created based on the statistical data. The model is applied to another image or video segment to generate a prediction of user behavior in that image or video segment.
G06N 3/04 - Architecture, e.g. interconnection topology
B60W 30/00 - Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06F 18/40 - Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
G06N 3/084 - Backpropagation, e.g. using gradient descent
G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
G08G 1/04 - Detecting movement of traffic to be counted or controlled using optical or ultrasonic detectors
Systems and methods for predicting user interaction with vehicles. A computing device receives an image and a video segment of a road scene, at least one of which is taken from the perspective of a participant in the road scene, and generates stimulus data based on the image and the video segment. The stimulus data is transmitted to a user interface, and response data is received that includes at least one of an action and a likelihood of the action corresponding to another participant in the road scene. The computing device aggregates a subset of the response data to form statistical data, and a model is created based on the statistical data. The model is applied to another image or video segment to generate a prediction of user behavior in that image or video segment.
B60W 30/00 - Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06F 18/40 - Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
G06N 3/04 - Architecture, e.g. interconnection topology
G06N 3/084 - Backpropagation, e.g. using gradient descent
G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
G08G 1/04 - Detecting movement of traffic to be counted or controlled using optical or ultrasonic detectors
A vehicle collects video data of an environment surrounding the vehicle, including traffic entities, e.g., pedestrians, bicyclists, or other vehicles. The captured video data is sampled, and the sampled video frames are presented to users to provide input on a traffic entity's state of mind. The system determines an attribute value that describes a statistical distribution of user responses for the traffic entity. If the attribute for a sampled video frame is within a threshold of the attribute of another sampled video frame, the system interpolates the attribute for a third video frame between the two sampled video frames. Otherwise, the system requests further user input for a video frame captured between the two sampled video frames. The interpolated and/or user-based attributes are used to train a machine learning based model that predicts a hidden context of the traffic entity. The trained model is used for navigation of autonomous vehicles.
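A minimal sketch of the interpolation rule just described; the linear interpolation and the default threshold are assumptions.

```python
# Hypothetical sketch: if two sampled frames agree to within a threshold,
# linearly interpolate the attribute for a frame between them; otherwise
# signal that the middle frame needs user annotation.

def fill_attribute(frame_a: int, attr_a: float,
                   frame_b: int, attr_b: float,
                   frame_mid: int, threshold: float = 0.1):
    """Return an interpolated attribute value, or None if user input is needed."""
    if abs(attr_a - attr_b) <= threshold:
        t = (frame_mid - frame_a) / (frame_b - frame_a)
        return attr_a + t * (attr_b - attr_a)
    return None  # request further user input for frame_mid

print(fill_attribute(10, 0.62, 20, 0.58, 15))  # ~0.6 (interpolated)
print(fill_attribute(10, 0.20, 20, 0.90, 15))  # None -> annotate frame 15
```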
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
15.
Adaptive sampling of stimuli for training of machine learning based models for predicting hidden context of traffic entities for navigating autonomous vehicles
A vehicle collects video data of an environment surrounding the vehicle, including traffic entities, e.g., pedestrians, bicyclists, or other vehicles. The captured video data is sampled and presented to users to provide input on a traffic entity's state of mind. The user responses to the captured video data are used to generate a training dataset, with which a machine learning based model configured to predict a traffic entity's state of mind is trained. The system determines input video frames, and their associated dimension attributes, for which the model performs poorly; the dimension attributes characterize the stimuli and/or environment shown in the input video frames. The system generates a second training dataset based on video frames that have the dimension attributes for which the model performed poorly. The model is retrained using the second training dataset and provided to an autonomous vehicle to assist with navigation in traffic.
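A sketch of the adaptive-sampling step under the assumption that each frame carries a list of dimension attributes and a validation error; the grouping and resampling logic below is illustrative.

```python
# Hypothetical sketch: average validation error per dimension attribute
# (e.g., lighting, occlusion), then build a second dataset from frames
# carrying the worst-performing attributes.
from collections import defaultdict

def worst_dimensions(error_by_frame: dict[str, float],
                     dims_by_frame: dict[str, list[str]],
                     k: int = 2) -> list[str]:
    totals, counts = defaultdict(float), defaultdict(int)
    for frame, error in error_by_frame.items():
        for dim in dims_by_frame[frame]:
            totals[dim] += error
            counts[dim] += 1
    mean_error = {dim: totals[dim] / counts[dim] for dim in totals}
    return sorted(mean_error, key=mean_error.get, reverse=True)[:k]

def second_training_dataset(frames: list[str],
                            dims_by_frame: dict[str, list[str]],
                            bad_dims: list[str]) -> list[str]:
    """Keep frames that exhibit at least one poorly performing attribute."""
    return [f for f in frames if set(dims_by_frame[f]) & set(bad_dims)]
```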
A system uses neural networks to determine the intents of traffic entities (e.g., pedestrians, bicycles, vehicles) in an environment surrounding a vehicle (e.g., an autonomous vehicle) and generates commands to control the vehicle based on the determined intents. The system receives images of the environment captured by sensors on the vehicle and processes the images using neural network models to determine overall intents or predicted actions of one or more traffic entities within the images. The system generates commands to control the vehicle based on the determined overall intents of the traffic entities.
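One plausible form of the command-generation step: map each entity's predicted intent to a single, conservative vehicle command. The intent labels and priority ordering are illustrative assumptions.

```python
# Hypothetical sketch: reduce per-entity intents to one vehicle command,
# preferring the most conservative action when intents conflict.

def command_from_intents(intent_by_entity: dict[str, str]) -> str:
    intents = set(intent_by_entity.values())
    if "entering_roadway" in intents:
        return "brake"
    if "may_cross" in intents:
        return "slow_down"
    return "maintain_speed"

print(command_from_intents({"ped_1": "may_cross", "cyclist_2": "yielding"}))
# slow_down
```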
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
17.
Visualizing machine learning predictions of human interaction with vehicles
A computing device accesses video data displaying one or more traffic entities and generates a plurality of sequences from the video data. For each sequence, the computing device identifies a plurality of stimuli in the sequence and applies a machine learning model to generate an output describing the traffic entity. The computing device generates a data structure for storing, for each sequence, information describing the sequence and linking frame indexes of stimuli from the sequence to outputs of the machine learning model. The computing device stores the data structure in association with the video data. Responsive to receiving a selection of a sequence, the computing device loads video data for the sequence. Responsive to receiving a selection of a traffic entity within the video data, the computing device generates a graphical display element including the machine learning model output for the selected traffic entity.
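A minimal sketch of the stored data structure described above; the field names and the per-entity output format are assumptions.

```python
# Hypothetical sketch: per sequence, a record linking stimulus frame indexes
# to model outputs, so a UI can fetch a prediction when a traffic entity in
# the loaded video is selected.
from dataclasses import dataclass, field

@dataclass
class SequenceRecord:
    sequence_id: str
    description: str
    # frame index -> {entity id: model output, e.g. a crossing-intent score}
    outputs_by_frame: dict[int, dict[str, float]] = field(default_factory=dict)

record = SequenceRecord("seq_001", "four-way stop, two pedestrians")
record.outputs_by_frame[42] = {"ped_1": 0.87, "ped_2": 0.12}
print(record.outputs_by_frame[42]["ped_1"])  # 0.87
```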
An autonomous vehicle uses machine learning based models, such as neural networks, to predict hidden context attributes associated with traffic entities. The hidden context represents the behavior of the traffic entities in traffic. The machine learning based model is configured to receive a video frame as input and output likelihoods of receiving user responses having particular ordinal values. The system uses a loss function based on a cumulative histogram of user responses corresponding to the various ordinal values. To generate training data for training the machine learning model, the system identifies user responses that are unlikely to be valid; invalid user responses are identified based on their response times.
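A sketch of what a loss built from cumulative histograms over ordinal values could look like; the squared-difference form and the tensor shapes are assumptions, not the patented formulation.

```python
# Hypothetical sketch (PyTorch): compare the model's cumulative distribution
# over K ordinal response values with the cumulative histogram of user
# responses, penalizing the squared gap between the two CDFs.
import torch

def cumulative_histogram_loss(logits: torch.Tensor,
                              response_counts: torch.Tensor) -> torch.Tensor:
    """logits: (batch, K) scores over K ordinal values.
    response_counts: (batch, K) histogram of user responses per value."""
    predicted = torch.softmax(logits, dim=-1)
    target = response_counts / response_counts.sum(dim=-1, keepdim=True)
    predicted_cdf = torch.cumsum(predicted, dim=-1)
    target_cdf = torch.cumsum(target, dim=-1)
    return ((predicted_cdf - target_cdf) ** 2).sum(dim=-1).mean()

logits = torch.randn(2, 5, requires_grad=True)
counts = torch.tensor([[0., 1., 4., 3., 0.], [5., 2., 1., 0., 0.]])
loss = cumulative_histogram_loss(logits, counts)
loss.backward()  # differentiable, so usable for training
```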
G06V 10/50 - Extraction of image or video features by performing operations within image blocks; Extraction of image or video features by using histograms, e.g. histogram of oriented gradients [HoG]; Extraction of image or video features by summing image-intensity values; Projection analysis
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
B60W 40/02 - Estimation or calculation of driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit related to ambient conditions
An autonomous vehicle uses machine learning based models, such as neural networks, to predict hidden context attributes associated with traffic entities. The hidden context represents the behavior of the traffic entities in traffic. The machine learning based model is configured to receive a video frame as input and output likelihoods of receiving user responses having particular ordinal values. The system uses a loss function based on a cumulative histogram of user responses corresponding to the various ordinal values. To generate training data for training the machine learning model, the system identifies user responses that are unlikely to be valid; invalid user responses are identified based on their response times.
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
B60W 40/02 - Estimation or calculation of driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit related to ambient conditions
G05D 1/02 - Control of position or course in two dimensions
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
G06V 10/50 - Extraction of image or video features by performing operations within image blocks; Extraction of image or video features by using histograms, e.g. histogram of oriented gradients [HoG]; Extraction of image or video features by summing image-intensity values; Projection analysis
20.
Machine learning based prediction of human interactions with autonomous vehicles
Systems and methods for predicting user interaction with vehicles. A computing device receives an image and a video segment of a road scene, at least one of which is taken from the perspective of a participant in the road scene, and generates stimulus data based on the image and the video segment. The stimulus data is transmitted to a user interface, and response data is received that includes at least one of an action and a likelihood of the action corresponding to another participant in the road scene. The computing device aggregates a subset of the response data to form statistical data, and a model is created based on the statistical data. The model is applied to another image or video segment to generate a prediction of user behavior in that image or video segment.
G08G 1/04 - Detecting movement of traffic to be counted or controlled using optical or ultrasonic detectors
G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
B60W 30/00 - Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
G06N 5/00 - Computing arrangements using knowledge-based models
G06N 20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]
21.
Symbolic modeling and simulation of non-stationary traffic objects for testing and development of autonomous vehicle systems
A system performs modeling and simulation of non-stationary traffic entities for testing and development of modules used in an autonomous vehicle system. The system uses a machine learning based model that predicts hidden context attributes for traffic entities that may be encountered by a vehicle in traffic. The system generates simulation data for testing and development of modules that help navigate autonomous vehicles. The generated simulation data may be image or video data including representations of traffic entities, for example, pedestrians, bicyclists, and other vehicles. The system may generate simulation data using generative adversarial neural networks.
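For the generative step, a tiny sketch of sampling simulated frames from an already-trained GAN generator; the interface is an assumption and training is omitted entirely.

```python
# Hypothetical sketch (PyTorch): draw latent vectors and decode them into
# simulated traffic-entity image frames with a trained GAN generator.
import torch

def sample_simulated_frames(generator: torch.nn.Module,
                            num_frames: int,
                            latent_dim: int = 100) -> torch.Tensor:
    """Sample num_frames simulated frames from the generator."""
    z = torch.randn(num_frames, latent_dim)
    with torch.no_grad():
        return generator(z)  # expected shape: (num_frames, C, H, W)
```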
G01C 22/00 - Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers or using pedometers
G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
B60W 50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit
An autonomous vehicle uses probabilistic neural networks to predict hidden context attributes associated with traffic entities. The hidden context represents behavior of the traffic entities in the traffic. The probabilistic neural network is configured to receive an image of traffic as input and generate output representing hidden context for a traffic entity displayed in the image. The system executes the probabilistic neural network to generate output representing hidden context for traffic entities encountered while navigating through traffic. The system determines a measure of uncertainty for the output values. The autonomous vehicle uses the measure of uncertainty generated by the probabilistic neural network during navigation.
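One plausible reading of the uncertainty measure is Monte Carlo sampling of a stochastic network; the sketch below uses dropout-at-inference as the source of randomness, which is an assumption rather than the patented method.

```python
# Hypothetical sketch (PyTorch): run a dropout-enabled network several times
# and use the spread of the hidden-context outputs as the uncertainty.
import torch

def predict_with_uncertainty(model: torch.nn.Module,
                             image: torch.Tensor,
                             num_samples: int = 20):
    model.train()  # keep dropout active so each forward pass is stochastic
    with torch.no_grad():
        samples = torch.stack([model(image) for _ in range(num_samples)])
    model.eval()
    return samples.mean(dim=0), samples.std(dim=0)  # estimate, uncertainty
```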
An autonomous vehicle uses machine learning based models to predict hidden context attributes associated with traffic entities. The system uses the hidden context to predict behavior of people near a vehicle in a way that more closely resembles how human drivers would judge the behavior. The system determines an activation threshold value for a braking system of the autonomous vehicle based on the hidden context. The system modifies a world model based on the hidden context predicted by the machine learning based model. The autonomous vehicle is safely navigated, such that the vehicle stays at least a threshold distance away from traffic entities.
An autonomous vehicle uses machine learning based models to predict hidden context attributes associated with traffic entities. The system uses the hidden context to predict behavior of people near a vehicle in a way that more closely resembles how human drivers would judge the behavior. The system determines an activation threshold value for a braking system of the autonomous vehicle based on the hidden context. The system modifies a world model based on the hidden context predicted by the machine learning based model. The autonomous vehicle is safely navigated, such that the vehicle stays at least a threshold distance away from traffic entities.
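A minimal sketch of how a braking activation threshold could be derived from a hidden-context attribute; the linear mapping and all constants are illustrative assumptions.

```python
# Hypothetical sketch: higher hidden-context risk (0..1) makes the braking
# system activate farther from the traffic entity, preserving a standoff.

def braking_threshold_m(hidden_context_risk: float,
                        base_distance_m: float = 5.0,
                        max_extra_m: float = 10.0) -> float:
    """Map a risk attribute in [0, 1] to an activation distance in meters."""
    return base_distance_m + hidden_context_risk * max_extra_m

def should_brake(distance_to_entity_m: float, hidden_context_risk: float) -> bool:
    return distance_to_entity_m <= braking_threshold_m(hidden_context_risk)

print(should_brake(12.0, 0.9))  # True: high risk extends the threshold to 14 m
print(should_brake(12.0, 0.1))  # False: threshold is only 6 m
```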
A system performs modeling and simulation of non-stationary traffic entities for testing and development of modules used in an autonomous vehicle system. The system uses a machine learning based model that predicts hidden context attributes for traffic entities that may be encountered by a vehicle in traffic. The system generates simulation data for testing and development of modules that help navigate autonomous vehicles. The generated simulation data may be image or video data including representations of traffic entities, for example, pedestrians, bicyclists, and other vehicles. The system may generate simulation data using generative adversarial neural networks.
A computing device receives an image and a video segment of a road scene, at least one of which is taken from the perspective of a participant in the road scene, and generates stimulus data based on the image and the video segment. The stimulus data is transmitted to a user interface, and response data is received that includes at least one of an action and a likelihood of the action corresponding to another participant in the road scene. The computing device aggregates a subset of the response data to form statistical data, and a model is created based on the statistical data. The model is applied to another image or video segment to generate a prediction of user behavior in that image or video segment.
G08G 1/04 - Detecting movement of traffic to be counted or controlled using optical or ultrasonic detectors
G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
B60W 30/00 - Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
G06N 5/00 - Computing arrangements using knowledge-based models
G06N 20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]
27.
System and method of predicting human interaction with vehicles
Systems and methods for predicting user interaction with vehicles. A computing device receives an image and a video segment of a road scene, at least one of which is taken from the perspective of a participant in the road scene, and generates stimulus data based on the image and the video segment. The stimulus data is transmitted to a user interface, and response data is received that includes at least one of an action and a likelihood of the action corresponding to another participant in the road scene. The computing device aggregates a subset of the response data to form statistical data, and a model is created based on the statistical data. The model is applied to another image or video segment to generate a prediction of user behavior in that image or video segment.