It is essential for the widespread adoption of Automated Vehicles (AVs) that they can navigate without compromising the safety of surrounding road users. Safety is especially critical in urban areas, where AVs frequently interact with pedestrians and cyclists.
Trajectory prediction models help an AV achieve this understanding by estimating the future positions and/or intent of the agents around the vehicle. These models generally predict future trajectories based on the observed trajectories of agents, but can also utilize additional information about the agents or their environment. Recent trajectory prediction datasets provide high-definition (HD) road maps annotated by humans that additionally contain lane information. However, using human annotators is costly and can delay much-needed map updates when the static environment changes.
Due to the high cost of manual annotation for reliable maps, an active research community is working on automatic road map generation from sensor data. However, existing approaches either require data from expensive ground-based recording vehicles, do not estimate both the road geometry and topology, or may not generalize well to urban areas.
In a paper presented at the 2025 IEEE Intelligent Vehicles Symposium, researchers propose SAM-Maps, a method that addresses these shortcomings. It is the first method that extracts both the drivable area and the road connections in European urban areas from aerial images.
A Novel Approach
After covering the prior and related research, the paper outlines the proposed solution. SAM-Maps is a method for automatically generating road maps from aerial images of urban areas that leverages the power of foundation models, requiring no human annotation or additional training to map previously unseen areas. This method extracts a coarse road graph from the images and then estimates the geometry of the roads from this graph.
Using RGB aerial images and foundation models, SAM-Maps generates AV-suitable road maps, including drivable areas and road connections, of unseen urban areas, eliminating the need for additional training or human annotation. The researchers show that these maps significantly improve trajectory predictions compared to not using a map.

Intersection mapped with OSM [6] and SAM-Maps. Accurate OSM annotations are not available everywhere. The method does not require human annotation and can extract the geometry and topology of urban roads from aerial images.

OSM maps can contain mistakes or even vandalism, such as this fictitious town drawn in farmland [29].
SAM-Maps consists of three modules: Road Graph Extraction (RGE), Road Segmentation (RS), and Road Connection (RC):

Overview of the SAM-Maps method.
Road Graph Extraction (RGE)
A road graph (RG) is a global representation of the road network, consisting of edges that represent road segments and nodes that denote connection points between these segments.
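A road graph of this kind can be sketched as a plain node/edge structure. The following minimal illustration is hypothetical, not taken from the paper: nodes carry positions, edges are unordered node pairs, and a node's degree distinguishes intersections from simple connection points.

```python
from dataclasses import dataclass, field

# Hypothetical minimal road-graph representation; the paper's actual data
# structures are not specified, so all names here are illustrative.

@dataclass
class RoadGraph:
    nodes: dict = field(default_factory=dict)  # node id -> (x, y) position
    edges: set = field(default_factory=set)    # frozenset({u, v}) road segments

    def add_node(self, nid, xy):
        self.nodes[nid] = xy

    def add_edge(self, u, v):
        self.edges.add(frozenset((u, v)))

    def neighbors(self, nid):
        # Nodes connected to `nid`; a degree above 2 marks an intersection.
        return {m for e in self.edges if nid in e for m in e if m != nid}

# Example: a T-intersection. Node 1 joins three road segments.
rg = RoadGraph()
for nid, xy in [(0, (0, 0)), (1, (10, 0)), (2, (20, 0)), (3, (10, 10))]:
    rg.add_node(nid, xy)
for u, v in [(0, 1), (1, 2), (1, 3)]:
    rg.add_edge(u, v)

print(len(rg.neighbors(1)))  # prints 3: node 1 is an intersection
```

Representing edges as frozensets keeps the graph undirected, matching the definition above, where edges are road segments rather than one-way links.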
Road Segmentation (RS)
After extracting the coarse road graph, the method estimates the geometry of the roads in the graph. The RS module does this in three steps: normalization, coarse segmentation, and geometry refinement.
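The paper does not spell out the refinement operation here, but a common way to refine a coarse binary road mask is morphological closing (dilation followed by erosion), which fills small gaps in the segmentation. The sketch below is purely illustrative, using plain Python on a toy mask:

```python
# Illustrative only: not the paper's exact refinement procedure.
# Morphological closing = dilation followed by erosion on a binary mask.

def dilate(mask):
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel becomes road if any 8-connected neighbor is road.
            out[y][x] = int(any(
                mask[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if 0 <= y + dy < h and 0 <= x + dx < w))
    return out

def erode(mask):
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel stays road only if its full 3x3 neighborhood is road
            # (out-of-bounds counts as background, so borders shrink).
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def close_mask(mask):
    return erode(dilate(mask))

# A coarse road mask with a one-pixel gap in the road (row 2, column 3).
coarse = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
refined = close_mask(coarse)  # the gap at (2, 3) is now filled
```

With zero padding, erosion also eats the mask's border pixels; practical implementations pad the image before closing to avoid this edge effect.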
Road Connection (RC)
The connections between the roads are provided by the road graph produced by the RGE module.
Proving the Method
The researchers evaluate the SAM-Maps generation method in terms of the accuracy of the generated maps compared to human-made annotations, as well as its usefulness in a downstream task.
Road Map Coverage
The semi-automated variant, SAM-Maps+, requires only manual correction of the road graph generated by SAM-Road, rather than full manual annotation. Annotating the road graph from scratch takes an annotator approximately 2 hours for the roads in the VoD-P dataset, compared to about 30 minutes of correction for SAM-Maps+.
Ablation Study
The modules in SAM-Maps can be further broken down into key operations, including mask segmentation, normalization, bounding box proposal, mask selection, and geometry refinement. The researchers systematically ablate these components to assess their impact.

Example of the effect of geometry refinement.
Topological Road Boundary Detection
The study demonstrates that road boundaries can be easily extracted from the map produced by SAM-Maps, facilitating comparison with road boundary detection methods from the literature.
Trajectory Prediction
In the example shown, the prediction model cannot infer that the vehicle will turn without map input, whereas it correctly estimates the turn with the SAM-Maps map. The SAM-Maps map also offers better coverage than the OSM map, which lacks some of the roads.

Qualitative trajectory prediction results. Predictions are shown in red, the ground-truth future in green, and other agents in orange. The model using SAM-Maps correctly predicts the turn; the model without a map does not.
Conclusion
The researchers presented a method for generating road maps that contain the drivable area and road connections of unseen urban areas from aerial images, without requiring human annotation, thereby significantly reducing annotation costs and time. These maps can nevertheless be easily edited by humans, using software developed by the researchers, to correct errors made in the automatic pipeline.
Upon evaluation, maps generated by SAM-Maps significantly improved downstream trajectory prediction compared to using no map. In terms of map accuracy, the auto-generated map achieves an Intersection over Union (IoU) of 33.3% with the human-annotated map, increasing to 56.1% for the semi-automated SAM-Maps+ map.
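IoU measures the overlap between the generated and annotated drivable areas: the number of pixels marked as road in both masks, divided by the number marked as road in either. A toy computation on flattened binary masks (not the paper's data):

```python
# Intersection over Union for binary masks, flattened to 1-D lists.
# 1 = drivable area, 0 = background. Toy example, not the paper's maps.

def iou(pred, gt):
    inter = sum(p & g for p, g in zip(pred, gt))   # road in both masks
    union = sum(p | g for p, g in zip(pred, gt))   # road in either mask
    return inter / union if union else 1.0

generated = [1, 1, 0, 0, 1, 0]  # auto-generated mask
annotated = [1, 0, 1, 0, 1, 1]  # human-annotated mask

print(round(iou(generated, annotated), 3))  # prints 0.4 (overlap 2, union 5)
```

An IoU of 1.0 would mean the generated map matches the annotation exactly; the reported 33.3% and 56.1% quantify how far the automatic and semi-automatic maps are from the human reference.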
Future work will include segmenting the lanes of a road individually and refining the estimation of intersection geometry.
Interested in learning more about Automated Vehicles? The IEEE Xplore Digital Library offers over 30,000 publications on Automated Vehicles.
Interested in acquiring full-text access to this collection for your entire organization? Request a free demo and trial subscription for your organization.