xView2 Challenge | Part 1: Getting Started

Lezwon Castelino
6 min read · Dec 8, 2019

Hello, peeps! As you may have gathered from the title, I have decided to take a shot at the xView2 challenge. Now, for those of you who don't know me and may have some expectations here: I have a disclaimer. I am no data scientist. I have never (seriously) participated in any data science / computer vision competition before. I am just eager to learn some new tricks and skills, and I feel the best way to go about that is to take a shot at a competition with $150,000 in prizes. Because… why not? I do not expect to win anything, but I'm pretty sure I'll learn something. Blogging about it is my attempt to stay motivated and to share what I learn in this series of articles. So if you're ready, let's get started!

Disaster types and disasters represented in xBD around the world. [Source: xBD Paper]

About the Competition

The first thing to do is get to know what the challenge is about, so let's visit the challenge website at https://www.xview2.org/. The landing page says: "The xView2 Challenge focuses on automating the process of assessing building damage after a natural disaster, an analytical bottleneck in the post-disaster workflow." Sounds interesting. So how does this work? As the site explains, the Defense Innovation Unit (DIU), along with partners in Humanitarian Assistance and Disaster Relief (HADR), is releasing a high-resolution labeled dataset of satellite imagery showing ground conditions in areas hit by natural disasters. Disaster recovery efforts are currently slow because analysts have to manually sift through the imagery to assess the damage and forward the information to recovery teams on the ground. What DIU challenges us to do in this competition is to automatically detect structures on the ground and predict the scale of damage after a natural disaster has occurred, e.g., whether a building is undamaged or completely destroyed after an earthquake. This would help recovery teams focus on the key areas and accelerate their rescue efforts, providing relief to the areas most affected by the disaster. For further information, please go through the site.

Pre and post-disaster (Tubbs Fire) imagery of a residential subdivision in Santa Rosa, Calif. [Source: xview2.org]

About the Dataset

Let's now see what the dataset contains. It primarily consists of pre- and post-disaster images of affected regions. The xBD dataset currently covers 15 countries and 6 types of disasters. The images come with manually annotated polygons outlining buildings and other structures, along with a damage scale index (0–3) for each. The labels also contain image metadata such as disaster type, image resolution, date, sensor, etc. You can find the description of these fields here. You can explore the dataset on Kaggle by forking the kernel here: xView2 Challenge
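
To get a feel for the label format, here is a minimal Python sketch that reads a single label file. The file path is a hypothetical example, and the field names ("features", "xy", "wkt", "subtype", "metadata") follow the metadata document linked above, so treat them as assumptions until you inspect your own copy of the dataset.

```python
import json
from shapely import wkt  # pip install shapely

# Hypothetical path to one post-disaster label file from the xBD download
label_path = "train/labels/hurricane-harvey_00000000_post_disaster.json"

with open(label_path) as f:
    label = json.load(f)

# Image-level metadata: disaster type, GSD, capture date, sensor, etc.
meta = label["metadata"]
print(meta["disaster_type"], meta["gsd"], meta["capture_date"])

# Building polygons in pixel coordinates; post-disaster labels carry a
# damage "subtype" (no-damage, minor-damage, major-damage, destroyed,
# or un-classified)
for feature in label["features"]["xy"]:
    polygon = wkt.loads(feature["wkt"])  # WKT string -> shapely Polygon
    damage = feature["properties"].get("subtype", "n/a")
    print(damage, round(polygon.area, 1))
```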

Damage Scales [Source: xview2.org]

There is also a paper by Gupta et al. detailing the dataset. Here are some insights from the paper.

  1. The data has been sourced from the Maxar/DigitalGlobe Open Data Program, which releases imagery for major crisis events.
  2. The xBD dataset contains 22,068 images pertaining to 19 natural disasters.
  3. There are 850,736 annotated building polygons.
  4. The imagery covers around 45,361.73km² of area.
  5. The targeted Ground Sample Distance (GSD) is below 0.8 m. However, images from the same geographic region can still differ due to factors such as differing capture angles and conditions.
  6. Environmental factors such as flood water have been annotated.
  7. Post-disaster images have been altered slightly to account for re-projection issues since the paired images were taken at different times.
  8. Some post-disaster images contain no polygons, either because the buildings were created after the disaster or because of factors such as haze and cloud obstruction.

Annotated image by damage scale [Source: xview2.org]

Dataset Statistics

One of the key steps before training a model is EDA (Exploratory Data Analysis): examining how many data points are available per class or group in the dataset. Luckily for us, there is a section in the paper detailing the statistics of the imagery. Here are the key details I've noted down:

Area of imagery (in km2 ) per disaster event. [Source: xBD Paper]

The imagery is highly unbalanced across disasters. While the Portugal Wildfire and Pinery Bushfire cover around 8,000 km² each, the Mexico Earthquake and Palu Tsunami cover less than 1,000 km². The Mexico Earthquake and Palu Tsunami, however, make up for it in the number of polygon annotations: each contributes around 100,000 labeled polygons to the dataset.

Polygons in xBD per disaster event. [Source: xBD Paper]
Positive and negative imagery per disaster. [Source: xBD Paper]

We should note that the split between positive and negative imagery is also unbalanced. For example, as the figure above shows, most of the dataset consists of positive imagery (imagery containing damaged buildings). Only a couple of disaster events have a balanced split of positive and negative imagery, e.g., the SoCal Fire, Portugal Wildfire, and Woolsey Fire.

Damage classification count. [Source: xBD Paper]

The final diagram shows the damage classification counts. The distribution is heavily skewed towards the No Damage class, with 313,033 polygons, roughly eight times more than any other class. A handful of annotations are also marked as unclassified.
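
If you want to verify these numbers yourself, a rough sketch like the one below tallies the damage subtypes across the label files. It assumes the labels sit under train/labels/ as in the challenge download and that post-disaster files follow the *_post_disaster.json naming convention.

```python
import json
from collections import Counter
from pathlib import Path

# Tally damage subtypes across all post-disaster label files.
# Assumes the challenge download layout: train/labels/*_post_disaster.json
counts = Counter()
for path in Path("train/labels").glob("*_post_disaster.json"):
    label = json.loads(path.read_text())
    for feature in label["features"]["xy"]:
        counts[feature["properties"].get("subtype", "un-classified")] += 1

for damage_class, n in counts.most_common():
    print(f"{damage_class:>15}: {n}")
```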

As stated in the paper, given the unbalanced data, we are presented with a very challenging task: segmenting and classifying the damaged structures within the imagery. I presume this task will require heavy use of data augmentation and other such techniques to create a relatively balanced training set, as sketched below. We will begin on this task in the next post, where we will set up our system for training the model. Until then, I encourage you to play around with the dataset on Kaggle Kernels and explore its features. You can even try out the baseline repo provided on GitHub.
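
As a small taste of what balancing might look like, here is a sketch using PyTorch's WeightedRandomSampler to oversample the rarer damage classes. PyTorch is my assumption here (the challenge doesn't mandate a framework), and `labels` and `your_dataset` stand in for whatever your own preprocessing produces; class-weighted losses and augmentation are common companions to this for a segmentation task.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Placeholder per-sample damage classes (0 = no-damage ... 3 = destroyed);
# in practice you would derive these from the label files during preprocessing.
labels = torch.randint(0, 4, (1000,))

class_counts = torch.bincount(labels, minlength=4).float()
weights = (1.0 / class_counts)[labels]  # rarer classes get higher sampling weight

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# loader = DataLoader(your_dataset, batch_size=32, sampler=sampler)
```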

I would like to extend a special thanks to Ritwik Gupta for reviewing the post and providing me with valuable inputs before publishing.

References

  1. xView2 Challenge. Retrieved from https://www.xview2.org/.
  2. Gupta, Ritwik, et al. "xBD: A Dataset for Assessing Building Damage from Satellite Imagery." arXiv, 21 Nov. 2019, https://arxiv.org/abs/1911.09296.
  3. Gupta, Ritwik, et al. "Creating xBD: A Dataset for Assessing Building Damage from Satellite Imagery." CVPR Workshops, 2019.
  4. "Satellite Imagery for Natural Disasters." DigitalGlobe, https://www.digitalglobe.com/ecosystem/open-data.
  5. "AI in Humanitarian Assistance and Disaster Response." Software Engineering Institute, Carnegie Mellon University, YouTube, www.youtube.com/watch?v=UW5CP9YahG0.
  6. "xBD Building Damage Dataset (+550k Annotations / +19k sq km) Available for Download." r/MachineLearning, https://www.reddit.com/r/MachineLearning/comments/d6hjgn/n_xbd_building_damage_dataset_550k_annotations19k/.
  7. "xView2: Updated xBD Building Damage Dataset (+850k Annotations / +45k sq km) Available for Download | Leaderboard Release and Submission Deadline Extended." r/MachineLearning, https://www.reddit.com/r/MachineLearning/comments/dpre53/n_xview2_updated_xbd_building_damage_dataset_850k/.
  8. Gupta, Ritwik. "Deep Learning and Satellite Imagery: DIUx xView Challenge." SEI Blog, 28 Jan. 2019, insights.sei.cmu.edu/sei_blog/2019/01/deep-learning-and-satellite-imagery-diux-xview-challenge.html.
  9. Brian. "Bringing It On During DIUx xView 2018 Detection Challenge." Wovenware Blog, 11 Oct. 2018, www.wovenware.com/blog/2018/10/bringing-it-on-during-diux-xview-2018-detection-challenge/.
  10. Gupta, Ritwik. "xBD Metadata Explanation." 2019, https://cdn.discordapp.com/attachments/624633738512433154/648748268393857026/xBD_Metadata_Explanation.pdf.
