Deep learning has become the go-to method for many challenging problems. With enough training, a deep network can segment an image and identify the key points in it.
A very simple mechanism, scaled up far enough, can have a seemingly magical effect.
That scale comes at a cost: deep learning needs a lot of data, and in general more training data leads to a more accurate model.
But where does all this data come from? Well-annotated data is expensive and time-consuming to acquire, and hiring people to collect and label images by hand is not efficient. In the deep learning era, data is arguably your most valuable resource.
Here, I show a simple way to collect an image dataset for deep learning.
bing-images is a Python library that fetches image URLs from Bing.com and downloads the images using multithreading. It has the following features:
- Supports file type filters.
- Supports Bing.com filterui filters.
- Downloads using multithreading with a custom thread pool size.
- Can fetch only the image URLs, without downloading anything.
Demo
Create a demo project called image-collector.
Install bing-images
Requirements
- Install the Google Chrome browser.
- Download chromedriver.
- Add chromedriver to PATH.
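Before installing the library, it can help to confirm that the last requirement is actually met. The following is a minimal sketch using only the Python standard library; the helper name `find_executable` is made up for this example and is not part of bing-images.

```python
import shutil


def find_executable(name="chromedriver"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path:
        print("chromedriver found at {}".format(path))
    else:
        print("chromedriver is not on PATH")
```

If the script reports that chromedriver is missing, recheck the PATH step above before continuing.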
❯ pip install bing-images
Collecting bing-images
Downloading bing_images-0.0.6-py3-none-any.whl (6.7 kB)
Collecting requests>=2.24.0
Using cached requests-2.25.1-py2.py3-none-any.whl (61 kB)
Collecting selenium>=3.141.0
Using cached selenium-3.141.0-py2.py3-none-any.whl (904 kB)
Collecting urllib3<1.27,>=1.21.1
Using cached urllib3-1.26.3-py2.py3-none-any.whl (137 kB)
Requirement already satisfied: certifi>=2017.4.17 in /Users/catchzeng/miniconda3/envs/test/lib/python3.8/site-packages (from requests>=2.24.0->bing-images) (2020.12.5)
Collecting idna<3,>=2.5
Using cached idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting chardet<5,>=3.0.2
Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Installing collected packages: urllib3, idna, chardet, selenium, requests, bing-images
Successfully installed bing-images-0.1.0 chardet-4.0.0 idna-2.10 requests-2.25.1 selenium-3.141.0 urllib3-1.26.3
Fetch image URLs
fetch_image_urls.py
from bing_images import bing

# Fetch up to 10 square, black-and-white PNG image URLs for the query "cat".
urls = bing.fetch_image_urls("cat", limit=10, file_type='png', filters='+filterui:aspect-square+filterui:color2-bw')
print("{} images.".format(len(urls)))
# Print each URL with a 1-based index.
counter = 1
for url in urls:
    print("{}: {}".format(counter, url))
    counter += 1
Run
❯ python fetch_image_urls.py
10 images.
1: http://pngimg.com/uploads/cat/cat_PNG50521.png
2: http://pngimg.com/uploads/cat/cat_PNG1616.png
3: https://pngimg.com/uploads/cat/cat_PNG50532.png
4: https://pngimg.com/uploads/cat/cat_PNG1621.png
5: https://pngimg.com/uploads/cat/cat_PNG1618.png
6: http://pngimg.com/uploads/cat/cat_PNG1624.png
7: http://www.pngmart.com/files/5/Black-Cat-PNG-Transparent.png
8: http://www.myiconfinder.com/uploads/iconsets/256-256-a96249f4c8a9753fd904f8be023dc25c-cat.png
9: https://pngimg.com/uploads/cat/cat_PNG1619.png
10: http://pngimg.com/uploads/cat/cat_PNG50521.png
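In the run above, entries 1 and 10 are the same URL, so the returned list can contain repeats. If you only want unique URLs, a small order-preserving filter is enough. This is a sketch; `dedupe_urls` is a hypothetical helper, not a bing-images function, and it only catches exact string matches (an http and an https variant of the same address still count as two URLs).

```python
def dedupe_urls(urls):
    """Drop exact duplicate URLs while keeping the first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


if __name__ == "__main__":
    # Sample taken from the output above: the first and last entries match.
    sample = [
        "http://pngimg.com/uploads/cat/cat_PNG50521.png",
        "http://pngimg.com/uploads/cat/cat_PNG1616.png",
        "http://pngimg.com/uploads/cat/cat_PNG50521.png",
    ]
    for url in dedupe_urls(sample):
        print(url)
```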
Download using multithreading
download.py
from bing_images import bing

# Download 20 PNG images for the query "cat" using a pool of 10 threads.
bing.download_images("cat",
                     20,
                     output_dir="/Users/catchzeng/Desktop/cat",
                     pool_size=10,
                     file_type="png",
                     force_replace=True)
Run
❯ python download.py
Save path: /Users/catchzeng/Desktop/cat
Downloading images
#1 http://pngimg.com/uploads/cat/cat_PNG50509.png Downloaded
#2 https://pngimg.com/uploads/cat/cat_PNG50498.png Downloaded
#3 http://www.freepngimg.com/download/cat/22193-3-adorable-cat.png Downloaded
#4 http://pngimg.com/uploads/cat/cat_PNG106.png Downloaded
#5 https://pngimg.com/uploads/cat/cat_PNG50465.png Downloaded
#6 https://pngimg.com/uploads/cat/cat_PNG50417.png Downloaded
#7 https://pngimg.com/uploads/cat/cat_PNG50480.png Downloaded
#8 http://pngimg.com/uploads/cat/cat_PNG119.png Downloaded
#9 https://pngimg.com/uploads/cat/cat_PNG50438.png Downloaded
#10 http://pngimg.com/uploads/cat/cat_PNG100.png Downloaded
#11 https://pngimg.com/uploads/cat/cat_PNG50447.png Downloaded
#12 https://pngimg.com/uploads/cat/cat_PNG50440.png Downloaded
#13 https://pngimg.com/uploads/cat/cat_PNG50433.png Downloaded
#14 https://www.pngarts.com/files/1/Baby-Cat-PNG-Free-Download.png Downloaded
#15 https://cdn.pixabay.com/photo/2017/02/22/16/55/cat-2089916_960_720.png Downloaded
#16 https://pngimg.com/uploads/cat/cat_PNG50434.png Downloaded
#17 http://pngimg.com/uploads/cat/cat_PNG50529.png Downloaded
#18 http://pngimg.com/uploads/cat/cat_PNG113.png Downloaded
#19 https://purepng.com/public/uploads/large/purepng.com-catanimalscat-981524673949tj5ns.png Downloaded
#20 https://pngimg.com/uploads/cat/cat_PNG50435.png Downloaded
Renaming images
Finished renaming
Done
Elapsed time: 20.76s
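To give a feel for what a thread pool with a configurable size does, here is a minimal sketch built on the standard library's `concurrent.futures`. It is not the bing-images implementation, which may work differently. The names `download_all`, `safe_fetch`, and the `fetch` callable are assumptions for this example; `fetch` stands in for whatever function actually downloads one URL.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, pool_size=10):
    """Call fetch(url) for every URL on a pool of worker threads.

    Returns a list of (url, ok) pairs in the same order as `urls`.
    """
    def safe_fetch(url):
        # One failing download should not stop the others.
        try:
            fetch(url)
            return url, True
        except Exception:
            return url, False

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() yields results in input order, regardless of finish order.
        return list(pool.map(safe_fetch, urls))


if __name__ == "__main__":
    # Stand-in fetch function so the sketch runs without network access.
    def fake_fetch(url):
        if url.endswith("broken.png"):
            raise IOError("simulated failure")

    results = download_all(["a.png", "broken.png", "c.png"], fake_fetch, pool_size=2)
    for url, ok in results:
        print("{} {}".format(url, "Downloaded" if ok else "Failed"))
```

Because downloads spend most of their time waiting on the network, threads are a reasonable fit here even in Python.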
Download square black-and-white images
download-square.py
from bing_images import bing

# Same download as before, restricted to square, black-and-white images.
bing.download_images("cat",
                     20,
                     output_dir="/Users/catchzeng/Desktop/cat",
                     pool_size=20,
                     file_type="png",
                     filters='+filterui:aspect-square+filterui:color2-bw',
                     force_replace=True)
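The `filters` string used here is a chain of `+filterui:<name>` parts. If you combine filters often, a tiny helper can assemble the string for you. This is a sketch; `build_filters` is a hypothetical name and not part of bing-images, and it does not check that a filter name is one Bing.com recognizes.

```python
def build_filters(*names):
    """Join filter names into a '+filterui:<name>' chain."""
    return "".join("+filterui:{}".format(name) for name in names)


if __name__ == "__main__":
    # Rebuilds the filter string used in the download example above.
    print(build_filters("aspect-square", "color2-bw"))
```

The result can be passed directly as the `filters` argument.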
The detailed code is at https://github.com/CatchZeng/bing_images. See you!