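Deep learning has become the go-to approach for many challenging problems. It is well known that, given enough training, deep networks can segment images and identify the "keypoints" in them.
Scale a very simple mechanism up far enough and the results can look like magic.
For deep learning to work this well, however, it needs a lot of data. In general, the more training data you have, the more accurate the model.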
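But where does all this data come from? Acquiring annotated data can be both expensive and time-consuming. Hiring people to collect and label images by hand is simply not efficient. And in the deep learning era, data is arguably your most valuable resource.
This post introduces a simple way to collect an image dataset for deep learning.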
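bing-images is a Python library that fetches image URLs from Bing.com and downloads the images. It has the following features:
- Supports file type filters.
- Supports Bing.com filterui filters.
- Downloads with multiple threads and a configurable thread pool size.
- Can fetch image URLs only, without downloading anything.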
Demo
Create a project called image-collector.
Install bing-images
Prerequisites
- Install the Google Chrome browser.
- Download chromedriver.
- Add chromedriver to your PATH.
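Before installing the library, it can help to confirm that the last prerequisite is actually met. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `find_chromedriver` is made up for this illustration and is not part of bing-images.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver not found on PATH")
    else:
        print("chromedriver found at", path)
```

If the script reports that chromedriver is missing, revisit the PATH step above before going further.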
❯ pip install bing-images
Collecting bing-images
Downloading bing_images-0.0.6-py3-none-any.whl (6.7 kB)
Collecting requests>=2.24.0
Using cached requests-2.25.1-py2.py3-none-any.whl (61 kB)
Collecting selenium>=3.141.0
Using cached selenium-3.141.0-py2.py3-none-any.whl (904 kB)
Collecting urllib3<1.27,>=1.21.1
Using cached urllib3-1.26.3-py2.py3-none-any.whl (137 kB)
Requirement already satisfied: certifi>=2017.4.17 in /Users/catchzeng/miniconda3/envs/test/lib/python3.8/site-packages (from requests>=2.24.0->bing-images) (2020.12.5)
Collecting idna<3,>=2.5
Using cached idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting chardet<5,>=3.0.2
Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Installing collected packages: urllib3, idna, chardet, selenium, requests, bing-images
Successfully installed bing-images-0.1.0 chardet-4.0.0 idna-2.10 requests-2.25.1 selenium-3.141.0 urllib3-1.26.3
Fetch image URLs
fetch_image_urls.py
from bing_images import bing

# Fetch up to 10 PNG image URLs for "cat", limited to square, black-and-white images.
urls = bing.fetch_image_urls("cat", limit=10, file_type='png',
                             filters='+filterui:aspect-square+filterui:color2-bw')
print("{} images.".format(len(urls)))
# Print each URL with a 1-based index.
for counter, url in enumerate(urls, start=1):
    print("{}: {}".format(counter, url))
Run
❯ python fetch_image_urls.py
10 images.
1: http://pngimg.com/uploads/cat/cat_PNG50521.png
2: http://pngimg.com/uploads/cat/cat_PNG1616.png
3: https://pngimg.com/uploads/cat/cat_PNG50532.png
4: https://pngimg.com/uploads/cat/cat_PNG1621.png
5: https://pngimg.com/uploads/cat/cat_PNG1618.png
6: http://pngimg.com/uploads/cat/cat_PNG1624.png
7: http://www.pngmart.com/files/5/Black-Cat-PNG-Transparent.png
8: http://www.myiconfinder.com/uploads/iconsets/256-256-a96249f4c8a9753fd904f8be023dc25c-cat.png
9: https://pngimg.com/uploads/cat/cat_PNG1619.png
10: http://pngimg.com/uploads/cat/cat_PNG50521.png
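In the sample output above, entries 1 and 10 are the same URL, so the returned list can contain repeats. If that matters for your dataset, one option is to drop duplicates before downloading. The sketch below is plain Python; `dedupe_urls` is a hypothetical helper written for this post, not a bing-images function, and it only removes exact string matches (an http and an https variant of the same address still count as different).

```python
def dedupe_urls(urls):
    """Drop repeated URLs, keeping the first occurrence of each in order."""
    # dict preserves insertion order, so this keeps the original ordering.
    return list(dict.fromkeys(urls))


sample = [
    "http://pngimg.com/uploads/cat/cat_PNG50521.png",
    "http://pngimg.com/uploads/cat/cat_PNG1616.png",
    "http://pngimg.com/uploads/cat/cat_PNG50521.png",
]
print(len(dedupe_urls(sample)))  # 2
```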
Multithreaded download
download.py
from bing_images import bing

# Download 20 PNG images for the query "cat" using a pool of 10 threads.
bing.download_images("cat",
                     20,
                     output_dir="/Users/catchzeng/Desktop/cat",
                     pool_size=10,
                     file_type="png",
                     force_replace=True)
Run
❯ python download.py
Save path: /Users/catchzeng/Desktop/cat
Downloading images
#1 http://pngimg.com/uploads/cat/cat_PNG50509.png Downloaded
#2 https://pngimg.com/uploads/cat/cat_PNG50498.png Downloaded
#3 http://www.freepngimg.com/download/cat/22193-3-adorable-cat.png Downloaded
#4 http://pngimg.com/uploads/cat/cat_PNG106.png Downloaded
#5 https://pngimg.com/uploads/cat/cat_PNG50465.png Downloaded
#6 https://pngimg.com/uploads/cat/cat_PNG50417.png Downloaded
#7 https://pngimg.com/uploads/cat/cat_PNG50480.png Downloaded
#8 http://pngimg.com/uploads/cat/cat_PNG119.png Downloaded
#9 https://pngimg.com/uploads/cat/cat_PNG50438.png Downloaded
#10 http://pngimg.com/uploads/cat/cat_PNG100.png Downloaded
#11 https://pngimg.com/uploads/cat/cat_PNG50447.png Downloaded
#12 https://pngimg.com/uploads/cat/cat_PNG50440.png Downloaded
#13 https://pngimg.com/uploads/cat/cat_PNG50433.png Downloaded
#14 https://www.pngarts.com/files/1/Baby-Cat-PNG-Free-Download.png Downloaded
#15 https://cdn.pixabay.com/photo/2017/02/22/16/55/cat-2089916_960_720.png Downloaded
#16 https://pngimg.com/uploads/cat/cat_PNG50434.png Downloaded
#17 http://pngimg.com/uploads/cat/cat_PNG50529.png Downloaded
#18 http://pngimg.com/uploads/cat/cat_PNG113.png Downloaded
#19 https://purepng.com/public/uploads/large/purepng.com-catanimalscat-981524673949tj5ns.png Downloaded
#20 https://pngimg.com/uploads/cat/cat_PNG50435.png Downloaded
Renaming images
Finished renaming
Done
Elapsed time: 20.76s
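To give a feel for what a thread pool with a fixed size does, here is a minimal sketch of the general technique using the standard library's `concurrent.futures`. It is not the bing-images implementation, which this post does not show. The names `run_in_pool` and `fetch` are assumptions for illustration; `fetch` stands for any callable that handles one URL, and the usage line passes `len` as a harmless stand-in instead of a real download.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_pool(urls, fetch, pool_size=10):
    """Call fetch(url) for every URL with at most pool_size worker threads.

    Returns a list of (url, result_or_exception) pairs in input order.
    """
    def safe_fetch(url):
        try:
            return url, fetch(url)
        except Exception as exc:  # one failure should not stop the rest
            return url, exc

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() yields results in the same order as the input URLs.
        return list(pool.map(safe_fetch, urls))


# Stand-in for a real download function: just measure the URL length.
print(run_in_pool(["a", "bb", "ccc"], len, pool_size=2))
# [('a', 1), ('bb', 2), ('ccc', 3)]
```

Because downloads spend most of their time waiting on the network, threads tend to help here even in Python; the pool size caps how many requests are in flight at once.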
Download square black-and-white images
download-square.py
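from bing_images import bing

# Same download as before, restricted to square, black-and-white images
# and using a pool of 20 threads.
bing.download_images("cat",
                     20,
                     output_dir="/Users/catchzeng/Desktop/cat",
                     pool_size=20,
                     file_type="png",
                     filters='+filterui:aspect-square+filterui:color2-bw',
                     force_replace=True)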
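The `filters` string in the script above is a run of `+filterui:<name>` segments, which is easy to mistype by hand. A small helper can assemble it from the individual names. This is only a sketch: `build_filters` is a hypothetical function written for this post, not part of bing-images, and it does not check whether Bing recognizes a given name.

```python
def build_filters(*names):
    """Join filter names into a '+filterui:<name>' string."""
    return "".join("+filterui:{}".format(name) for name in names)


print(build_filters("aspect-square", "color2-bw"))
# +filterui:aspect-square+filterui:color2-bw
```

The result can be passed as the `filters` argument in the calls shown earlier.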
For the full code, see https://github.com/CatchZeng/bing_images. See you next time!