AI - A Simple Way to Collect Your Deep Learning Image Dataset


Original article: https://medium.com/analytics-vidhya/a-simple-way-to-collect-your-deep-learning-image-dataset-4ead47b6826c

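Deep learning has become the go-to approach for many challenging problems. It is well known that, given enough training, deep networks can segment images and identify "key points" in them.

A very simple mechanism, once it is scaled up enough, can produce seemingly magical results.

Deep learning that works this well therefore needs a lot of data. In general, the more training data there is, the more accurate the model.

But where do we get all this data? Acquiring annotated data can be both expensive and time-consuming. Hiring people to collect and label images by hand is simply not efficient. And in the deep learning era, data is arguably your most valuable resource.

This post introduces a simple way to collect an image dataset for deep learning.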

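bing-images is a Python library for fetching image URLs from Bing.com and downloading the images. It has the following features:

  • Supports file type filters.
  • Supports Bing.com filterui filters.
  • Downloads with multithreading and a custom thread pool size.
  • Supports fetching image URLs only, without downloading.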
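
The filterui filters are passed as one string of `+filterui:<name>` tokens, as in the `+filterui:aspect-square+filterui:color2-bw` value used in the examples in this post. A minimal sketch of composing that string from a list of names follows. `build_filters` is a hypothetical helper, not part of bing-images, and only the two filter names that appear in this post are used:

```python
def build_filters(names):
    """Join Bing filterui names into one filter string.

    Hypothetical helper (not part of bing-images): it only reproduces the
    "+filterui:<name>" pattern seen in this post's examples.
    """
    return "".join("+filterui:{}".format(name) for name in names)


filters = build_filters(["aspect-square", "color2-bw"])
print(filters)
```

Keeping the names in a list makes it easier to switch individual filters on and off than editing one long string by hand.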

Demo

Create a project called image-collector.

Install bing-images

Prerequisites

❯ pip install bing-images
Collecting bing-images
  Downloading bing_images-0.0.6-py3-none-any.whl (6.7 kB)
Collecting requests>=2.24.0
  Using cached requests-2.25.1-py2.py3-none-any.whl (61 kB)
Collecting selenium>=3.141.0
  Using cached selenium-3.141.0-py2.py3-none-any.whl (904 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.3-py2.py3-none-any.whl (137 kB)
Requirement already satisfied: certifi>=2017.4.17 in /Users/catchzeng/miniconda3/envs/test/lib/python3.8/site-packages (from requests>=2.24.0->bing-images) (2020.12.5)
Collecting idna<3,>=2.5
  Using cached idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting chardet<5,>=3.0.2
  Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Installing collected packages: urllib3, idna, chardet, selenium, requests, bing-images
Successfully installed bing-images-0.1.0 chardet-4.0.0 idna-2.10 requests-2.25.1 selenium-3.141.0 urllib3-1.26.3

Fetch image URLs

fetch_image_urls.py

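from bing_images import bing

# Fetch up to 10 PNG image URLs for "cat", limited to square,
# black-and-white results by the two filterui filters.
urls = bing.fetch_image_urls(
    "cat",
    limit=10,
    file_type="png",
    filters="+filterui:aspect-square+filterui:color2-bw",
)
print("{} images.".format(len(urls)))
# Print the URLs with a 1-based index.
for counter, url in enumerate(urls, start=1):
    print("{}: {}".format(counter, url))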

Run

❯ python fetch_image_urls.py
10 images.
1: http://pngimg.com/uploads/cat/cat_PNG50521.png
2: http://pngimg.com/uploads/cat/cat_PNG1616.png
3: https://pngimg.com/uploads/cat/cat_PNG50532.png
4: https://pngimg.com/uploads/cat/cat_PNG1621.png
5: https://pngimg.com/uploads/cat/cat_PNG1618.png
6: http://pngimg.com/uploads/cat/cat_PNG1624.png
7: http://www.pngmart.com/files/5/Black-Cat-PNG-Transparent.png
8: http://www.myiconfinder.com/uploads/iconsets/256-256-a96249f4c8a9753fd904f8be023dc25c-cat.png
9: https://pngimg.com/uploads/cat/cat_PNG1619.png
10: http://pngimg.com/uploads/cat/cat_PNG50521.png
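
In the output above, entry 10 repeats entry 1, so the fetched list can contain duplicates. A small sketch of removing them while keeping the original order follows. `dedupe_urls` is a hypothetical helper, and treating `http://` and `https://` variants of the same address as duplicates is an assumption of this sketch, not something bing-images is known to do:

```python
from urllib.parse import urlsplit


def dedupe_urls(urls):
    """Return urls without repeats, keeping the first occurrence of each.

    Assumption: two URLs that differ only in scheme (http vs https) point
    to the same image, so the scheme is ignored when comparing.
    """
    seen = set()
    unique = []
    for url in urls:
        parts = urlsplit(url)
        key = (parts.netloc.lower(), parts.path, parts.query)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


sample = [
    "http://pngimg.com/uploads/cat/cat_PNG50521.png",
    "https://pngimg.com/uploads/cat/cat_PNG1619.png",
    "http://pngimg.com/uploads/cat/cat_PNG50521.png",
]
print(dedupe_urls(sample))
```

This only catches repeated addresses. The same picture hosted under two different URLs would still appear twice.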

Multithreaded download

download.py

from bing_images import bing

# Download 20 PNG images for the query "cat" with a pool of 10 threads.
bing.download_images("cat",
                      20,
                      output_dir="/Users/catchzeng/Desktop/cat",
                      pool_size=10,
                      file_type="png",
                      force_replace=True)

Run

❯ python download.py
Save path: /Users/catchzeng/Desktop/cat
Downloading images
#1 http://pngimg.com/uploads/cat/cat_PNG50509.png Downloaded
#2 https://pngimg.com/uploads/cat/cat_PNG50498.png Downloaded
#3 http://www.freepngimg.com/download/cat/22193-3-adorable-cat.png Downloaded
#4 http://pngimg.com/uploads/cat/cat_PNG106.png Downloaded
#5 https://pngimg.com/uploads/cat/cat_PNG50465.png Downloaded
#6 https://pngimg.com/uploads/cat/cat_PNG50417.png Downloaded
#7 https://pngimg.com/uploads/cat/cat_PNG50480.png Downloaded
#8 http://pngimg.com/uploads/cat/cat_PNG119.png Downloaded
#9 https://pngimg.com/uploads/cat/cat_PNG50438.png Downloaded
#10 http://pngimg.com/uploads/cat/cat_PNG100.png Downloaded
#11 https://pngimg.com/uploads/cat/cat_PNG50447.png Downloaded
#12 https://pngimg.com/uploads/cat/cat_PNG50440.png Downloaded
#13 https://pngimg.com/uploads/cat/cat_PNG50433.png Downloaded
#14 https://www.pngarts.com/files/1/Baby-Cat-PNG-Free-Download.png Downloaded
#15 https://cdn.pixabay.com/photo/2017/02/22/16/55/cat-2089916_960_720.png Downloaded
#16 https://pngimg.com/uploads/cat/cat_PNG50434.png Downloaded
#17 http://pngimg.com/uploads/cat/cat_PNG50529.png Downloaded
#18 http://pngimg.com/uploads/cat/cat_PNG113.png Downloaded
#19 https://purepng.com/public/uploads/large/purepng.com-catanimalscat-981524673949tj5ns.png Downloaded
#20 https://pngimg.com/uploads/cat/cat_PNG50435.png Downloaded
Renaming images
Finished renaming
Done
Elapsed time: 20.76s
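
Files downloaded from arbitrary sites are not guaranteed to be valid images, even when the log says "Downloaded". A minimal sketch, using only the standard library, that checks each saved file for the 8-byte PNG signature follows. `find_invalid_pngs` is a hypothetical helper, and it assumes the files are saved with a `.png` extension in the directory passed as `output_dir`:

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def find_invalid_pngs(directory):
    """Return sorted paths of *.png files that lack the PNG signature."""
    invalid = []
    for path in sorted(Path(directory).glob("*.png")):
        with path.open("rb") as f:
            if f.read(len(PNG_SIGNATURE)) != PNG_SIGNATURE:
                invalid.append(path)
    return invalid


if __name__ == "__main__":
    # The path below is the output_dir from download.py; adjust as needed.
    for bad in find_invalid_pngs("/Users/catchzeng/Desktop/cat"):
        print("Not a PNG:", bad)
```

A signature check is cheap but shallow: it will not detect a truncated file, so decoding each image with an image library is a stricter follow-up step.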

Download square black-and-white images

download-square.py

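from bing_images import bing

# Same download as before, limited to square, black-and-white images
# by the two Bing filterui filters.
bing.download_images("cat",
                      20,
                      output_dir="/Users/catchzeng/Desktop/cat",
                      pool_size=20,
                      file_type="png",
                      filters='+filterui:aspect-square+filterui:color2-bw',
                      force_replace=True)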
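
A classification dataset usually needs one folder per class. A minimal sketch of looping over several queries and giving each its own output directory follows. `collect_classes` is a hypothetical helper, not part of bing-images. The downloader is passed in as a parameter and is assumed to accept the same call pattern as the `bing.download_images` calls in this post:

```python
import os


def collect_classes(queries, root_dir, download_fn, limit=20, **options):
    """Call download_fn once per query, each with its own sub-directory.

    download_fn is expected to accept the call pattern used in this post:
    download_fn(query, limit, output_dir=..., **options).
    Returns a dict mapping each query to its output directory.
    """
    output_dirs = {}
    for query in queries:
        # Spaces in the query are replaced so the folder name stays simple.
        output_dir = os.path.join(root_dir, query.replace(" ", "_"))
        download_fn(query, limit, output_dir=output_dir, **options)
        output_dirs[query] = output_dir
    return output_dirs
```

With bing-images installed, this could be called as `collect_classes(["cat", "dog"], "dataset", bing.download_images, pool_size=10, file_type="png")`. Whether the library creates missing directories itself is not covered in this post, so that may need to be handled separately.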

For the full code, see https://github.com/CatchZeng/bing_images. Goodbye!


Written by CatchZeng
AI (Machine Learning) and DevOps enthusiast.