[AWS] Building a Machine Learning Environment with PyTorch on SageMaker
Demand for machine learning keeps growing as AI (artificial intelligence) advances and its applications diversify, and building a machine learning environment takes several components.
In this post, we will build a machine learning environment using SageMaker, the machine learning platform provided by AWS.
For basic information about Amazon SageMaker, please refer to the previous post.
Machine Learning Environment
Here is the machine learning environment we are going to build.
You can use a wide range of machine learning libraries, models, and datasets.
Machine Learning Library
- We will use PyTorch, an open-source machine learning library.
Machine Learning Model
- We will use YOLOv5, a well-known model in the object detection field.
Training Data
- We will use the COCO dataset.
Development Environment
- We will use Amazon SageMaker Studio, which provides a complete IDE (integrated development environment) for machine learning.
- In Studio, we will set up the training code and training data so that training can be launched right away.
Setting Up the Development Environment
We use Amazon SageMaker Studio to set up the development environment.
In Studio, you can run machine learning code in a Jupyter Notebook file (ipynb) or configure the environment through a Terminal. Let's go through each step one by one.
Downloading the Machine Learning Model
Open a Terminal and download the YOLOv5 model.
Download the GitHub project with the git clone command.
git clone https://github.com/ultralytics/yolov5.git
Once the download finishes, you can verify that the model code was downloaded correctly.
Running a Jupyter Notebook
To run a Jupyter Notebook in Studio, add a Notebook.
For the Image, I selected PyTorch 2.0.0 Python 3.10 CPU Optimized so that training can run with the PyTorch library.
If you use a GPU instance type, select GPU Optimized; otherwise, choose whatever type you prefer and continue.
I renamed the created Notebook by right-clicking it.
Click the Running Terminals and Kernels button in the left menu to see the resources currently running in Studio.
Writing the Training Code
With the downloaded model, you can start training simply by running Python code in the Notebook.
!python train.py --data coco128.yaml --cfg yolov5s.yaml --epochs 3 --batch-size 8 --weights ''
However, since we will configure the training environment with the SDK that SageMaker provides, let's use the SageMaker SDK instead.
Training Code - Basic Settings
For training, we need to configure basic SageMaker settings and an Estimator.
Amazon SageMaker's Estimator is a high-level abstraction for training and deploying machine learning models.
With an Estimator, you can easily manage and run a variety of training jobs.
import sagemaker

#========== Basic settings ==========
# Create a SageMaker session
sagemaker_session = sagemaker.Session()
# IAM role
role = sagemaker.get_execution_role()

#========== Training settings ==========
# Estimator settings
est_pytorch_entry_point = 'train.py'
est_pytorch_image_uri = '{ECR_URL}/sagemaker:pytorch-training-2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker'
est_pytorch_framework_version = '2.0.0'
est_pytorch_py_version = 'py310'
est_pytorch_instance_type = 'ml.g4dn.xlarge'
est_pytorch_instance_count = 1
est_pytorch_source_dir = './'

# Hyperparameters
est_pytorch_hyperparameters = {
    'data': 'coco128.yaml',
    'cfg': 'yolov5s.yaml',
    'epochs': 3,
    'project': '/opt/ml/model',
    'batch-size': 8
}
- We use train.py, the default training script of the YOLOv5 project.
- We specify the PyTorch and Python versions and the container image to use for training.
- Even if you do not specify a container image, training runs on the image SageMaker provides for the framework (see the sketch after this list).
- We select the dataset and options to use for training.
- The coco128.yaml dataset is specified, and the YOLOv5s model is selected.
- epochs is set to 3 and batch-size to 8, and project is set to /opt/ml/model so that results are written to SageMaker's default model save path.
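To illustrate the point about the default image, here is a minimal sketch (not part of the original walkthrough, and the estimator name is just for illustration): when image_uri is omitted, the SageMaker Python SDK resolves a built-in PyTorch training image from framework_version and py_version. It reuses the role and hyperparameter variables from the basic settings above.
from sagemaker.pytorch import PyTorch

# Minimal sketch: no image_uri, so the SDK picks the built-in
# PyTorch 2.0.0 / Python 3.10 training image for this framework version.
pytorch_estimator_default_image = PyTorch(
    entry_point='train.py',          # YOLOv5 training script
    source_dir='./',                 # local project directory uploaded to S3
    framework_version='2.0.0',
    py_version='py310',
    instance_type='ml.g4dn.xlarge',
    instance_count=1,
    role=role,                       # IAM role from the basic settings above
    hyperparameters=est_pytorch_hyperparameters
)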
Training Code - Creating the Estimator
We now create the Estimator that will run the training job.
The Estimator is created from the settings defined above.
from sagemaker.pytorch import PyTorch
from sagemaker.inputs import TrainingInput

# Create the Estimator
pytorch_estimator = PyTorch(
    entry_point=est_pytorch_entry_point,
    image_uri=est_pytorch_image_uri,
    framework_version=est_pytorch_framework_version,
    py_version=est_pytorch_py_version,
    instance_type=est_pytorch_instance_type,
    instance_count=est_pytorch_instance_count,
    source_dir=est_pytorch_source_dir,
    sagemaker_session=sagemaker_session,
    role=role,
    hyperparameters=est_pytorch_hyperparameters
)
Training Code - Starting Training
We run the Estimator created as pytorch_estimator and start training by calling its .fit() method.
# Start training
pytorch_estimator.fit()
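For reference (this is an assumption about the standard SageMaker Python SDK, not something shown in the original post), fit() also accepts a few commonly used options such as a custom job name and log streaming control:
# Hedged sketch: optional fit() arguments from the SageMaker Python SDK.
pytorch_estimator.fit(
    job_name='yolov5-coco128-training',  # hypothetical job name
    wait=True,                           # block until the training job finishes
    logs='All'                           # stream all container logs to the notebook
)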
Training Code - Checking the Model Save Path
After training completes, you can check the saved model file and its path with the commands below.
# S3 path of the trained model artifact
model_data = pytorch_estimator.model_data
# Check the weights file stored in S3
print("Model artifacts saved at:", model_data)
Running the Training Code
Let's run the training code we have written.
Run the Estimator's fit function.
When training completes, you can check the generated model path, whether the job succeeded, and the training time.
The full detailed training log is shown below.
Detailed Training Log
Using provided s3_resource
INFO:sagemaker:Creating training-job with name: sagemaker-2023-06-30-08-49-38-688
2023-06-30 08:49:43 Starting - Starting the training job...
2023-06-30 08:50:00 Starting - Preparing the instances for training......
2023-06-30 08:51:00 Downloading - Downloading input data...
2023-06-30 08:51:20 Training - Downloading the training image.................................
2023-06-30 08:56:55 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-06-30 08:57:05,915 sagemaker-training-toolkit INFO Imported framework sagemaker_pytorch_container.training
2023-06-30 08:57:05,930 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)
2023-06-30 08:57:05,939 sagemaker_pytorch_container.training INFO Block until all host DNS lookups succeed.
2023-06-30 08:57:05,944 sagemaker_pytorch_container.training INFO Invoking user training script.
2023-06-30 08:57:08,222 sagemaker-training-toolkit INFO Installing dependencies from requirements.txt:
/opt/conda/bin/python3.10 -m pip install -r requirements.txt
Collecting gitpython>=3.1.30 (from -r requirements.txt (line 5))
Downloading GitPython-3.1.31-py3-none-any.whl (184 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 184.3/184.3 kB 13.1 MB/s eta 0:00:00
Requirement already satisfied: matplotlib>=3.3 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 6)) (3.7.1)
Requirement already satisfied: numpy>=1.18.5 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 7)) (1.23.5)
Requirement already satisfied: opencv-python>=4.1.1 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 8)) (4.7.0)
Requirement already satisfied: Pillow>=7.1.2 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 9)) (9.4.0)
Requirement already satisfied: psutil in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 10)) (5.9.5)
Requirement already satisfied: PyYAML>=5.3.1 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 11)) (5.4.1)
Requirement already satisfied: requests>=2.23.0 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 12)) (2.28.2)
Requirement already satisfied: scipy>=1.4.1 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 13)) (1.10.1)
Collecting thop>=0.1.1 (from -r requirements.txt (line 14))
Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Requirement already satisfied: torch>=1.7.0 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 15)) (2.0.0)
Requirement already satisfied: torchvision>=0.8.1 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 16)) (0.15.1)
Requirement already satisfied: tqdm>=4.64.0 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 17)) (4.65.0)
Collecting ultralytics>=8.0.111 (from -r requirements.txt (line 18))
Downloading ultralytics-8.0.124-py3-none-any.whl (612 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 612.6/612.6 kB 35.5 MB/s eta 0:00:00
Requirement already satisfied: pandas>=1.1.4 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 27)) (2.0.1)
Requirement already satisfied: seaborn>=0.11.0 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 28)) (0.12.2)
Requirement already satisfied: setuptools>=65.5.1 in /opt/conda/lib/python3.10/site-packages (from -r requirements.txt (line 42)) (65.6.3)
Collecting gitdb<5,>=4.0.1 (from gitpython>=3.1.30->-r requirements.txt (line 5))
Downloading gitdb-4.0.10-py3-none-any.whl (62 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.7/62.7 kB 17.3 MB/s eta 0:00:00
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (1.0.7)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.10/site-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (4.39.4)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (1.4.4)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (23.1)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.10/site-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (2.8.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests>=2.23.0->-r requirements.txt (line 12)) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests>=2.23.0->-r requirements.txt (line 12)) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests>=2.23.0->-r requirements.txt (line 12)) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests>=2.23.0->-r requirements.txt (line 12)) (2023.5.7)
Requirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 15)) (3.12.0)
Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 15)) (4.5.0)
Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 15)) (1.11.1)
Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 15)) (3.1)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 15)) (3.1.2)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas>=1.1.4->-r requirements.txt (line 27)) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas>=1.1.4->-r requirements.txt (line 27)) (2023.3)
Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython>=3.1.30->-r requirements.txt (line 5))
Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib>=3.3->-r requirements.txt (line 6)) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2->torch>=1.7.0->-r requirements.txt (line 15)) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch>=1.7.0->-r requirements.txt (line 15)) (1.3.0)
Installing collected packages: smmap, gitdb, thop, gitpython, ultralytics
Successfully installed gitdb-4.0.10 gitpython-3.1.31 smmap-5.0.0 thop-0.1.1.post2209072238 ultralytics-8.0.124
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
2023-06-30 08:57:10,933 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.
2023-06-30 08:57:10,933 sagemaker-training-toolkit INFO Done waiting for a return code. Received 0 from exiting process.
2023-06-30 08:57:10,950 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)
2023-06-30 08:57:10,975 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)
2023-06-30 08:57:10,999 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)
2023-06-30 08:57:11,008 sagemaker-training-toolkit INFO Invoking user script
Training Env:
{
"additional_framework_parameters": {},
"channel_input_dirs": {},
"current_host": "algo-1",
"current_instance_group": "homogeneousCluster",
"current_instance_group_hosts": [
"algo-1"
],
"current_instance_type": "ml.g4dn.xlarge",
"distribution_hosts": [],
"distribution_instance_groups": [],
"framework_module": "sagemaker_pytorch_container.training:main",
"hosts": [
"algo-1"
],
"hyperparameters": {
"batch-size": 8,
"cfg": "yolov5s.yaml",
"data": "coco128.yaml",
"epochs": 3,
"project": "/opt/ml/model"
},
"input_config_dir": "/opt/ml/input/config",
"input_data_config": {},
"input_dir": "/opt/ml/input",
"instance_groups": [
"homogeneousCluster"
],
"instance_groups_dict": {
"homogeneousCluster": {
"instance_group_name": "homogeneousCluster",
"instance_type": "ml.g4dn.xlarge",
"hosts": [
"algo-1"
]
}
},
"is_hetero": false,
"is_master": true,
"is_modelparallel_enabled": null,
"is_smddpmprun_installed": true,
"job_name": "sagemaker-2023-06-30-08-49-38-688",
"log_level": 20,
"master_hostname": "algo-1",
"model_dir": "/opt/ml/model",
"module_dir": "s3://####################/sagemaker-2023-06-30-08-49-38-688/source/sourcedir.tar.gz",
"module_name": "train",
"network_interface_name": "eth0",
"num_cpus": 4,
"num_gpus": 1,
"num_neurons": 0,
"output_data_dir": "/opt/ml/output/data",
"output_dir": "/opt/ml/output",
"output_intermediate_dir": "/opt/ml/output/intermediate",
"resource_config": {
"current_host": "algo-1",
"current_instance_type": "ml.g4dn.xlarge",
"current_group_name": "homogeneousCluster",
"hosts": [
"algo-1"
],
"instance_groups": [
{
"instance_group_name": "homogeneousCluster",
"instance_type": "ml.g4dn.xlarge",
"hosts": [
"algo-1"
]
}
],
"network_interface_name": "eth0"
},
"user_entry_point": "train.py"
}
Environment variables:
SM_HOSTS=["algo-1"]
SM_NETWORK_INTERFACE_NAME=eth0
SM_HPS={"batch-size":8,"cfg":"yolov5s.yaml","data":"coco128.yaml","epochs":3,"project":"/opt/ml/model"}
SM_USER_ENTRY_POINT=train.py
SM_FRAMEWORK_PARAMS={}
SM_RESOURCE_CONFIG={"current_group_name":"homogeneousCluster","current_host":"algo-1","current_instance_type":"ml.g4dn.xlarge","hosts":["algo-1"],"instance_groups":[{"hosts":["algo-1"],"instance_group_name":"homogeneousCluster","instance_type":"ml.g4dn.xlarge"}],"network_interface_name":"eth0"}
SM_INPUT_DATA_CONFIG={}
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_CHANNELS=[]
SM_CURRENT_HOST=algo-1
SM_CURRENT_INSTANCE_TYPE=ml.g4dn.xlarge
SM_CURRENT_INSTANCE_GROUP=homogeneousCluster
SM_CURRENT_INSTANCE_GROUP_HOSTS=["algo-1"]
SM_INSTANCE_GROUPS=["homogeneousCluster"]
SM_INSTANCE_GROUPS_DICT={"homogeneousCluster":{"hosts":["algo-1"],"instance_group_name":"homogeneousCluster","instance_type":"ml.g4dn.xlarge"}}
SM_DISTRIBUTION_INSTANCE_GROUPS=[]
SM_IS_HETERO=false
SM_MODULE_NAME=train
SM_LOG_LEVEL=20
SM_FRAMEWORK_MODULE=sagemaker_pytorch_container.training:main
SM_INPUT_DIR=/opt/ml/input
SM_INPUT_CONFIG_DIR=/opt/ml/input/config
SM_OUTPUT_DIR=/opt/ml/output
SM_NUM_CPUS=4
SM_NUM_GPUS=1
SM_NUM_NEURONS=0
SM_MODEL_DIR=/opt/ml/model
SM_MODULE_DIR=s3://####################/sagemaker-2023-06-30-08-49-38-688/source/sourcedir.tar.gz
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{},"current_host":"algo-1","current_instance_group":"homogeneousCluster","current_instance_group_hosts":["algo-1"],"current_instance_type":"ml.g4dn.xlarge","distribution_hosts":[],"distribution_instance_groups":[],"framework_module":"sagemaker_pytorch_container.training:main","hosts":["algo-1"],"hyperparameters":{"batch-size":8,"cfg":"yolov5s.yaml","data":"coco128.yaml","epochs":3,"project":"/opt/ml/model"},"input_config_dir":"/opt/ml/input/config","input_data_config":{},"input_dir":"/opt/ml/input","instance_groups":["homogeneousCluster"],"instance_groups_dict":{"homogeneousCluster":{"hosts":["algo-1"],"instance_group_name":"homogeneousCluster","instance_type":"ml.g4dn.xlarge"}},"is_hetero":false,"is_master":true,"is_modelparallel_enabled":null,"is_smddpmprun_installed":true,"job_name":"sagemaker-2023-06-30-08-49-38-688","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://####################/sagemaker-2023-06-30-08-49-38-688/source/sourcedir.tar.gz","module_name":"train","network_interface_name":"eth0","num_cpus":4,"num_gpus":1,"num_neurons":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_group_name":"homogeneousCluster","current_host":"algo-1","current_instance_type":"ml.g4dn.xlarge","hosts":["algo-1"],"instance_groups":[{"hosts":["algo-1"],"instance_group_name":"homogeneousCluster","instance_type":"ml.g4dn.xlarge"}],"network_interface_name":"eth0"},"user_entry_point":"train.py"}
SM_USER_ARGS=["--batch-size","8","--cfg","yolov5s.yaml","--data","coco128.yaml","--epochs","3","--project","/opt/ml/model"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_HP_BATCH-SIZE=8
SM_HP_CFG=yolov5s.yaml
SM_HP_DATA=coco128.yaml
SM_HP_EPOCHS=3
SM_HP_PROJECT=/opt/ml/model
PYTHONPATH=/opt/ml/code:/opt/conda/bin:/opt/conda/lib/python310.zip:/opt/conda/lib/python3.10:/opt/conda/lib/python3.10/lib-dynload:/opt/conda/lib/python3.10/site-packages
Invoking script with the following command:
/opt/conda/bin/python3.10 train.py --batch-size 8 --cfg yolov5s.yaml --data coco128.yaml --epochs 3 --project /opt/ml/model
2023-06-30 08:57:11,040 sagemaker-training-toolkit INFO Exceptions not imported for SageMaker TF as Tensorflow is not installed.
train: weights=yolov5s.pt, cfg=yolov5s.yaml, data=coco128.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=3, batch_size=8, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=/opt/ml/model, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-187-g0004c74 Python-3.10.8 torch-2.0.0 CUDA:0 (Tesla T4, 15102MiB)
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir /opt/ml/model', view at http://localhost:6006/
Dataset not found ⚠️, missing paths ['/opt/ml/datasets/coco128/images/train2017']
Downloading https://ultralytics.com/assets/coco128.zip to coco128.zip...
0%| | 0.00/6.66M [00:00<?, ?B/s]
100%|██████████| 6.66M/6.66M [00:00<00:00, 258MB/s]
Dataset download success ✅ (0.8s), saved to /opt/ml/datasets
Downloading https://ultralytics.com/assets/Arial.ttf to /root/.config/Ultralytics/Arial.ttf...
0%| | 0.00/755k [00:00<?, ?B/s]
100%|██████████| 755k/755k [00:00<00:00, 43.8MB/s]
Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
0%| | 0.00/14.1M [00:00<?, ?B/s]
4%|▍ | 560k/14.1M [00:00<00:02, 5.63MB/s]
8%|▊ | 1.09M/14.1M [00:00<00:02, 5.60MB/s]
12%|█▏ | 1.62M/14.1M [00:00<00:02, 5.61MB/s]
17%|█▋ | 2.38M/14.1M [00:00<00:01, 6.49MB/s]
27%|██▋ | 3.80M/14.1M [00:00<00:01, 9.51MB/s]
37%|███▋ | 5.18M/14.1M [00:00<00:00, 10.8MB/s]
54%|█████▍ | 7.69M/14.1M [00:00<00:00, 15.7MB/s]
67%|██████▋ | 9.48M/14.1M [00:00<00:00, 16.6MB/s]
84%|████████▍ | 11.8M/14.1M [00:00<00:00, 19.1MB/s]
97%|█████████▋| 13.7M/14.1M [00:01<00:00, 18.8MB/s]
100%|██████████| 14.1M/14.1M [00:01<00:00, 14.1MB/s]
from n params module arguments
0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 214 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs
Transferred 348/349 items from yolov5s.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 60 weight(decay=0.0005), 60 bias
train: Scanning /opt/ml/datasets/coco128/labels/train2017...: 0%| | 0/128 [00:00<?, ?it/s]
train: Scanning /opt/ml/datasets/coco128/labels/train2017... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<00:00, 3075.76it/s]
train: New cache created: /opt/ml/datasets/coco128/labels/train2017.cache
val: Scanning /opt/ml/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
val: Scanning /opt/ml/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
AutoAnchor: 4.27 anchors/target, 0.994 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to /opt/ml/model/exp/labels.jpg...
Image sizes 640 train, 640 val
Using 4 dataloader workers
Logging results to /opt/ml/model/exp
Starting training for 3 epochs...
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
0%| | 0/16 [00:00<?, ?it/s]
0/2 1.72G 0.04263 0.04785 0.02002 84 640: 0%| | 0/16 [00:00<?, ?it/s]
0/2 1.72G 0.04263 0.04785 0.02002 84 640: 6%|▋ | 1/16 [00:00<00:09, 1.65it/s]
0/2 1.86G 0.04137 0.0532 0.01919 77 640: 6%|▋ | 1/16 [00:00<00:09, 1.65it/s]
0/2 1.86G 0.04137 0.0532 0.01919 77 640: 12%|█▎ | 2/16 [00:00<00:04, 2.91it/s]
0/2 1.86G 0.04399 0.05919 0.01962 86 640: 12%|█▎ | 2/16 [00:00<00:04, 2.91it/s]
0/2 1.86G 0.04399 0.05919 0.01962 86 640: 19%|█▉ | 3/16 [00:00<00:03, 3.75it/s]
0/2 1.86G 0.0457 0.06293 0.01933 123 640: 19%|█▉ | 3/16 [00:01<00:03, 3.75it/s]
0/2 1.86G 0.0457 0.06293 0.01933 123 640: 25%|██▌ | 4/16 [00:01<00:02, 4.25it/s]
0/2 1.86G 0.04631 0.06611 0.01771 115 640: 25%|██▌ | 4/16 [00:01<00:02, 4.25it/s]
0/2 1.86G 0.04631 0.06611 0.01771 115 640: 31%|███▏ | 5/16 [00:01<00:02, 5.16it/s]
0/2 1.86G 0.04695 0.06591 0.01847 114 640: 31%|███▏ | 5/16 [00:01<00:02, 5.16it/s]
0/2 1.86G 0.04695 0.06591 0.01847 114 640: 38%|███▊ | 6/16 [00:01<00:01, 6.04it/s]
0/2 1.86G 0.04622 0.06352 0.01832 78 640: 38%|███▊ | 6/16 [00:01<00:01, 6.04it/s]
0/2 1.86G 0.04622 0.06352 0.01832 78 640: 44%|████▍ | 7/16 [00:01<00:01, 6.84it/s]
0/2 1.86G 0.04573 0.06442 0.018 118 640: 44%|████▍ | 7/16 [00:01<00:01, 6.84it/s]
0/2 1.86G 0.04573 0.06442 0.018 118 640: 50%|█████ | 8/16 [00:01<00:01, 7.43it/s]
0/2 1.86G 0.04537 0.06323 0.01781 76 640: 50%|█████ | 8/16 [00:01<00:01, 7.43it/s]
0/2 1.86G 0.04574 0.06244 0.01824 91 640: 50%|█████ | 8/16 [00:01<00:01, 7.43it/s]
0/2 1.86G 0.04574 0.06244 0.01824 91 640: 62%|██████▎ | 10/16 [00:01<00:00, 8.43it/s]
0/2 1.86G 0.04525 0.06201 0.01792 96 640: 62%|██████▎ | 10/16 [00:01<00:00, 8.43it/s]
0/2 1.86G 0.04521 0.0621 0.01862 103 640: 62%|██████▎ | 10/16 [00:01<00:00, 8.43it/s]
0/2 1.86G 0.04521 0.0621 0.01862 103 640: 75%|███████▌ | 12/16 [00:01<00:00, 8.92it/s]
0/2 1.86G 0.04485 0.06149 0.0183 91 640: 75%|███████▌ | 12/16 [00:02<00:00, 8.92it/s]
0/2 1.86G 0.04482 0.06218 0.01863 116 640: 75%|███████▌ | 12/16 [00:02<00:00, 8.92it/s]
0/2 1.86G 0.04482 0.06218 0.01863 116 640: 88%|████████▊ | 14/16 [00:02<00:00, 9.25it/s]
0/2 1.86G 0.04545 0.06109 0.01889 63 640: 88%|████████▊ | 14/16 [00:02<00:00, 9.25it/s]
0/2 1.86G 0.04523 0.06067 0.01926 82 640: 88%|████████▊ | 14/16 [00:02<00:00, 9.25it/s]
0/2 1.86G 0.04523 0.06067 0.01926 82 640: 100%|██████████| 16/16 [00:02<00:00, 6.71it/s]
Class Images Instances P R mAP50 mAP50-95: 0%| | 0/8 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 12%|█▎ | 1/8 [00:01<00:08, 1.26s/it]
Class Images Instances P R mAP50 mAP50-95: 25%|██▌ | 2/8 [00:01<00:03, 1.67it/s]
Class Images Instances P R mAP50 mAP50-95: 38%|███▊ | 3/8 [00:01<00:02, 2.34it/s]
Class Images Instances P R mAP50 mAP50-95: 50%|█████ | 4/8 [00:01<00:01, 3.15it/s]
Class Images Instances P R mAP50 mAP50-95: 62%|██████▎ | 5/8 [00:01<00:00, 3.84it/s]
Class Images Instances P R mAP50 mAP50-95: 75%|███████▌ | 6/8 [00:02<00:00, 3.89it/s]
Class Images Instances P R mAP50 mAP50-95: 88%|████████▊ | 7/8 [00:02<00:00, 4.17it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 8/8 [00:02<00:00, 4.39it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 8/8 [00:02<00:00, 3.10it/s]
all 128 929 0.732 0.631 0.718 0.477
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
0%| | 0/16 [00:00<?, ?it/s]
1/2 2.19G 0.05467 0.06597 0.02623 109 640: 0%| | 0/16 [00:00<?, ?it/s]
1/2 2.19G 0.04909 0.0572 0.0236 60 640: 0%| | 0/16 [00:00<?, ?it/s]
1/2 2.19G 0.04909 0.0572 0.0236 60 640: 12%|█▎ | 2/16 [00:00<00:01, 10.13it/s]
1/2 2.19G 0.04965 0.06586 0.02222 137 640: 12%|█▎ | 2/16 [00:00<00:01, 10.13it/s]
1/2 2.19G 0.04856 0.07264 0.02058 143 640: 12%|█▎ | 2/16 [00:00<00:01, 10.13it/s]
1/2 2.19G 0.04856 0.07264 0.02058 143 640: 25%|██▌ | 4/16 [00:00<00:01, 10.04it/s]
1/2 2.19G 0.04746 0.07102 0.01933 91 640: 25%|██▌ | 4/16 [00:00<00:01, 10.04it/s]
1/2 2.19G 0.04695 0.0708 0.01908 94 640: 25%|██▌ | 4/16 [00:00<00:01, 10.04it/s]
1/2 2.19G 0.04695 0.0708 0.01908 94 640: 38%|███▊ | 6/16 [00:00<00:01, 9.85it/s]
1/2 2.19G 0.04689 0.0724 0.01959 135 640: 38%|███▊ | 6/16 [00:00<00:01, 9.85it/s]
1/2 2.19G 0.04749 0.07476 0.01983 159 640: 38%|███▊ | 6/16 [00:00<00:01, 9.85it/s]
1/2 2.19G 0.04749 0.07476 0.01983 159 640: 50%|█████ | 8/16 [00:00<00:00, 10.21it/s]
1/2 2.19G 0.04698 0.07365 0.01919 97 640: 50%|█████ | 8/16 [00:00<00:00, 10.21it/s]
1/2 2.19G 0.0468 0.07334 0.01888 109 640: 50%|█████ | 8/16 [00:00<00:00, 10.21it/s]
1/2 2.19G 0.0468 0.07334 0.01888 109 640: 62%|██████▎ | 10/16 [00:00<00:00, 10.03it/s]
1/2 2.19G 0.04633 0.07145 0.019 88 640: 62%|██████▎ | 10/16 [00:01<00:00, 10.03it/s]
1/2 2.19G 0.04585 0.07 0.019 79 640: 62%|██████▎ | 10/16 [00:01<00:00, 10.03it/s]
1/2 2.19G 0.04585 0.07 0.019 79 640: 75%|███████▌ | 12/16 [00:01<00:00, 10.01it/s]
1/2 2.19G 0.04636 0.06949 0.01866 104 640: 75%|███████▌ | 12/16 [00:01<00:00, 10.01it/s]
1/2 2.19G 0.04635 0.06968 0.01835 120 640: 75%|███████▌ | 12/16 [00:01<00:00, 10.01it/s]
1/2 2.19G 0.04635 0.06968 0.01835 120 640: 88%|████████▊ | 14/16 [00:01<00:00, 10.25it/s]
1/2 2.19G 0.04637 0.06777 0.01839 76 640: 88%|████████▊ | 14/16 [00:01<00:00, 10.25it/s]
1/2 2.19G 0.04622 0.06747 0.01825 116 640: 88%|████████▊ | 14/16 [00:01<00:00, 10.25it/s]
1/2 2.19G 0.04622 0.06747 0.01825 116 640: 100%|██████████| 16/16 [00:01<00:00, 9.33it/s]
Class Images Instances P R mAP50 mAP50-95: 0%| | 0/8 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 12%|█▎ | 1/8 [00:00<00:00, 7.10it/s]
Class Images Instances P R mAP50 mAP50-95: 25%|██▌ | 2/8 [00:00<00:00, 7.13it/s]
Class Images Instances P R mAP50 mAP50-95: 38%|███▊ | 3/8 [00:00<00:00, 6.68it/s]
Class Images Instances P R mAP50 mAP50-95: 50%|█████ | 4/8 [00:00<00:00, 6.66it/s]
Class Images Instances P R mAP50 mAP50-95: 62%|██████▎ | 5/8 [00:00<00:00, 6.84it/s]
Class Images Instances P R mAP50 mAP50-95: 75%|███████▌ | 6/8 [00:00<00:00, 6.74it/s]
Class Images Instances P R mAP50 mAP50-95: 88%|████████▊ | 7/8 [00:01<00:00, 6.84it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 8/8 [00:01<00:00, 6.61it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 8/8 [00:01<00:00, 6.73it/s]
all 128 929 0.785 0.629 0.738 0.498
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
0%| | 0/16 [00:00<?, ?it/s]
2/2 2.2G 0.04802 0.1227 0.01417 195 640: 0%| | 0/16 [00:00<?, ?it/s]
2/2 2.2G 0.04561 0.08766 0.01594 101 640: 0%| | 0/16 [00:00<?, ?it/s]
2/2 2.2G 0.04561 0.08766 0.01594 101 640: 12%|█▎ | 2/16 [00:00<00:01, 9.72it/s]
2/2 2.2G 0.04542 0.07617 0.01757 86 640: 12%|█▎ | 2/16 [00:00<00:01, 9.72it/s]
2/2 2.2G 0.04423 0.07265 0.01708 98 640: 12%|█▎ | 2/16 [00:00<00:01, 9.72it/s]
2/2 2.2G 0.04423 0.07265 0.01708 98 640: 25%|██▌ | 4/16 [00:00<00:01, 10.23it/s]
2/2 2.2G 0.04398 0.07078 0.01723 111 640: 25%|██▌ | 4/16 [00:00<00:01, 10.23it/s]
2/2 2.2G 0.04369 0.07246 0.01751 113 640: 25%|██▌ | 4/16 [00:00<00:01, 10.23it/s]
2/2 2.2G 0.04369 0.07246 0.01751 113 640: 38%|███▊ | 6/16 [00:00<00:00, 10.09it/s]
2/2 2.2G 0.04346 0.06956 0.018 102 640: 38%|███▊ | 6/16 [00:00<00:00, 10.09it/s]
2/2 2.2G 0.04342 0.06977 0.01773 118 640: 38%|███▊ | 6/16 [00:00<00:00, 10.09it/s]
2/2 2.2G 0.04342 0.06977 0.01773 118 640: 50%|█████ | 8/16 [00:00<00:00, 10.27it/s]
2/2 2.2G 0.04328 0.06999 0.01823 102 640: 50%|█████ | 8/16 [00:00<00:00, 10.27it/s]
2/2 2.2G 0.04326 0.06942 0.01802 113 640: 50%|█████ | 8/16 [00:00<00:00, 10.27it/s]
2/2 2.2G 0.04326 0.06942 0.01802 113 640: 62%|██████▎ | 10/16 [00:00<00:00, 10.14it/s]
2/2 2.2G 0.04364 0.06832 0.01798 118 640: 62%|██████▎ | 10/16 [00:01<00:00, 10.14it/s]
2/2 2.2G 0.04363 0.06963 0.01759 140 640: 62%|██████▎ | 10/16 [00:01<00:00, 10.14it/s]
2/2 2.2G 0.04363 0.06963 0.01759 140 640: 75%|███████▌ | 12/16 [00:01<00:00, 10.36it/s]
2/2 2.2G 0.04389 0.06915 0.0175 107 640: 75%|███████▌ | 12/16 [00:01<00:00, 10.36it/s]
2/2 2.2G 0.04385 0.0688 0.01785 98 640: 75%|███████▌ | 12/16 [00:01<00:00, 10.36it/s]
2/2 2.2G 0.04385 0.0688 0.01785 98 640: 88%|████████▊ | 14/16 [00:01<00:00, 9.54it/s]
2/2 2.2G 0.04383 0.06737 0.01852 62 640: 88%|████████▊ | 14/16 [00:01<00:00, 9.54it/s]
2/2 2.2G 0.04365 0.06745 0.01911 96 640: 88%|████████▊ | 14/16 [00:01<00:00, 9.54it/s]
2/2 2.2G 0.04365 0.06745 0.01911 96 640: 100%|██████████| 16/16 [00:01<00:00, 9.14it/s]
Class Images Instances P R mAP50 mAP50-95: 0%| | 0/8 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 12%|█▎ | 1/8 [00:00<00:00, 7.91it/s]
Class Images Instances P R mAP50 mAP50-95: 25%|██▌ | 2/8 [00:00<00:00, 7.70it/s]
Class Images Instances P R mAP50 mAP50-95: 38%|███▊ | 3/8 [00:00<00:00, 7.29it/s]
Class Images Instances P R mAP50 mAP50-95: 50%|█████ | 4/8 [00:00<00:00, 7.22it/s]
Class Images Instances P R mAP50 mAP50-95: 62%|██████▎ | 5/8 [00:00<00:00, 7.08it/s]
Class Images Instances P R mAP50 mAP50-95: 75%|███████▌ | 6/8 [00:00<00:00, 6.85it/s]
Class Images Instances P R mAP50 mAP50-95: 88%|████████▊ | 7/8 [00:01<00:00, 6.53it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 8/8 [00:01<00:00, 6.88it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 8/8 [00:01<00:00, 7.00it/s]
all 128 929 0.819 0.627 0.747 0.504
3 epochs completed in 0.004 hours.
Optimizer stripped from /opt/ml/model/exp/weights/last.pt, 14.8MB
Optimizer stripped from /opt/ml/model/exp/weights/best.pt, 14.8MB
Validating /opt/ml/model/exp/weights/best.pt...
Fusing layers...
YOLOv5s summary: 157 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Class Images Instances P R mAP50 mAP50-95: 0%| | 0/8 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 12%|█▎ | 1/8 [00:00<00:01, 5.90it/s]
Class Images Instances P R mAP50 mAP50-95: 25%|██▌ | 2/8 [00:00<00:01, 3.32it/s]
Class Images Instances P R mAP50 mAP50-95: 38%|███▊ | 3/8 [00:01<00:01, 2.74it/s]
Class Images Instances P R mAP50 mAP50-95: 50%|█████ | 4/8 [00:01<00:01, 2.66it/s]
Class Images Instances P R mAP50 mAP50-95: 62%|██████▎ | 5/8 [00:01<00:00, 3.31it/s]
Class Images Instances P R mAP50 mAP50-95: 75%|███████▌ | 6/8 [00:01<00:00, 3.88it/s]
Class Images Instances P R mAP50 mAP50-95: 88%|████████▊ | 7/8 [00:01<00:00, 4.44it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 8/8 [00:02<00:00, 4.97it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 8/8 [00:02<00:00, 3.91it/s]
all 128 929 0.819 0.627 0.746 0.503
person 128 254 0.9 0.669 0.807 0.523
bicycle 128 6 1 0.331 0.711 0.427
car 128 46 0.859 0.413 0.579 0.243
motorcycle 128 5 0.682 0.8 0.798 0.663
airplane 128 6 0.979 1 0.995 0.755
bus 128 7 0.748 0.714 0.779 0.659
train 128 3 1 0.659 0.863 0.532
truck 128 12 0.664 0.333 0.501 0.242
boat 128 6 1 0.301 0.519 0.252
traffic light 128 14 0.607 0.214 0.377 0.222
stop sign 128 2 0.792 1 0.995 0.772
bench 128 9 0.823 0.52 0.717 0.311
bird 128 16 0.885 1 0.991 0.651
cat 128 4 0.918 1 0.995 0.77
dog 128 9 1 0.653 0.901 0.693
horse 128 2 0.89 1 0.995 0.622
elephant 128 17 0.972 0.882 0.939 0.651
bear 128 1 0.71 1 0.995 0.995
zebra 128 4 0.885 1 0.995 0.93
giraffe 128 9 0.877 0.791 0.962 0.756
backpack 128 6 1 0.562 0.836 0.428
umbrella 128 18 0.79 0.629 0.877 0.529
handbag 128 19 0.959 0.158 0.337 0.188
tie 128 7 0.817 0.645 0.777 0.486
suitcase 128 4 0.866 1 0.995 0.601
frisbee 128 5 0.735 0.8 0.8 0.627
skis 128 1 0.827 1 0.995 0.497
snowboard 128 7 0.85 0.714 0.867 0.544
sports ball 128 6 0.685 0.667 0.668 0.346
kite 128 10 0.835 0.508 0.628 0.22
baseball bat 128 4 0.521 0.25 0.388 0.171
baseball glove 128 7 0.763 0.429 0.47 0.276
skateboard 128 5 0.725 0.54 0.648 0.479
tennis racket 128 7 0.801 0.429 0.559 0.344
bottle 128 18 0.588 0.333 0.601 0.31
wine glass 128 16 0.735 0.875 0.89 0.448
cup 128 36 0.902 0.667 0.839 0.52
fork 128 6 1 0.311 0.439 0.302
knife 128 16 0.829 0.625 0.704 0.393
spoon 128 22 0.828 0.439 0.637 0.363
bowl 128 28 1 0.63 0.763 0.573
banana 128 1 0.809 1 0.995 0.302
sandwich 128 2 1 0 0.448 0.383
orange 128 4 0.829 1 0.995 0.738
broccoli 128 11 0.824 0.364 0.491 0.356
carrot 128 24 0.685 0.625 0.713 0.487
hot dog 128 2 0.547 1 0.663 0.614
pizza 128 5 1 0.792 0.962 0.77
donut 128 14 0.674 1 0.946 0.799
cake 128 4 0.774 1 0.995 0.846
chair 128 35 0.612 0.632 0.619 0.319
couch 128 6 1 0.646 0.826 0.527
potted plant 128 14 0.903 0.786 0.875 0.508
bed 128 3 1 0 0.863 0.532
dining table 128 13 0.804 0.317 0.621 0.407
toilet 128 2 0.857 1 0.995 0.796
tv 128 2 0.751 1 0.995 0.796
laptop 128 3 1 0 0.913 0.548
mouse 128 2 1 0 0.0907 0.0454
remote 128 8 1 0.617 0.629 0.513
cell phone 128 8 0.722 0.332 0.453 0.262
microwave 128 3 0.848 1 0.995 0.843
oven 128 5 0.667 0.4 0.43 0.298
sink 128 6 0.272 0.167 0.338 0.252
refrigerator 128 5 0.672 0.8 0.81 0.558
book 128 29 0.749 0.206 0.367 0.168
clock 128 9 0.762 0.778 0.879 0.689
vase 128 2 0.439 1 0.995 0.895
scissors 128 1 1 0 0.166 0.0332
teddy bear 128 21 0.85 0.571 0.788 0.506
toothbrush 128 5 0.826 1 0.995 0.618
Results saved to /opt/ml/model/exp
2023-06-30 08:57:53,434 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.
2023-06-30 08:57:53,434 sagemaker-training-toolkit INFO Done waiting for a return code. Received 0 from exiting process.
2023-06-30 08:57:53,435 sagemaker-training-toolkit INFO Reporting training SUCCESS
2023-06-30 08:58:07 Uploading - Uploading generated training model
2023-06-30 08:58:07 Completed - Training job completed
Training seconds: 427
Billable seconds: 427
After training completes, we check the saved model file and its path.
The trained model is saved to SageMaker's default S3 bucket path, in the output directory as model.tar.gz. You can now take the generated model and use it in your AI (artificial intelligence) applications!
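As one possible next step (a hedged sketch, not from the original post), the best.pt weights extracted from model.tar.gz can be loaded through the YOLOv5 project's torch.hub interface; the local weights path below is an assumption based on the earlier download sketch.
import torch

# Load the custom-trained weights extracted from model.tar.gz (assumed local path)
model = torch.hub.load('ultralytics/yolov5', 'custom', path='./model/exp/weights/best.pt')
results = model('https://ultralytics.com/images/zidane.jpg')  # run a sample inference
results.print()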
We have completed a simple training run with a Jupyter Notebook file (ipynb) in SageMaker Studio.
Now pick the library, model, and training data you want to use and build your own machine learning environment.
That wraps up our look at building a machine learning environment with PyTorch on SageMaker!
If you found this post helpful, please leave a like and share your thoughts in the comments!
[Reference]
- https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/pytorch.html
- https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html