[RL] Actor-critic deep reinforcement learning for solving job shop scheduling problems(2020)

Scheduling/Paper 2021. 4. 19. 09:30

Paper/case title	Year	Authors
Actor-critic deep reinforcement learning for solving job shop scheduling problems	2020	Chien-Liang Liu, Chuan-Chin Chang, Chun-Jan Tseng

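Summary

The paper applies deep reinforcement learning (DRL) to the job shop scheduling problem (JSSP). The resulting model produces better schedules than dispatching rules and needs less computation time than an exact optimization model.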

Methodology

-Actor-Critic algorithm
Critic network: evaluates the current policy
-> After an action is tried, if its value function turned out to be high, the policy parameters are updated to raise the probability of taking that action
-> Q-values are estimated with a CNN architecture
Actor network: selects an appropriate action based on the critic network's output
-> Explores more efficiently than an ε-greedy policy
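
To make the critic/actor split concrete, here is a minimal sketch of a one-step actor-critic update. It is not the paper's model: the CNN critic is swapped for a small table of Q-values, the actor is a softmax over action preferences, and the single-state problem with fixed rewards `[0.2, 1.0, 0.5]` is a made-up toy. All names (`actor_critic_step`, `train`, the learning rates) are assumptions for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(prefs, q, action, reward, lr_actor=0.1, lr_critic=0.5):
    """One update on a single-state toy problem; returns (new_prefs, new_q)."""
    probs = softmax(prefs)
    # Critic: move the value estimate of the tried action toward the reward.
    q = list(q)
    q[action] += lr_critic * (reward - q[action])
    # Advantage: how much better the action is than the policy's average.
    baseline = sum(p * v for p, v in zip(probs, q))
    advantage = q[action] - baseline
    # Actor: policy-gradient step; grad of log softmax is (one_hot - probs).
    new_prefs = [
        pref + lr_actor * advantage * ((1.0 if i == action else 0.0) - probs[i])
        for i, pref in enumerate(prefs)
    ]
    return new_prefs, q


def train(rewards, steps=2000, seed=0):
    """Sample actions from the actor and update both parts each step."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    q = [0.0] * len(rewards)
    for _ in range(steps):
        action = rng.choices(range(len(rewards)), weights=softmax(prefs))[0]
        prefs, q = actor_critic_step(prefs, q, action, rewards[action])
    return softmax(prefs), q


if __name__ == "__main__":
    probs, q = train([0.2, 1.0, 0.5])  # hypothetical rewards per action
    print("policy:", probs)
    print("critic:", q)
```

Because actions are sampled from the actor's own distribution, exploration shifts toward promising actions gradually instead of being split into "greedy" and "uniformly random" steps as with ε-greedy.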

-DRL on JSSP
State matrices of the JSSP: processing time, assignment, completed
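
As a rough picture of such a state, the sketch below stacks three job-by-operation matrices into one array that a CNN could take as a 3-channel input. The exact layout used in the paper is not described in this post, so the channel order, the 0/1 encoding, the helper names, and the 3×3 instance are all assumptions.

```python
import numpy as np


def initial_state(processing_time):
    """Stack [processing time, assignment, completed] into shape (3, jobs, ops)."""
    p = np.asarray(processing_time, dtype=float)
    assigned = np.zeros_like(p)   # 1.0 once an operation has been dispatched
    completed = np.zeros_like(p)  # 1.0 once an operation has finished
    return np.stack([p, assigned, completed])


def mark_assigned(state, job, op):
    """Return a copy of the state with operation (job, op) marked as assigned."""
    new_state = state.copy()
    new_state[1, job, op] = 1.0
    return new_state


def mark_completed(state, job, op):
    """Return a copy of the state with operation (job, op) marked as completed."""
    new_state = state.copy()
    new_state[2, job, op] = 1.0
    return new_state


if __name__ == "__main__":
    # Hypothetical 3-job, 3-operation instance (rows: jobs, columns: operations).
    state = initial_state([[3, 2, 2], [2, 1, 4], [4, 3, 1]])
    state = mark_assigned(state, job=0, op=0)
    state = mark_completed(state, job=0, op=0)
    print(state.shape)
    print(state[1])
    print(state[2])
```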

Start
for training iteration = 1..N, do
      Initialize the critic and actor networks
      Assign an agent to each machine (resource); since there are multiple machines, the JSSP is a multi-agent environment
      while not all jobs are completed, do
         1. Pass the state to the actor network so that the agent acts (an action is the choice of a dispatching rule such as SPT or LPT)
         2. Once the job assignment is done, update the state
         3. The environment sends a reward to the critic network based on processing time and remaining time
         4. The actor network revises its dispatching policy based on the critic
      end while
      The environment gives a reward depending on the maximum makespan
end for
End
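
The inner loop above can be sketched as a small simulator in which each decision is a choice of dispatching rule. This is a simplification under several assumptions: a single central dispatcher stands in for the per-machine agents, the `choose_rule` callback stands in for the actor network (here it is just a fixed or random choice), no learning happens, and the 3-job instance is made up. Returning the makespan at the end corresponds to the episode-level reward signal; using its negative as the reward would be one possible choice, not something stated in the post.

```python
import random

# Each rule picks one candidate operation; ties are broken by job index.
RULES = {
    "SPT": lambda cands: min(cands, key=lambda c: (c["duration"], c["job"])),
    "LPT": lambda cands: max(cands, key=lambda c: (c["duration"], -c["job"])),
    "FIFO": lambda cands: min(cands, key=lambda c: (c["ready"], c["job"])),
}


def run_episode(jobs, choose_rule):
    """Dispatch until all jobs are done; returns (schedule, makespan).

    jobs: list of jobs, each a list of (machine, duration) in processing order.
    choose_rule: callable taking the candidate list and returning a rule name.
    """
    next_op = [0] * len(jobs)    # index of each job's next unscheduled operation
    job_ready = [0] * len(jobs)  # time at which each job's previous operation ends
    machine_free = {}            # time at which each machine becomes idle
    schedule = []
    while any(next_op[j] < len(ops) for j, ops in enumerate(jobs)):
        cands = []
        for j, ops in enumerate(jobs):
            if next_op[j] < len(ops):
                machine, duration = ops[next_op[j]]
                cands.append({"job": j, "machine": machine,
                              "duration": duration, "ready": job_ready[j]})
        pick = RULES[choose_rule(cands)](cands)
        # An operation starts once both its job and its machine are available.
        start = max(pick["ready"], machine_free.get(pick["machine"], 0))
        end = start + pick["duration"]
        machine_free[pick["machine"]] = end
        job_ready[pick["job"]] = end
        next_op[pick["job"]] += 1
        schedule.append((pick["job"], pick["machine"], start, end))
    makespan = max(entry[3] for entry in schedule)
    return schedule, makespan


# Hypothetical instance: 3 jobs on machines 0..2.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3)],
]

if __name__ == "__main__":
    for name in RULES:
        _, makespan = run_episode(JOBS, lambda cands, name=name: name)
        print(name, makespan)
    rng = random.Random(0)
    _, makespan = run_episode(JOBS, lambda cands: rng.choice(list(RULES)))
    print("random rule per step", makespan)
```

Swapping the `choose_rule` callback for a trained policy is where an actor network would plug in.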

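-Asynchronous DDPG
In the JSSP, assigning a job to one machine affects the job assignments on the other machines
-> However, because the JSSP is modeled as a multi-agent environment, each agent lives in its own environment
-> A global network is therefore set up: each agent copies its parameters from it, and the global network is updated asynchronously whenever an agent's state changes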
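
The copy-then-update pattern can be illustrated with a few threads sharing one parameter store. This is only a sketch of the asynchronous mechanism, not DDPG itself: the "network" is a short list of floats, `toy_gradient` is a made-up stand-in for the gradients an agent would compute from its own experience, and the class and function names are hypothetical.

```python
import threading


class GlobalNetwork:
    """Shared parameters that workers copy from and push gradients to."""

    def __init__(self, params):
        self._params = list(params)
        self._lock = threading.Lock()
        self.updates = 0

    def pull(self):
        """Return a copy of the current global parameters."""
        with self._lock:
            return list(self._params)

    def push(self, grads, lr):
        """Apply one gradient step; callers do not wait for each other's turn."""
        with self._lock:
            self._params = [p - lr * g for p, g in zip(self._params, grads)]
            self.updates += 1


def toy_gradient(params, target):
    """Gradient of 0.5 * ||params - target||^2 (placeholder for a real loss)."""
    return [p - t for p, t in zip(params, target)]


def worker(global_net, target, steps, lr):
    for _ in range(steps):
        local_params = global_net.pull()             # copy global parameters
        grads = toy_gradient(local_params, target)   # compute on the local copy
        global_net.push(grads, lr)                   # asynchronous global update


def train_async(n_workers=4, steps=200, lr=0.1):
    global_net = GlobalNetwork([0.0, 0.0])
    target = [3.0, -1.0]  # hypothetical optimum shared by all workers
    threads = [
        threading.Thread(target=worker, args=(global_net, target, steps, lr))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return global_net


if __name__ == "__main__":
    net = train_async()
    print(net.updates, net.pull())
```

A worker's gradient may be computed from parameters that another worker has already changed; that staleness is the price of not synchronizing the agents.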

Experimental results

Setup: 6×6, 10×10, and 20×5 instances were scheduled with several methods, including the proposed model

1. Makespan
Scheduling score (%) = (makespan of method - optimal makespan) / (optimal makespan)

Dispatching rules* (70-80%) < Proposed model (low 90s %) < Traditional RL** (96%) < Heuristic*** (99%)
* FIFO, LPT, SPT
** Generic multi-agent Q-learning
*** Ant colony optimization algorithm with tabu search

2. Computational time
When the existing schedule can no longer be carried out because of changes in processing times, machine breakdowns, permutations of the machine ordering, and so on, rescheduling is performed and its computational time is measured
