Slow learning with RL for supply chain replenishment caused by long vendor lead times

I am trying to use reinforcement learning (for example, DQN) to build a system that issues purchase orders for our supply chain replenishment process. However, because vendors usually have long lead times, it often takes weeks before we can observe the actual reward and value of an ordering action. How can I deal with this delayed feedback, which makes learning very slow?
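To make the setting concrete, here is a minimal toy sketch of the delay I mean. It is not our real system: the class name `LeadTimeInventory`, the fixed lead time, and the holding and stockout costs are all hypothetical, chosen only to show that the cost consequence of an order appears several steps after the action.

```python
from collections import deque


class LeadTimeInventory:
    """Toy single-item inventory (hypothetical): orders arrive after a fixed lead time."""

    def __init__(self, lead_time=3, start_stock=20,
                 holding_cost=1.0, stockout_cost=5.0):
        self.stock = start_stock
        self.holding_cost = holding_cost    # assumed cost per unit left in stock
        self.stockout_cost = stockout_cost  # assumed cost per unit of unmet demand
        # Orders in transit; the leftmost entry is the next one to arrive.
        self.pipeline = deque([0] * lead_time)

    def step(self, order_qty, demand):
        """Place an order, receive whatever is due now, serve demand, return the reward."""
        self.pipeline.append(order_qty)
        arrived = self.pipeline.popleft()   # the order placed `lead_time` steps ago
        self.stock += arrived

        sold = min(self.stock, demand)
        unmet = demand - sold
        self.stock -= sold

        # The reward at this step reflects orders placed `lead_time` steps earlier,
        # not the order placed now.
        return -(self.holding_cost * self.stock + self.stockout_cost * unmet)


def run_episode(env, orders, demands):
    """Feed a fixed sequence of orders and demands; collect per-step rewards."""
    return [env.step(q, d) for q, d in zip(orders, demands)]
```

With `lead_time=2`, an order placed at step 0 only changes the reward at step 2. In our case the equivalent gap is measured in weeks, which is why collecting enough real transitions takes so long.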

