Post-quantum cryptography had attracted a lot of attentions in recent years, due to the potential threat emerged from quantum computer against traditional public key cryptography. Among all post-quantum candidates, lattice-based cryptography is considered the most promising and well studied one. The most time consuming operation in lattice-based cryptography schemes is polynomial multiplication. Through careful selection of the lattice parameters, the polynomial multiplication can be accelerated by Number Theoretic Transform (NTT) and massively parallel architecture like Graphics Processing Units (GPU). However, existing NTT implementation in GPU only focuses on parallelizing one of the three for loop, which eventually causes slow performance and warp divergence. In this paper, we proposed a strategy to mitigate this problem and avoid the warp divergence. To verify the effectiveness of the proposed strategy, the NTT was implemented following the lattice parameters in qTESLA, which is one of the round 2 candidates in NIST Post-Quantum Standardization competition. To the best of our knowledge, this is the first implementation of NTT in GPU with parameters from qTESLA. The proposed implementation can be used to accelerate qTESLA signature generation and verification in batch, which is very useful under server environment. On top of that, the proposed GPU implementation can also be generalized to other lattice-based schemes.