This paper introduces the porting of an industrial neural network simulator onto GPUs used in a tool-chain to sort massive amounts of E-mails and other textual data. Compared to other previous work, all steps are being executed on the GPU, achieving overall up to 33× speedup without using any cuBLAS functionality. All the time-consuming routines have been ported onto the GPU, i.e. the training-, the simulation- and the verification-phases, the training being the most time-consuming. It is planned to include these GPU-kernels into the product for special costumer's demands.