Thanks for testing again. So it has nothing to do with the Nexys board. This is a bit weird, because nothing in the program loop is executed only on the first pass; all initialization is done before the loop starts.
One explanation could be that the CPU does some caching on the first pass, but the delay is far too long for so few lines of code at 40 MHz, so to be honest, I do not think this is really the cause.
I guess the easiest solution is to simulate a keypress at startup, as you did, so that the user never sees the delay.
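
Just to illustrate the idea, here is a rough sketch in C. The real key handling code is not shown in this thread, so `key_pending`, `inject_keypress()` and `loop_iteration()` are made-up names, and I am assuming the loop polls a simple "key pending" flag. The point is only that the slow first pass happens right after initialization instead of on the user's first real keypress.

```c
#include <stdbool.h>

/* Hypothetical flag: set when a key event is waiting to be handled. */
static bool key_pending = false;

/* Hypothetical helper: pretend a key was pressed (call once at startup). */
static void inject_keypress(void)
{
    key_pending = true;
}

/* One pass of the (assumed) program loop.
 * Returns true if a key event was consumed on this pass. */
static bool loop_iteration(void)
{
    if (!key_pending) {
        return false;
    }
    key_pending = false;
    /* ... the real key handling would go here ... */
    return true;
}
```

Calling `inject_keypress()` once after initialization and before entering the loop means the first pass consumes the fake key, and every later pass only reacts to real input.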