I ran some experiments comparing your lookahead with lookahead-decoding, using the Dolly-15k dataset, the Llama-2-7B-chat model, and an A800 GPU, but I was unable to replicate the results in Table 2 of the paper. In my runs, your lookahead did not even perform as well as lookahead-decoding.