GateLoop: Data-Controlled Linear Recurrence for Sequence Modeling

byefruit · on Dec 7, 2023

Lucidrains has a re-implementation of this (https://github.com/lucidrains/gateloop-transformer) but was unable to beat the transformer baseline at an equal number of parameters.

zxexz · on Dec 7, 2023

I saw that. The WandB report is here [0]. The losses are quite close, but in my mind we'd need to see runs at parameter counts at least one and two orders of magnitude larger (with the dataset scaling proportionally) before drawing conclusions. If the training performance can be investigated, there may be some wins in this area!

[0] https://api.wandb.ai/links/lucidrains/lgz368mf