GateLoop: Data-Controlled Linear Recurrence for Sequence Modeling (arxiv.org)
26 points by zxexz on Dec 7, 2023 | 2 comments


Lucidrains has a re-implementation of this at https://github.com/lucidrains/gateloop-transformer, but was unable to beat the transformer baseline at an equal parameter count.


I saw that. The WandB report is here [0]. The losses are quite close, but in my mind we'd need to see runs at parameter counts at least one and two orders of magnitude larger (with the dataset scaled proportionally) before drawing conclusions. If the training performance can be investigated further, there may be some wins in this area!

[0] https://api.wandb.ai/links/lucidrains/lgz368mf



