This is the biggest thing holding GPT back. Everyone with meaningful data has their hands tied behind their back. So many ideas, and the answer is “we can’t put that data in GPT.” Very frustrating.
I'm afraid that even the most obedient human can't readily dump the contents of their connectome in a readable format. The same likely applies to LLMs: they study human-generated texts, not their own source code, let alone their tensors' weights.
Well, what they study is decided by the relevant hoominz. There's nothing actually stopping LLMs from trying to understand their own innards, is there? Except for the actual access.
Hospitals are not storing the data on a hard drive in their basement, so clearly this is a solvable problem. Here's a list of AWS services that can be used to store HIPAA data:
The biglaw firms I’m familiar with still store matter data exclusively on-prem. There’s a significant chunk of floor space in my office tower dedicated to running a law firm server farm for a satellite office.
Or a legal order. If you're on-site or in the cloud and in the US, it might not matter, since the authorities can get your data anyway. But if you're in another country, uploading data across borders can be a problem.