> A user’s decision to move data to another service should not result in any loss of transparency or control over that data.
> It is worth noting that the Data Transfer Project doesn’t include any automated deletion architecture. Once a user has verified that the desired data is migrated, they would have to delete their data from their original service using that service’s deletion tool if they wanted the data deleted.
This project has copy, not move semantics. Therefore, in contrast to the stated purpose of allowing users to control their data, it actually has the opposite consequence of making it simpler to spread users' data around. Without a delete capability, the bias is towards multiple copies of user data.
By establishing this as an open-source effort, the project normalizes scraping data out of the APIs of non-participating services, an arrangement from which project partners benefit asymmetrically. In other words, API providers that do not provide export tools will nonetheless be subject to DTP adapters that exfiltrate data and send it to the (no doubt excellent) DTP importers maintained by DTP partners. This has the effect of creating a partial vacuum, sucking data from non-participants into participants' systems.
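Roughly, the shape of that flow looks like the sketch below. The interface and type names are mine, purely for illustration; they are not the project's actual SPI.

```java
// Illustrative sketch only, not the project's real interfaces. The point is the shape
// of the flow: pull from a source API, map into a shared model, push into a
// destination API. Note that nothing in this flow ever deletes the originals.
import java.util.List;

record Photo(String title, String mimeType, byte[] bytes) {}   // shared, vendor-neutral model

interface Exporter { List<Photo> export(String apiKey, String userAuthToken); }
interface Importer { void importAll(String apiKey, String userAuthToken, List<Photo> photos); }

final class CopyJob {
    static void run(Exporter source, Importer destination,
                    String sourceKey, String destKey, String userAuthToken) {
        List<Photo> photos = source.export(sourceKey, userAuthToken); // read from the source service
        destination.importAll(destKey, userAuthToken, photos);        // write a second copy elsewhere
        // There is no source.delete(...) step: copy semantics, not move semantics.
    }
}
```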
The economics of maintaining a high-volume bidirectional synchronization pipeline between DTP partners guarantee that these toy DTP adapters will not be the technology used to copy data between DTP partners; rather, a dedicated pathway will be established instead. In other words, the public open-source DTP effort could be understood as a facade designed to create a plausible reason for why DTP partners have cross-connected their systems.
TLDR:
- Copy semantics are counterproductive to the goal of providing user control of their data.
- The approach of using existing APIs to scrape data from non-participating vendors is a priori hostile.
- Economics dictate that the lowest-cost option for providing bidirectional synchronization between vendors involves dedicated links and specialized transport schemes that the DTP project itself does not provide equally to all parties.
There is some merit to providing abstract representations of common data formats -- look at EDI, for instance. I'd welcome someone from the project stopping by to explain away my concerns.
I wanted to provide my thinking on some of these very valid worries.
Re: Copy vs. Move: This was a conscious choice that I think has solid backing in two things:
1) In our user research for Takeout, the majority of users who use Takeout don't do it to leave Google. We suspect the same will be true for DTP: users will want to try out a new service, or use a complementary service, rather than a replacement.
2) Users should absolutely be able to delete their data once they copy it. However, we think that separating the two is better for the user. For instance, you want to make sure the user has a chance to verify the fidelity of the data at the destination. It would be terrible if a user ported their photos to a new provider, the new provider down-sampled them, and the originals were automatically deleted.
Re: Scraping
It's true that DTP can use the APIs of companies that aren't 'participating' in DTP. But we don't do it by scraping their UIs. We do it like any other app developer: by asking for an API key, which that service is free to decline to give. One of the foundational principles we cover in the white paper is that the source service maintains control over who, how, and when to give the data out via their API. So if they aren't interested in their data being used via DTP, that is absolutely their choice.
Re: Economics
As with all forward-looking statements, we'll have to wait and see how it works out. But I'll give one anecdote about why I don't think this will happen. Google Takeout (which I also work on) allows users to export their data to OneDrive, Dropbox, and Box (as well as Google Drive). One of the reasons we wanted to make DTP is that we were tired of dealing with other people's APIs, as it doesn't scale well. Google should build adapters for Google, and Microsoft should build adapters for Microsoft. So with Takeout we tried the specialized transport method, but it was a lot of work, so we went with the DTP approach specifically to try to avoid having specialized transports.
DTP is still in the early phases, and I would encourage you, and everyone else, to get involved in the project (https://github.com/google/data-transfer-project) and help shape its direction.
Hey! Thanks for the response. If you don't mind, I have some questions and comments after reading through your feedback.
> We suspect that [the majority of users who use Takeout don't do it to leave Google] will be true for DTP: users will want to try out a new service, or use a complementary service, rather than a replacement.
Interesting, thanks. I think this sort of worldview makes sense from a certain perspective.
> 2) Users should absolutely be able to delete their data once they copy it.
This is an aspirational statement and not a requirement of DTP, so it's problematic from a public perception standpoint to make the claim that DTP provides the user with more control of their data when the control very much remains at the mercy of the data controller. Indeed, this project directly facilitates the opportunity for more data controllers to obtain copies of the subject's data.
> If they aren't interested in their data being used via DTP, that is absolutely their choice.
Can you clarify whether you are saying that the DTP Project will honor takedown requests from parties targeted by DTP tooling?
> Google should build adapters for Google, and Microsoft should build adapters for Microsoft.
Can you explain the business drivers that incentivize these companies to provide parity between their import and export capabilities? Does the DTP Project require parity between these capabilities?
>This is an aspirational statement and not a requirement of DTP, so it's problematic from a public perception standpoint to make the claim that DTP provides the user with more control of their data when the control very much remains at the mercy of the data controller. Indeed, this project directly facilitates the opportunity for more data controllers to obtain copies of the subject's data.
I don't really disagree with what you say, but I interpret things differently:
Without DTP, if you ask a data controller to delete your data, you have to trust that they do. There is very little way to verify that the deletion actually happened; you more or less need to rely on the reputation of the company. Nowadays they should all have published retention statements that spell out their deletion practices in more detail, so that helps some and allows for some recourse if in fact they aren't following them. But in general, for the average user, it comes down mostly to trust.
With DTP, nothing is worse. But users can now get their data into a new service more easily.
If DTP had move semantics, you would still have the same problem as above: it mostly comes down to trust.
It is true that after a copy there are two copies of the data, which isn't ideal in terms of data minimization. But for the reasons I outlined previously, I think it is important to keep deletion as a separate action from copy. I do think that after a copy the option to delete the data should be presented to the user prominently, to make that as easy as possible if that is what they want to do.
So DTP isn't trying to solve every problem, but my take is that it makes some things better without making anything else significantly worse, so it's a net win.
> Can you clarify whether you are saying that the DTP Project will honor takedown requests from parties targeted by DTP tooling?
DTP doesn't really store data, so I don't think it is in scope for a traditional takedown request. But, more to the spirit of the question: yes, if a service doesn't want to grant a DTP host an API key, or revokes an API key, we wouldn't condone trying to work around that.
(One super-detailed note: DTP is just an open-source project, and doesn't operate any production code. A Hosting Entity can download and run the code. A Hosting Entity could be a company letting users transfer data in or out, or a user running DTP locally. Each Hosting Entity is responsible for acquiring API keys for all the services they want to interact with, including agreeing to and complying with any restrictions that service might impose for access to their API.)
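To make that concrete, here is a purely illustrative sketch (not actual DTP configuration code) of what that means in practice: the keys live with whoever runs the deployment, so a service that never issued a key, or that revoked one, simply isn't reachable from that deployment.

```java
// Purely illustrative, not actual DTP code: a Hosting Entity supplies its own service
// credentials (here via environment variables), so a service that never granted this
// host a key, or that revoked it, simply cannot be transferred to or from.
import java.util.Map;
import java.util.Optional;

final class HostingEntityKeys {
    private final Map<String, String> env;

    HostingEntityKeys(Map<String, String> env) { this.env = env; }

    /** The key this particular host was granted for a given service, if any. */
    Optional<String> keyFor(String serviceName) {
        return Optional.ofNullable(env.get(serviceName.toUpperCase() + "_API_KEY"));
    }
}

// Usage: new HostingEntityKeys(System.getenv()).keyFor("examplephotos") returns
// Optional.empty() if no key was ever granted (or it was revoked and removed),
// and a transfer involving that service never starts.
```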
> Can you explain the business drivers that incentivize these companies to provide parity between their import and export capabilities? Does the DTP Project require parity between these capabilities?
This is a little bit of a bet on our part. I think Google has demonstrated, through its almost decade-long investment in Takeout, that giving users more control over their data leads to greater user trust, and that is good for business.
As for requiring parity, we cover this a bit in the white paper, but as you say, we recognize that reciprocity is key, and we need to incentivize services to invest equally in import and export; otherwise the whole thing falls apart.
Right now the stance we are taking is that reciprocity is strongly encouraged, and we will be collecting stats/metrics to try to measure it so we can name and shame folks that aren't following it. We hope that providing transparency around different services' practices in this area will allow users to make informed decisions about where to store their data.
An interesting thought experiment in this area: if a user wants to transfer data from service A to service B, but service B doesn't allow export back out, what should service A do? Ideally you force service B to support export, but on the other hand the user should be in control, and who is service A to say no? It almost pits the good of an individual user against the good of the ecosystem.
We are hoping that as the project, and the larger portability ecosystem, evolve, some kind of neutral governance model emerges that can help mediate some of these issues. It is problematic for service A to decide that question; a neutral group representing the interests of users will have more legitimacy in answering these tough questions.
Thanks for taking the time to provide these detailed follow-ups. I'm still pretty wary of this project, but you've demonstrated that at least one person on the team is thinking through some of this stuff.
> An interesting thought experiment in this area: if a user wants to transfer data from service A to service B, but service B doesn't allow export back out, what should service A do? Ideally you force service B to support export, but on the other hand the user should be in control, and who is service A to say no? It almost pits the good of an individual user against the good of the ecosystem.
I'll offer that the European Union's answer to this -- the GDPR -- is to put the data subject first. It would be nice to see the DTP Project align with that position.
In this context, "delete" should probably be understood to mean "removed from production systems, and retained only to the extent required to meet legal obligations".
Some might argue that web scraping to export data from non-participating services was normalized long ago; it has been common practice for decades!
Further, it's perhaps possible that copy semantics comprise a very useful operational primitive when coupled with the existing delete primitives. With copy and delete actions available to them, users can choose to share, move, and delete data as they see fit. With only move actions available, users do not get to make their own choices and are limited to the choices prescribed for them.
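To sketch what I mean (hypothetical names, not any real API): with copy and delete as separate primitives, a "move" is just one composition the user can opt into, with a verification step in the middle, which is exactly where automatic deletion would be dangerous.

```java
// Hypothetical sketch: "move" as a user-chosen composition of copy, verify, delete.
// None of these names correspond to a real API; the point is the composition.
import java.util.Arrays;

interface SourceService      { byte[] fetch(String itemId); void delete(String itemId); }
interface DestinationService { void store(String itemId, byte[] data); byte[] fetch(String itemId); }

final class UserChosenMove {
    static void move(SourceService src, DestinationService dst, String itemId) {
        byte[] original = src.fetch(itemId);
        dst.store(itemId, original);                      // 1. copy
        byte[] copied = dst.fetch(itemId);                // 2. verify fidelity at the destination
        if (!Arrays.equals(original, copied)) {
            throw new IllegalStateException("copy differs from original; refusing to delete");
        }
        src.delete(itemId);                               // 3. delete, only because the user chose a move
    }
}
```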
There is *substantial* merit in calling for more directly useful primitives, but this could perhaps be done in a context that is informed by knowledge of extant primitives.