> unless you already have some details such as the lat/Lon of dropoff and pickup and time of the stops.
These are literally columns in the data set. To quote the original post:
Each file has about 14 million rows, and each row contains medallion, hack license, vendor id, rate code, store and forward flag, pickup date/time dropoff date/time, passenger count, trip time in seconds, trip distance, and latitude/longitude coordinates for the pickup and dropoff locations. [1]
You're right that a lot of cab pickups/dropoffs happen a few doors down from the actual location, and that there aren't a lot of single family homes in Manhattan. But that doesn't negate what I'm saying. Even if only 20% of rides involve the actual location, that's still an awful lot of potential privacy violations. And even if there are zero single family homes involved, that was only the first scenario of numerous ones I mentioned.
What I meant was if you already know the time and location, say you know what time the barista left work and the lat/Lon of the coffee shop. It would just be a matching it up with the data in the table to find the drop off.
What I don't think is practical is identifying everyone who may or may not have left a particular bar.
> What I don't think is practical is identifying everyone who may or may not have left a particular bar.
This is weaker than your original claim which was that deducing passenger identities is "practically impossible". You've now conceded that the barista scenario is plausible and left open several others I mentioned.
But let's examine this one, just bars. The basis of your criticism is that some bars are located in residential buildings. First off this still leaves quite a number of bars that aren't. But even for those that are, the time of day and direction of travel is a pretty fair indicator of people who are bar patrons vs. residents. I.e. trips departing the building after 1am and arriving at a residential location are probably a lot more likely to be bar patrons than residents.
And don't forget that this public data set is also potentially privacy-violating when combined with other data about the destination, such as information that other residents of that location may know. So even if the general public couldn't determine much from a trip from a gay bar to a home residence one night, a live-in parent could.
These are literally columns in the data set. To quote the original post:
Each file has about 14 million rows, and each row contains medallion, hack license, vendor id, rate code, store and forward flag, pickup date/time dropoff date/time, passenger count, trip time in seconds, trip distance, and latitude/longitude coordinates for the pickup and dropoff locations. [1]
You're right that a lot of cab pickups/dropoffs happen a few doors down from the actual location, and that there aren't a lot of single family homes in Manhattan. But that doesn't negate what I'm saying. Even if only 20% of rides involve the actual location, that's still an awful lot of potential privacy violations. And even if there are zero single family homes involved, that was only the first scenario of numerous ones I mentioned.
[1] http://chriswhong.com/open-data/foil_nyc_taxi/