Google Drive scans files for copyright infringement (twitter.com/1littlecoder)
125 points by amrrs on July 23, 2024 | 77 comments


This (the copyright scanning policy) isn't new, is it?

What it's scanning for in this case is material it believes to be copyrighted, and restricts features (notably sharing) for content that matches.

Given that copyright law exists, and that Google doesn't like wasting engineering time on legal stuff unless refusing to do it would result in lawsuits, this scanning policy is a fairly low-impact solution that has probably been deemed legally necessary to avoid media company lawsuits. I don't like it, but the alternative is for Google (and Microsoft, and any other cloud storage that allows sharing) to mount an expensive legal effort to try to overturn decades of digital copyright precedent, which is likely to fail.


> What it's scanning for is material it believes to be copyrighted,

All "material" created by any human is copyrighted, and has been for decades. The question is who owns what rights, which Google can't know.

> decades of digital copyright precedent

What precedent?


Precedent: If you're hosting it you're guilty.


Or more tactfully, if you're hosting it, we'll assume you're guilty because we don't want to deal with it, and you agreed to a very restrictive ToS that lets us take it down.


Google cannot know if you have the rights for any possible document. If you have a whole book it could be bought from Amazon or downloaded via Torrent.

There can even be the case where you have private contracts that give you rights that Google cannot know about.

I can only think that they will look for the low-hanging fruit: a folder full of movies, music, and other content.


Not really, otherwise services that host user-generated content, like YouTube, wouldn't exist.

What a hosting service needs to provide is a way for the user to flag content + provide an "acceptable" response time.

Another, more proactive approach is for the content owners to share the digital "fingerprint" of the IP (movies, music, etc.) with the hosting service, so that the hosting service can scan uploaded files and compare them.
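
A minimal sketch of that fingerprint comparison, assuming the simplest possible fingerprint (a SHA-256 digest of the exact file bytes). The names `fingerprint`, `is_flagged`, and `known_fingerprints` are hypothetical, and nothing here is claimed to be how Google or any other host actually does it:

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Exact-match fingerprint: the SHA-256 hex digest of the file bytes."""
    return hashlib.sha256(data).hexdigest()


def is_flagged(upload: bytes, known_fingerprints: set[str]) -> bool:
    """Return True if the upload matches a fingerprint supplied by a rights holder."""
    return fingerprint(upload) in known_fingerprints


# Hypothetical database: in this sketch the rights holder shares only
# digests with the host, never the content itself.
known_fingerprints = {fingerprint(b"bytes of a registered movie")}

registered_hit = is_flagged(b"bytes of a registered movie", known_fingerprints)
unrelated_hit = is_flagged(b"my own home video", known_fingerprints)
print(registered_hit, unrelated_hit)
```

An exact digest like this only matches byte-identical copies, and a match says nothing about whether the uploader holds a license.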


> been deemed legally necessary

Ah, the old "deeming things" trick [0].

Almost everything is copyrighted. Like most of us here I've given original writing, code or music to people who've shared it on Google drive. That material is copyrighted no more or less than anything by Disney or Sony.

Google doesn't just "scan for copyright violations", it specifically acts out of fear or leverage to be an unpaid policeman for special interests, rich and powerful media companies.

I haven't said anything new here, but I do think it's important that we see arguments built on false standards. Google isn't championing the law or anything noble and we would do well to be very precise about choosing words to describe what is happening.

[0] deem: acting by fiat and art without necessary recourse to logic, law, evidence or consistency


The existence of copyright law is to blame, not Google, which is just trying to adhere to the law.


Why not both?


Good point.


It's only after you share a file that the copyright scan is triggered


Proton Drive has entered the chat.

Encrypt everything and you, as the company offering services, no longer need to worry about what's being shared because you can't see it.


Better yet, just don’t use managed services like GDrive


If you need to, there are alternatives like Proton or Mega, or a hundred other services, I'm sure.


They have for a long time, but it's only an issue if you share a file publicly.

If you just keep files private on your Drive, or share them with specific other accounts but not to anyone with the link, there's no copyright scanning.

All of this makes perfect sense because Google doesn't want people using their Drive accounts for mass sharing pirated content.

Meanwhile, if you're just backing up your CDs and DVDs for personal private use, everything works fine, zero problems.


time to start changing the hash by adding a few bytes to your movies before uploading to Google Drive

  head -c 20 /dev/urandom >> movie.mp4
won't affect playback, will affect Google finding your pirated films.


I'm pretty sure Google engineers are smart enough to detect quick workarounds that fit in a comment on Hacker News.

They already have to implement such a thing for finding copyrighted material in Youtube videos, so they know how to deal with mixed signals.


Or the "simple" solution is sufficient as the vast majority of users won't even use "quick workarounds that fit in a comment on Hacker News".

And if the simple solution that still works most of the time is sufficient to avoid liability in the courts then it's solved the problem they were aiming at.

If the more complex solution costs more in engineering time than the benefit, they're arguably obliged as a public company to not implement it. Often chasing the long tail just isn't worth it.


Encrypt the file and don't upload the key.


They block it outright, even when the purpose is innocent (sensitive documents). Tried with rar and 7-zip.


This is trivially disprovable. Everyone has countless encrypted files on GDrive. What do you think a KeePass db is, for example?


That might be an issue with rar and 7-zip specifically. RAR at least used to be very common with warez and not very common out of it—might just be a low-effort way of discouraging it.

Not endorsing this, btw, I much prefer dumb pipes.


anecdotal / N=1 here, but I've uploaded standard 7z encrypted backups to personal Drive without issue


I don't know why it didn't work then. I encrypted the file listing too.


Maybe they only apply it to shared files?


Definitely not true. Source: have .zip and .rar files in Google Drive


Perhaps you’re on a ‘list’ for questionable content.


Wow, really? You can't store an encrypted file on Google Drive? I guess I shouldn't be surprised, but I am, mildly.

... so it has no utility at all, since you definitely shouldn't be storing any unencrypted files on it.


That doesn't seem to be most people's experience, see sibling posts.

As another data point, I have several manually-uploaded zip and 7z files of my own documents and content, both encrypted and not, and never had a problem.

The majority of my Drive usage by bytes is encrypted backup pack files uploaded via API by Arq Backup or rclone. That's all worked fine for many years.


Sorry, I may have conflated Gmail with GDrive, my bad.


IIRC I did get a roadblock attaching a zip file with GMail several years ago.

GMail also complained about direct-attaching a locally-built, unsigned exe file to an address that I'd never corresponded with. To be fair this probably should have caused suspicion :)


Or if they didn't think about this, they do now after reading this thread :)


My guess is that the very minimum they do is cut files in small blobs and analyze each one separately, so that kind of scheme (or adding random junk at the beginning or the middle or zipping it or ...) doesn't work.
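
A rough sketch of that guess, assuming fixed-size chunks hashed with SHA-256 (the 4096-byte `CHUNK_SIZE` and the function names are made up for illustration). It shows that appended junk changes the whole-file hash while every original chunk still matches. Fixed-size chunks would not survive junk inserted at the beginning or middle, since every later chunk shifts; that case would need something like content-defined chunking, which this sketch does not attempt:

```python
import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical block size, chosen only for this example


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Hash each fixed-size chunk of the data independently."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def matching_fraction(reference: bytes, candidate: bytes) -> float:
    """Fraction of the reference's chunks that also appear in the candidate."""
    ref = chunk_hashes(reference)
    if not ref:
        return 0.0
    cand = set(chunk_hashes(candidate))
    return sum(h in cand for h in ref) / len(ref)


# Stand-in for a media file: exactly 100 chunks of random bytes.
original = os.urandom(CHUNK_SIZE * 100)
# Same file with 20 random bytes appended, as in the one-liner above.
padded = original + os.urandom(20)

whole_file_changed = (
    hashlib.sha256(original).hexdigest() != hashlib.sha256(padded).hexdigest()
)
fraction = matching_fraction(original, padded)
print(whole_file_changed, fraction)
```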


Seems like this would corrupt the file. There are plenty of metadata fields you could just put some crap in (or just transpose letters in an existing string so you don't need to change any length markers).


That depends on the container format, and with some container formats on the parser. Any container format designed to be streamable would by definition survive corruption at the end, provided the player doesn't get too upset about corrupted metadata there. VLC, for example, handles such things quite well.


As a fun example of "depends on the container format", one trick people used to use for sharing files on image boards (4chan and the like) was to concatenate a rar file and jpg file. I don't remember which order it was, but one of the two used a header (read start to end), and the other used a "footer" (read end to start), so you could use it either way depending on what you opened it as.

I still have a handful of files which are books in PDF format in a RAR file, and simultaneously the book's cover as a jpg.


I remember this too, but with JPGs and ZIPs. If I remember correctly, the JPG goes first, as ZIPs have their metadata at the end of the file (to the best of my knowledge, corrections welcome).
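
For what it's worth, the JPG-first ordering can be checked with Python's standard `zipfile` module, which locates the archive's directory from the end of the file and tolerates data in front of it. A small sketch, using a fake stand-in for the image bytes (`fake_jpeg` is not a decodable picture, only the JPEG start and end markers around padding):

```python
import io
import zipfile

# Stand-in for real image data: JPEG files start with FF D8 and end with FF D9.
fake_jpeg = b"\xff\xd8" + b"\x00" * 64 + b"\xff\xd9"

# Build a small ZIP archive in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as zf:
    zf.writestr("book.txt", "hello from inside the archive")

# Concatenate: image bytes first, archive bytes second.
combined = fake_jpeg + zip_buffer.getvalue()

# The combined blob still opens as a ZIP, because the reader starts from
# the end-of-central-directory record at the end of the file.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    text = zf.read("book.txt").decode()

print(names, text)
```

An image viewer reading from the start sees the JPEG header, while an archive tool reading from the end sees the ZIP directory.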


Most media files are likely to tolerate random garbage tacked on to the end of the file. ID3v1 tags are essentially proof of that: 128 bytes of garbage at the end that didn't cause any trouble with playback.
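
To make the ID3v1 point concrete, here is a small sketch that reads the trailing 128-byte tag (the marker `TAG`, then fixed-width title, artist, album, year, and comment fields, then a one-byte genre). The helper name `read_id3v1` and the sample values are invented for the example, and the "audio" is just placeholder bytes:

```python
import struct
from typing import Optional

ID3V1_SIZE = 128
# "TAG" + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1)
ID3V1_FORMAT = "=3s30s30s30s4s30sB"


def _text(raw: bytes) -> str:
    """Decode a fixed-width field, dropping NUL padding and trailing spaces."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(data: bytes) -> Optional[dict]:
    """Return the ID3v1 fields from the last 128 bytes, or None if absent."""
    if len(data) < ID3V1_SIZE:
        return None
    magic, title, artist, album, year, comment, genre = struct.unpack(
        ID3V1_FORMAT, data[-ID3V1_SIZE:]
    )
    if magic != b"TAG":
        return None
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "comment": _text(comment),
        "genre": genre,
    }


# Placeholder audio bytes followed by a hand-built tag (sample values only).
audio = b"\x00" * 1000
tag = struct.pack(
    ID3V1_FORMAT, b"TAG", b"Example Title", b"Example Artist", b"", b"2024", b"", 255
)

tagged = read_id3v1(audio + tag)
untagged = read_id3v1(audio)
print(tagged, untagged)
```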


I would really hope they would ignore the metadata, when computing the hash, for this very reason. Properly tagging films you download isn't exactly rare.


i've never had any problems with playback using the major vid players on Linux with files i may or may not have used this trick on.


Are you using this technique successfully? I'd assumed they were using phashes or similar.


yeah I'm pretty certain they are re-using what was already in use by YouTube which is based on perceptual hashes


They may be doing locality sensitive hashing in which case this wouldn't matter.
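
A toy illustration of why a perceptual or locality-sensitive hash would shrug off small changes: an "average hash" over a tiny grayscale grid, compared by Hamming distance. This is a textbook-style sketch, not a description of YouTube's or Drive's actual matching, which isn't public. The function names and the 8x8 test images are made up:

```python
def average_hash(pixels: list[list[int]]) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


# 8x8 diagonal gradient as a stand-in for a downscaled video frame.
image = [[(x + y) * 16 for x in range(8)] for y in range(8)]
# Same picture, uniformly a little brighter (no clipping at these values).
brighter = [[p + 10 for p in row] for row in image]
# A clearly different picture: the inverted gradient.
inverted = [[255 - p for p in row] for row in image]

near = hamming_distance(average_hash(image), average_hash(brighter))
far = hamming_distance(average_hash(image), average_hash(inverted))
print(near, far)
```

A matcher built this way would flag anything within a small distance threshold, so changes that leave the picture looking the same barely move the hash, unlike a cryptographic digest.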


This makes sense, though. I know I'm not the only one who looked up "Shrek.mp4" on Google and got a literal sea of pirated movies hosted on Google Drive.


It makes sense for public content, not for private stuff you're not opening to the public. I think most people have a sense that they can put what they want on their "Cloud" storage, especially if it's something they're paying for.


Google is presumably unable to determine that you paid for the right to copy this music file. Hence its "may".


I'm just imagining Google deleting Steven Spielberg's entire private collection that he stores in Google Cloud because of the copyright claims.


Yup. Instead of Digital Rights Management we have Digital Restrictions Management masquerading as it.


On the other hand if that file is shared publicly, Google might be liable under some jurisdictions IIUC.

I'd assume that they check some hashes of the file against a database to check for copyright infringement. If only specific actions are not permitted on the file, e.g. sharing it widely, this could seem reasonable?

Curious to learn more, what could be other actions the service provider could take to avoid getting a lawsuit?


What about legal media files? In addition to being a software engineer I'm a DJ... I have nearly 1 TB of LEGAL music (mostly FLAC or high-bitrate music files) on my OneDrive (easier because I use Virtual DJ). Yes, I know it's not Google Drive, but how can they tell the difference between legal and illegal files?

Note: I'm in the process of completely "de-Googling, de-Microsofting, ..." all my stuff (big self-hosted TrueNAS server with Backblaze backup).


At least it used to be the case they only scan shared files.



lol @ that guy LARPing as a Google employee. Reminds me of the Microsoft Answers forum.


It’s annoying. They run interference for actual employees and allow them to not properly staff customer care forums.


What … properly … staffed … Google … support … forum?


Wow, imagine doing that for free.


For this reason and others (I was a Google One family subscriber), I completely de-Googled myself.


Put movies in a password protected zip.


At that point, just use Backblaze. It's reasonable in cost for "unlimited" and encrypts your data prior to transit.


For convenience: the linked object is a text comment, plus a screenshot of text,

- "so, google has scanned my recently filed scanned files and said it's a copyright infringement"

- "Bro, tell me your Gemini datasplit?"

> "Google"

> "Your file may violate Google Drive's Terms of Service"

> ""05 - You are always choosing.mp3" contains content that may violate Google Drive's Copyright Infringement policy. Some features related to this file may have been restricted. "

> "Restricted file 05 - You are always choosing.mp3" - ""


AI is going to start deleting everything and locking us out of Google drive. It's coming.


The year of the local cloud is just around the corner. My concern is books I buy from places like pragprog in PDF format. I feel like Google simply doesn't care and would ban on first offense.


They calculate hashes for files and probably compare them to already reported ones


What is going on here? Is the person still able to download the file at least?


Does onedrive do something similar?

Glad I don't use google drive.


Use a foreign service like Mega.


So I bought a PDF book, will Google block that as well?


Google once again proving why even people who have nothing to hide have reasons to use End to End encryption.

People laugh when I suggest iCloud but Apple isn’t pulling this shit and has mostly the same functionality a non-business user needs.


This is some Orwellian nonsense.


Honestly. Does this mean that if someone takes a picture of their willy with Google Photos enabled, it'll censor them from themselves? Where does it end?


It might get your account canceled


What a massive irony and a cruel joke. The headline should really be "Google Drive scans your files for copyright infringement", because this is copyright for me not for thee, at its absolute finest.

I'm not even talking about Gemini's training data.

Recall that it was Google that utterly nonchalantly scanned & stored some 40 million books, without seeking and obtaining permission from a single one of their authors. [ https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....]

Is the internet as we knew it drawing to a close?


> Is the internet as we knew it drawing to a close?

Were the interactions on the internet as you knew it on completely centralized commercial platforms?


The opposite.


People not stupid enough to use Google's cloud services not affected.


This is unrelated to GCP. If you're going to be an abrasive tool then you should at the very least have some shred of an idea of what you're talking about, laddie


Google Drive is a cloud service. Words have meanings outside of Google's brand names, Laddie.



