Google Drive scans files for copyright infringement

harshreality · on July 23, 2024

This (the copyright scanning policy) isn't new, is it?

What it's scanning for in this case is material it believes to be copyrighted; it then restricts features (notably sharing) for content that matches.

Given that copyright law exists, and that Google doesn't like wasting engineering time on legal stuff unless refusing to do it would result in lawsuits, this scanning policy is a fairly low-impact solution that has probably been deemed legally necessary to avoid media company lawsuits. I don't like it, but the alternative is for Google (and Microsoft, and any other cloud storage that allows sharing) to mount an expensive legal effort to try to overturn decades of digital copyright precedent, which is likely to fail.

Hizonner · on July 23, 2024

> What it's scanning for is material it believes to be copyrighted,

All "material" created by any human is copyrighted, and has been for decades. The question is who owns what rights, which Google can't know.

> decades of digital copyright precedent

What precedent?

djbusby · on July 23, 2024

Precedent: If you're hosting it you're guilty.

edude03 · on July 23, 2024

Or more tactfully, if you're hosting it, we'll assume you're guilty because we don't want to deal with it, and you agreed to a very restrictive ToS that lets us take it down.

wslh · on July 23, 2024

Google cannot know if you have the rights for any possible document. If you have a whole book it could be bought from Amazon or downloaded via Torrent.

There can even be cases where you have private contracts that give you rights that Google cannot know about.

I can only think that they will look for the low-hanging fruit: a folder full of movies, music, and other content.

zinglersen · on July 23, 2024

Not really, otherwise services that host user-generated content, like YouTube, wouldn't exist.

What a hosting service needs to provide is a way for the user to flag content + provide an "acceptable" response time.

Another approach, a more proactive approach, is for the content owners to share the digital "finger print" of the IP (movies, music, etc.) with the hosting service, so that the hosting service can scan uploaded files and compare them.
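
The fingerprint-matching approach described above can be sketched roughly as follows. This is a minimal illustration, not how any real service works: the `registered` set and the `is_flagged` helper are hypothetical names, and a plain SHA-256 digest stands in for whatever fingerprint rights holders would actually supply (real systems likely use something more tolerant of re-encoding).

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Stand-in fingerprint: a plain SHA-256 digest of the file bytes."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical registry that content owners would populate with fingerprints.
registered = {fingerprint(b"bytes of a registered movie")}

def is_flagged(upload: bytes) -> bool:
    """Return True when the upload's fingerprint appears in the registry."""
    return fingerprint(upload) in registered

flagged_copy = is_flagged(b"bytes of a registered movie")
flagged_home_video = is_flagged(b"bytes of a home video")
```

Note that the service only learns "this matches something registered"; it still cannot tell whether the uploader holds a license.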

nonrandomstring · on July 23, 2024

> been deemed legally necessary

Ah, the old "deeming things" trick [0].

Almost everything is copyrighted. Like most of us here I've given original writing, code or music to people who've shared it on Google Drive. That material is copyrighted no more or less than anything by Disney or Sony.

Google doesn't just "scan for copyright violations", it specifically acts out of fear or leverage to be an unpaid policeman for special interests, rich and powerful media companies.

I haven't said anything new here, but I do think it's important that we see arguments built on false standards. Google isn't championing the law or anything noble and we would do well to be very precise about choosing words to describe what is happening.

[0] deem: acting by fiat and art without necessary recourse to logic, law, evidence or consistency

em3rgent0rdr · on July 23, 2024

The existence of copyright law is to blame. Not Google, who's just trying to adhere to law.

rnd0 · on July 23, 2024

Why not both?

em3rgent0rdr · on July 24, 2024

Good point.

geor9e · on July 23, 2024

It's only after you share a file that the copyright scan is triggered

username135 · on July 23, 2024

Proton Drive has entered the chat.

Encrypt everything and you, as the company offering services, no longer need to worry about what's being shared because you can't see it.

xyst · on July 23, 2024

Better yet, just don’t use managed services like GDrive

username135 · on July 23, 2024

If you need to, there are alternatives like Proton or Mega, or a hundred other services, I'm sure.

crazygringo · on July 23, 2024

They have for a long time, but it's only an issue if you share a file publicly.

If you just keep files private on your Drive, or share them with specific other accounts but not to anyone with the link, there's no copyright scanning.

All of this makes perfect sense because Google doesn't want people using their Drive accounts for mass sharing pirated content.

Whereas if you're just backing up your CDs and DVDs for personal private use, everything works fine, zero problems.

ricktdotorg · on July 23, 2024

time to start changing the hash by adding a few bytes to your movies before uploading to Google Drive

  head -c 20 /dev/urandom >> movie.mp4

won't affect playback, will affect Google finding your pirated films.
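
For readers wondering why appended bytes matter to an exact-match hash at all, here is a small sketch. It assumes the scanner compares a cryptographic digest of the whole file, which is only a guess; the byte strings are placeholders, not a real movie.

```python
import hashlib
import os

# Placeholder bytes standing in for a media file.
original = b"\x00\x00\x00\x18ftypmp42" + bytes(1024)

# Same effect as appending 20 random bytes with head/urandom.
padded = original + os.urandom(20)

digest_before = hashlib.sha256(original).hexdigest()
digest_after = hashlib.sha256(padded).hexdigest()

# Any change to the input yields a different digest.
changed = digest_before != digest_after
```

This only says something about whole-file exact hashes. It says nothing about chunked or perceptual matching, which later replies bring up.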

rakoo · on July 23, 2024

I'm pretty sure Google engineers are smart enough to detect quick workarounds that fit in a comment on Hacker News.

They already have to implement such a thing for finding copyrighted material in YouTube videos, so they know how to deal with mixed signals.

kimixa · on July 23, 2024

Or the "simple" solution is sufficient as the vast majority of users won't even use "quick workarounds that fit in a comment on Hacker News".

And if the simple solution that still works most of the time is sufficient to avoid liability in the courts then it's solved the problem they were aiming at.

If the more complex solution costs more in engineering time than the benefit, they're arguably obliged as a public company to not implement it. Often chasing the long tail just isn't worth it.

Hizonner · on July 23, 2024

Encrypt the file and don't upload the key.

glitchc · on July 23, 2024

They block it outright, even when the purpose is innocent (sensitive documents). Tried with rar and 7-zip.

Brian_K_White · on July 23, 2024

This is trivially disprovable. Everyone has countless encrypted files on gdrive. What do you think a KeePass db is, for example?

darby_nine · on July 25, 2024

That might be an issue with rar and 7-zip specifically. RAR at least used to be very common with warez and not very common out of it—might just be a low-effort way of discouraging it.

Not endorsing this, btw, I much prefer dumb pipes.

Willish42 · on July 23, 2024

anecdotal / N=1 here, but I've uploaded standard 7z encrypted backups to personal Drive without issue

glitchc · on July 23, 2024

I don't know why it didn't work then. I encrypted the file listing too.

Hizonner · on July 23, 2024

Maybe they only apply it to shared files?

sushid · on July 23, 2024

Definitely not true. Source: have .zip and .rar files in Google Drive

pixxel · on July 24, 2024

Perhaps you’re on a ‘list’ for questionable content.

Hizonner · on July 23, 2024

Wow, really? You can't store an encrypted file on Google Drive? I guess I shouldn't be surprised, but I am, mildly.

... so it has no utility at all, since you definitely shouldn't be storing any unencrypted files on it.

BXlnt2EachOther · on July 24, 2024

That doesn't seem to be most people's experience, see sibling posts.

As another data point, I have several manually-uploaded zip and 7z files of my own documents and content, both encrypted and not, and never had a problem.

The majority of my Drive usage by bytes is encrypted backup pack files uploaded via API by Arq Backup or rclone. That's all worked fine for many years.

glitchc · on July 24, 2024

Sorry, I may have conflated Gmail with GDrive, my bad.

BXlnt2EachOther · on July 24, 2024

IIRC I did get a roadblock attaching a zip file with GMail several years ago.

GMail also complained about direct-attaching a locally-built, unsigned exe file to an address that I'd never corresponded with. To be fair this probably should have caused suspicion :)

HenryBemis · on July 23, 2024

Or if they didn't think about this, they do now after reading this thread :)

rakoo · on July 23, 2024

My guess is that the very minimum they do is cut files in small blobs and analyze each one separately, so that kind of scheme (or adding random junk at the beginning or the middle or zipping it or ...) doesn't work.
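
A rough sketch of the chunk-by-chunk idea, under stated assumptions: the 4096-byte chunk size, the `chunk_hashes` helper, and the random stand-in data are all made up for illustration, and nothing here is known to match what Google does.

```python
import hashlib
import os

CHUNK = 4096  # assumed chunk size, chosen only for illustration

def chunk_hashes(data: bytes, size: int = CHUNK) -> set:
    """Hash each fixed-size chunk of the data separately."""
    return {
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }

original = os.urandom(CHUNK * 8)    # stand-in for a media file (8 full chunks)
padded = original + os.urandom(20)  # junk appended at the end

known = chunk_hashes(original)
# Fraction of the original's chunks that still appear in the padded file.
overlap = len(known & chunk_hashes(padded)) / len(known)
```

With fixed-size chunks, appended junk only adds one new trailing chunk, so every original chunk still matches. Junk inserted at the beginning or middle would shift the chunk boundaries, so catching that case would need content-defined chunking (boundaries picked by a rolling hash) or a perceptual approach rather than this simple scheme.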

nick238 · on July 23, 2024

Seems like this would corrupt the file. There are plenty of metadata fields you could just put some crap in (or just transpose letters in an existing string so you don't need to change any length markers).

wongarsu · on July 23, 2024

That depends on the container format, and with some container formats on the parser. Any container format designed to be streamable would by definition survive corruption at the end. Provided the player doesn't get too upset if any metadata at the end is corrupted, but e.g. VLC handles such things quite well

delecti · on July 23, 2024

As a fun example of "depends on the container format", one trick people used to use for sharing files on image boards (4chan and the like) was to concatenate a rar file and jpg file. I don't remember which order it was, but one of the two used a header (read start to end), and the other used a "footer" (read end to start), so you could use it either way depending on what you opened it as.

I still have a handful of files which are books in PDF format in a RAR file, and simultaneously the book's cover as a jpg.

cowboylowrez · on July 27, 2024

I remember this too but with jpgs and zips, if I remember, jpgs go first as zips have the metadata at the end of the file (to the best of my knowledge, corrections welcome).
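
The image-first, archive-second layout can be checked with Python's standard `zipfile` module, which locates the ZIP end-of-central-directory record by searching from the end of the file and tolerates leading data. In this sketch the "JPEG" is only a placeholder made of marker bytes, and the file name `book.txt` is invented for the example.

```python
import io
import zipfile

# Placeholder image bytes: JPEG start-of-image and end-of-image markers only.
fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(64) + b"\xff\xd9"

# Build a small ZIP archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("book.txt", "hello from inside the archive")

# Image first, archive second.
combined = fake_jpeg + buf.getvalue()

# Opening the combined bytes as a ZIP still works, because the
# archive's directory is found from the end of the data.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    text = zf.read("book.txt").decode()
```

An image viewer reading from the start sees the JPEG header, while an archive tool reading from the end sees the ZIP directory, which is why the same file can serve as both.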

toast0 · on July 23, 2024

Most media files are likely to tolerate random garbage tacked on to the end of the file. ID3v1 tags are essential proof of that; 128 bytes of garbage at the end that didn't cause any trouble with playback.
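
For reference, an ID3v1 tag is a fixed 128-byte block at the very end of the file: the marker `TAG`, then title (30 bytes), artist (30), album (30), year (4), comment (30), and genre (1). A minimal reader might look like the sketch below; `read_id3v1` is a hypothetical helper and the "audio" is just placeholder bytes.

```python
def read_id3v1(data: bytes):
    """Return (title, artist) from a trailing ID3v1 tag, or None if absent."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None

    def text(raw: bytes) -> str:
        # Fields are padded with NUL bytes (or spaces) to their fixed width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return text(tag[3:33]), text(tag[33:63])

audio = bytes(1000)  # placeholder, not real MPEG audio frames

tag = (
    b"TAG"
    + b"Example Title".ljust(30, b"\x00")   # title
    + b"Example Artist".ljust(30, b"\x00")  # artist
    + bytes(30)                              # album
    + b"2024"                                # year
    + bytes(30)                              # comment
    + b"\xff"                                # genre byte
)

tagged = read_id3v1(audio + tag)
untagged = read_id3v1(audio)
```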

nomel · on July 23, 2024

I would really hope they would ignore the metadata, when computing the hash, for this very reason. Properly tagging films you download isn't exactly rare.

ricktdotorg · on July 23, 2024

i've never had any problems with playback using the major vid players on Linux with files i may or may not have used this trick on.

gazby · on July 24, 2024

Are you using this technique successfully? I'd assumed they were using phashes or similar.

xk3 · on July 25, 2024

yeah I'm pretty certain they are re-using what was already in use by YouTube which is based on perceptual hashes
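
To make "perceptual hash" concrete, here is a toy average hash over an 8x8 grayscale grid. It is a textbook-style sketch, not YouTube's or Google's actual algorithm; the `frame`, `brighter`, and `inverted` grids are invented test data.

```python
def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")

# Hypothetical 8x8 grayscale frame: a simple diagonal gradient.
frame = [[(x + y) * 16 for x in range(8)] for y in range(8)]
# Slightly brightened copy, as a crude stand-in for re-encoding.
brighter = [[min(255, p + 3) for p in row] for row in frame]
# Inverted copy, as an example of genuinely different content.
inverted = [[255 - p for p in row] for row in frame]

near = hamming(average_hash(frame), average_hash(brighter))
far = hamming(average_hash(frame), average_hash(inverted))
```

The point is that matching is done by distance rather than equality: small distortions leave the hash close to the original, so appending bytes to the container would not help against this kind of comparison.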

alfalfasprout · on July 23, 2024

They may be doing locality sensitive hashing in which case this wouldn't matter.

talldayo · on July 23, 2024

This makes sense, though. I know I'm not the only one who looked up "Shrek.mp4" on Google and got a literal sea of pirated movies hosted on Google Drive.

gryn · on July 23, 2024

It makes sense for public content, not for private stuff you're not opening to the public. I think most people have a sense that they can put what they want on their "Cloud" storage, especially if it's something they're paying for.

chrisjj · on July 23, 2024

Google is presumably unable to determine that you paid for the right to copy this music file. Hence its "may".

aeonik · on July 23, 2024

I'm just imagining Google deleting Steven Spielberg's entire private collection that he stores in Google Cloud because of the copyright claims.

chrisjj · on July 23, 2024

Yup. Instead of Digital Rights Management we have Digital Restrictions Management masquerading as it.

vander_elst · on July 23, 2024

On the other hand if that file is shared publicly, Google might be liable under some jurisdictions IIUC.

I'd assume that they check some hashes of the file against a database to check for copyright infringement. If only specific actions are not permitted on the file, e.g. sharing it widely, this could seem reasonable?

Curious to learn more, what could be other actions the service provider could take to avoid getting a lawsuit?

grumpy-cowboy · on July 29, 2024

What about legal media files? In addition to being a software engineer, I'm a DJ... I have nearly 1 TB of LEGAL music (mostly FLAC or high-bitrate music files) on my OneDrive (easier because I use Virtual DJ). Yes, I know it's not Google Drive, but how can they tell the difference between legal and illegal files?

Note: I'm in the process of completely "de-Googling, de-Microsofting, ..." all my stuff (big self-hosted TrueNAS server with Backblaze backup).

math0ne · on July 23, 2024

At least it used to be the case they only scan shared files.

darby_nine · on July 23, 2024

That's not all they scan for: https://support.google.com/docs/thread/200185949/google-is-n...

sunaookami · on July 23, 2024

lol @ that guy LARPing as a Google employee. Reminds me of the Microsoft Answers forum.

ec109685 · on July 23, 2024

It’s annoying. They run interference for actual employees and allow them to not properly staff customer care forums.

egberts1 · on July 23, 2024

What … properly … staffed … Google … support … forum?

RockRobotRock · on July 23, 2024

Wow, imagine doing that for free.

delduca · on July 23, 2024

For this reason and others (I was a Google One family subscriber), I completely de-Googled myself.

user3939382 · on July 23, 2024

Put movies in a password protected zip.

godzillabrennus · on July 23, 2024

At that point, just use Backblaze. It's reasonable in cost for "unlimited" and encrypts your data prior to transit.

perihelions · on July 23, 2024

For convenience: the linked object is a text comment, plus a screenshot of text,

- "so, google has scanned my recently filed scanned files and said it's a copyright infringement"

- "Bro, tell me your Gemini datasplit?"

> "Google"

> "Your file may violate Google Drive's Terms of Service"

> ""05 - You are always choosing.mp3" contains content that may violate Google Drive's Copyright Infringement policy. Some features related to this file may have been restricted. "

> "Restricted file 05 - You are always choosing.mp3" - ""

josefritzishere · on July 23, 2024

AI is going to start deleting everything and locking us out of Google drive. It's coming.

smrtinsert · on July 23, 2024

The year of the local cloud is just around the corner. My concern is the books I buy from places like pragprog in PDF format. I feel like Google simply doesn't care and would ban on first offense.

akaike · on July 23, 2024

They calculate hashes for files and probably compare them to already reported ones

layman51 · on July 23, 2024

What is going on here? Is the person still able to download the file at least?

booleandilemma · on July 23, 2024

Does OneDrive do something similar?

Glad I don't use google drive.

OutOfHere · on July 23, 2024

Use a foreign service like Mega.

egberts1 · on July 23, 2024

So I bought a PDF book; will Google block that as well?

overstay8930 · on July 23, 2024

Google once again proving why even people who have nothing to hide have reasons to use End to End encryption.

People laugh when I suggest iCloud but Apple isn’t pulling this shit and has mostly the same functionality a non-business user needs.

cynicalsecurity · on July 23, 2024

This is some Orwellian nonsense.

mass_and_energy · on July 23, 2024

Honestly. Does this mean that if someone takes a picture of their willy with Google Photos enabled, it'll censor myself from myself? Where does it end?

sitkack · on July 23, 2024

It might get your account canceled

achrono · on July 23, 2024

What a massive irony and a cruel joke. The headline should really be "Google Drive scans your files for copyright infringement", because this is copyright for me not for thee, at its absolute finest.

I'm not even talking about Gemini's training data.

Recall that it was Google that utterly nonchalantly scanned & stored some 40 million books, without seeking and obtaining permission from a single one of their authors. [ https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....]

Is the internet as we knew it drawing to a close?

hiatus · on July 24, 2024

> Is the internet as we knew it drawing to a close?

Were the interactions on the internet as you knew it on completely centralized commercial platforms?

achrono · on July 24, 2024

The opposite.

Hizonner · on July 23, 2024

People not stupid enough to use Google's cloud services not affected.

mass_and_energy · on July 23, 2024

This is unrelated to GCP. If you're going to be an abrasive tool then you should at the very least have some shred of an idea of what you're talking about, laddie

Hizonner · on July 23, 2024

Google Drive is a cloud service. Words have meanings outside of Google's brand names, Laddie.