This (the copyright scanning policy) isn't new, is it?
What it's scanning for in this case is material it believes to be copyrighted, and it restricts features (notably sharing) for content that matches.
Given that copyright law exists, and that Google doesn't like wasting engineering time on legal stuff unless refusing to do it would result in lawsuits, this scanning policy is a fairly low-impact solution that has probably been deemed legally necessary to avoid media company lawsuits. I don't like it, but the alternative is for Google (and Microsoft, and any other cloud storage that allows sharing) to mount an expensive legal effort to try to overturn decades of digital copyright precedent, which is likely to fail.
Or more tactfully, if you're hosting it, we'll assume you're guilty because we don't want to deal with it, and you agreed to a very restrictive ToS that lets us take it down.
Google cannot know whether you have the rights to any given document. If you have a whole book, it could have been bought from Amazon or downloaded via torrent.
There could even be cases where you have private contracts that give you rights Google cannot know about.
I can only think that they will look for the low-hanging fruit: a folder full of movies, music, and other content.
Not really, otherwise services that host user-generated content, like youtube, wouldn't exist.
What a hosting service needs to provide is a way for the user to flag content + provide an "acceptable" response time.
Another, more proactive approach is for the content owners to share the digital "fingerprint" of the IP (movies, music, etc.) with the hosting service, so that the hosting service can scan uploaded files and compare them against it.
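A minimal sketch of that idea, assuming the "fingerprint" is an exact SHA-256 digest of the file bytes. `FingerprintRegistry`, `register` and `check_upload` are hypothetical names, not any real host's API. Production systems such as YouTube's Content ID match on audio/video fingerprints that survive re-encoding; an exact hash like this one is defeated by changing a single byte, which is what several of the comments below are getting at.

```python
from __future__ import annotations

import hashlib


def fingerprint(content: bytes) -> str:
    """Exact-match fingerprint: hex SHA-256 of the raw bytes."""
    return hashlib.sha256(content).hexdigest()


class FingerprintRegistry:
    """Hypothetical store of fingerprints shared by content owners."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}  # fingerprint -> rights holder

    def register(self, digest: str, owner: str) -> None:
        # The owner hands over only the fingerprint, not the work itself.
        self._owners[digest] = owner

    def check_upload(self, content: bytes) -> str | None:
        """Return the matching rights holder for an upload, or None."""
        return self._owners.get(fingerprint(content))


if __name__ == "__main__":
    registry = FingerprintRegistry()
    registry.register(fingerprint(b"<movie bytes>"), "Example Studios")
    print(registry.check_upload(b"<movie bytes>"))   # Example Studios
    print(registry.check_upload(b"<movie bytes>!"))  # None: one extra byte
```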
Almost everything is copyrighted. Like most of us here, I've given original writing, code or music to people who've shared it on Google Drive. That material is copyrighted no more or less than anything by Disney or Sony.
Google doesn't just "scan for copyright violations"; it specifically acts out of fear or leverage to be an unpaid policeman for special interests, rich and powerful media companies.
I haven't said anything new here, but I do think it's important that we see arguments built on false standards. Google isn't championing the law or anything noble, and we would do well to be very precise about choosing words to describe what is happening.
[0] deem: acting by fiat and art without necessary recourse to logic, law, evidence or consistency
They have for a long time, but it's only an issue if you share a file publicly.
If you just keep files private on your Drive, or share them with specific other accounts but not to anyone with the link, there's no copyright scanning.
All of this makes perfect sense because Google doesn't want people using their Drive accounts for mass sharing pirated content.
While if you're just backing up your CDs and DVDs for personal, private use, everything works fine, zero problems.
Or the "simple" solution is sufficient, as the vast majority of users won't even use "quick workarounds that fit in a comment on Hacker News".
And if the simple solution that still works most of the time is sufficient to avoid liability in the courts then it's solved the problem they were aiming at.
If the more complex solution costs more in engineering time than the benefit, they're arguably obliged as a public company to not implement it. Often chasing the long tail just isn't worth it.
That might be an issue with rar and 7-zip specifically. RAR at least used to be very common with warez and not very common outside of it; this might just be a low-effort way of discouraging it.
Not endorsing this, btw; I much prefer dumb pipes.
That doesn't seem to be most people's experience, see sibling posts.
As another data point, I have several manually-uploaded zip and 7z files of my own documents and content, both encrypted and not, and never had a problem.
The majority of my Drive usage by bytes is encrypted backup pack files uploaded via API by Arq Backup or rclone. That's all worked fine for many years.
IIRC I did get a roadblock attaching a zip file with Gmail several years ago.
Gmail also complained about direct-attaching a locally-built, unsigned exe file to an address that I'd never corresponded with. To be fair, this probably should have caused suspicion :)
My guess is that, at the very minimum, they cut files into small blobs and analyze each one separately, so that kind of scheme (or adding random junk at the beginning or the middle, or zipping it, or ...) doesn't work.
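One way a scanner could make per-blob hashes survive inserted junk is content-defined chunking: cut points are chosen by a rolling hash of the local bytes rather than by fixed offsets. This is a sketch of that general technique (a simple gear hash, with no minimum or maximum chunk size), not a description of what Google actually does; the table seed and ~256-byte average chunk size are arbitrary choices. It does nothing against zipping: compression changes essentially every byte, so a scanner would have to unpack the archive (not possible if it's encrypted) before chunking.

```python
from __future__ import annotations

import hashlib
import random
from typing import Callable

# Fixed pseudo-random table for the gear hash; seeded so cut points are reproducible.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1
BOUNDARY_BITS = 8  # cut when the low 8 bits are zero: ~256-byte chunks on average


def fixed_chunks(data: bytes, size: int = 256) -> list[bytes]:
    """Cut at fixed offsets: inserting bytes shifts every later block."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking with a gear rolling hash.

    The low BOUNDARY_BITS bits of h depend only on the last BOUNDARY_BITS
    bytes, so whether a position is a cut point depends on local content,
    not on its offset in the file.
    """
    boundary_mask = (1 << BOUNDARY_BITS) - 1
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        if (h & boundary_mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(original: bytes, modified: bytes,
                    chunker: Callable[[bytes], list[bytes]]) -> float:
    """Fraction of the original's chunk hashes that also occur in `modified`."""
    def digests(blob: bytes) -> set[str]:
        return {hashlib.sha256(c).hexdigest() for c in chunker(blob)}
    before, after = digests(original), digests(modified)
    return len(before & after) / len(before)


if __name__ == "__main__":
    data = random.Random(1).randbytes(20_000)  # stand-in for a media file
    junked = b"junk" * 10 + data               # 40 bytes of junk prepended
    print(shared_fraction(data, junked, fixed_chunks))  # 0.0: every block shifted
    print(shared_fraction(data, junked, cdc_chunks))    # most chunks still match
```

Matching on "enough shared chunks" rather than on the whole-file hash is what lets this tolerate a prefix, a suffix, or an edit in the middle.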
Seems like this would corrupt the file. There are plenty of metadata fields you could just put some crap in (or just transpose letters in an existing string so you don't need to change any length markers).
That depends on the container format and, for some container formats, on the parser. Any container format designed to be streamable would by definition survive corruption at the end, provided the player doesn't get too upset about corrupted metadata there; VLC, for example, handles such things quite well.
As a fun example of "depends on the container format", one trick people used to use for sharing files on image boards (4chan and the like) was to concatenate a rar file and jpg file. I don't remember which order it was, but one of the two used a header (read start to end), and the other used a "footer" (read end to start), so you could use it either way depending on what you opened it as.
I still have a handful of files which are books in PDF format in a RAR file, and simultaneously the book's cover as a jpg.
I remember this too, but with jpgs and zips. If I remember correctly, the jpg goes first, since zips keep their metadata at the end of the file (to the best of my knowledge; corrections welcome).
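The jpg-then-zip ordering can be checked with Python's standard library (ZIP rather than RAR, since `zipfile` ships with Python; this says nothing about how RAR behaved). `FAKE_JPEG` is placeholder bytes with JPEG start and end markers, not a decodable image, and `make_jpeg_zip_polyglot` is a hypothetical helper. ZIP readers locate the archive via the end-of-central-directory record at the end of the file, and `zipfile` adjusts for data prepended before the archive, which is why putting the image first works.

```python
from __future__ import annotations

import io
import zipfile

# Placeholder standing in for a real JPEG: SOI marker, filler, EOI marker.
# Image decoders typically read from the start and stop at the EOI marker (FF D9).
FAKE_JPEG = b"\xff\xd8\xff\xe0" + b"\x00" * 16 + b"\xff\xd9"


def make_jpeg_zip_polyglot(jpeg: bytes, members: dict[str, bytes]) -> bytes:
    """Return `jpeg` followed by a ZIP archive holding `members`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    # Image first, archive second: the ZIP end-of-central-directory record
    # stays at the very end of the file, where ZIP readers look for it.
    return jpeg + buf.getvalue()


if __name__ == "__main__":
    poly = make_jpeg_zip_polyglot(FAKE_JPEG, {"book.txt": b"Chapter 1"})
    print(poly[:2] == b"\xff\xd8")  # True: still starts like a JPEG
    with zipfile.ZipFile(io.BytesIO(poly)) as zf:
        print(zf.read("book.txt"))  # b'Chapter 1'
```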
Most media files are likely to tolerate random garbage tacked on to the end of the file. ID3v1 tags are essentially proof of that: 128 bytes of garbage at the end that didn't cause any trouble with playback.
I would really hope they ignore the metadata when computing the hash, for this very reason. Properly tagging films you download isn't exactly rare.
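For MP3s specifically, ignoring tags before hashing is easy to sketch: drop a trailing ID3v1 block (128 bytes starting with `TAG`) and a leading ID3v2 tag (10-byte header with a syncsafe size), then hash what remains. This is an assumption about how a scanner *could* work, not how Google's does; it only handles ID3 (not APE tags or metadata inside MP4/MKV containers), and `strip_id3` and `audio_hash` are hypothetical names.

```python
from __future__ import annotations

import hashlib


def strip_id3(data: bytes) -> bytes:
    """Remove an ID3v1 trailer and a leading ID3v2 tag, if present."""
    # ID3v1: a fixed 128-byte block at the very end that starts with b"TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    # ID3v2: 10-byte header = b"ID3" + version (2 bytes) + flags (1) + size (4).
    if len(data) >= 10 and data[:3] == b"ID3":
        # The size is "syncsafe": 4 bytes carrying 7 bits each (28 bits total),
        # and it excludes the 10-byte header.
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 "footer present" flag
        data = data[10 + size + footer:]
    return data


def audio_hash(data: bytes) -> str:
    """Hash the file with its ID3 tags removed, so retagging doesn't change it."""
    return hashlib.sha256(strip_id3(data)).hexdigest()
```

With this, retagging a file (new title, cover art, comments) leaves `audio_hash` unchanged, while a plain whole-file hash changes with every edit.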
This makes sense, though. I know I'm not the only one who looked up "Shrek.mp4" on Google and got a literal sea of pirated movies hosted on Google Drive.
It makes sense for public content, not for private stuff you're not opening to the public.
I think most people have a sense that they can put whatever they want on their "cloud" storage, especially if it's something they're paying for.
On the other hand if that file is shared publicly, Google might be liable under some jurisdictions IIUC.
I'd assume that they check some hashes of the file against a database to check for copyright infringement. If only specific actions, e.g. sharing it widely, are not permitted on the file, this could seem reasonable?
Curious to learn more: what other actions could the service provider take to avoid getting a lawsuit?
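The "match, then restrict only some actions" model from the comment above fits in a tiny policy table. Everything here is hypothetical: the `Action` names are not Google Drive API values, and `flagged` stands in for the result of whatever hash lookup the host runs. It follows the split described earlier in the thread, where private access and sharing with specific accounts keep working and only public links are blocked.

```python
from __future__ import annotations

from enum import Enum, auto


class Action(Enum):
    OPEN = auto()
    DOWNLOAD = auto()
    SHARE_WITH_PERSON = auto()
    SHARE_PUBLIC_LINK = auto()


# Hypothetical policy: a match only blocks broad distribution.
RESTRICTED_WHEN_FLAGGED = {Action.SHARE_PUBLIC_LINK}


def is_allowed(action: Action, flagged: bool) -> bool:
    """Return True if `action` is permitted on a file with the given flag."""
    return not (flagged and action in RESTRICTED_WHEN_FLAGGED)
```

Keeping the restriction as a set makes it easy to widen (e.g. also blocking `SHARE_WITH_PERSON`) without touching the owner's own access.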
What about legal media files? In addition to being a software engineer, I'm a DJ... I have nearly 1 TB of LEGAL music (mostly FLAC or high-bitrate music files) on my OneDrive (easier because I use Virtual DJ). Yes, I know it's not Google Drive, but how can they tell the difference between legal and illegal files?
Note: I'm in the process of completely "de-googling, de-microsofting, ..." all my stuff (a big self-hosted TrueNAS server with Backblaze backup).
For convenience, the linked object is a text comment plus a screenshot of text:
- "so, google has scanned my recently filed scanned files and said it's a copyright infringement"
- "Bro, tell me your Gemini datasplit?"
> "Google"
> "Your file may violate Google Drive's Terms of Service"
> ""05 - You are always choosing.mp3" contains content that may violate Google Drive's Copyright Infringement policy. Some features related to this file may have been restricted. "
> "Restricted file 05 - You are always choosing.mp3"
The year of the local cloud is just around the corner. My concern is books I buy from places like pragprog in PDF format. I feel like Google simply doesn't care and would ban on the first offense.
Honestly, does this mean that if someone takes a picture of their willy with Google Photos enabled, it'll censor it from them? Where does it end?
What a massive irony and a cruel joke. The headline should really be "Google Drive scans your files for copyright infringement", because this is copyright for me not for thee, at its absolute finest.
I'm not even talking about Gemini's training data.
This is unrelated to GCP. If you're going to be an abrasive tool then you should at the very least have some shred of an idea of what you're talking about, laddie