Data Protection and Privacy - Europe and the U.S.

If you had a meeting and Apple CEO Tim Cook gave a speech and he was followed by Sir Tim Berners-Lee, and after the lunch break Facebook's Mark Zuckerberg and Google's Sundar Pichai were on the screen giving video messages, you would consider this to be a pretty high-powered meeting.

That was the lineup for some European data regulators at the 40th International Conference of Data Protection and Privacy Commissioners, held this year in the European Parliament in Brussels.

I saw part of it on a recent 60 Minutes. Tim Cook talked about the "crisis" of "weaponized" personal data. It's not that Apple doesn't collect data on its users, but companies like Facebook and Google rely much more on user data to sell advertising than hardware-based Apple does.

The focus in that segment is on Europe, where stricter laws than in the U.S. are already in place. Of course, they affect American companies that operate in Europe, which is essentially all major companies.

Multi-billion dollar fines against Google for anti-competitive behavior are in the news. The European Union enacted the world's most ambitious internet privacy law, the General Data Protection Regulation (GDPR).

Tim Cook said he supports the law, but Jeff Chester, executive director of the Center for Digital Democracy, says that "Americans have no control today about the information that's collected about them every second of their lives." The only exception is some guaranteed privacy on the internet for children under 13, and some specific medical and financial information.

This is an issue that will be even more critical in the next few years. Since GDPR was passed, at least ten other countries and the state of California have adopted similar rules. And Facebook, Twitter, Google, and Amazon now say they could support a U.S. privacy law. Of course, they want input because they want to protect themselves.

 

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World

Data Privacy Law: A Practical Guide

Fifty-Two Thousand Data Points

Facebook has had a tough year in the press and with its public face (though its stock is holding up fine). There has been a lot of buzz about hacks and data being stolen and fake news and Senate hearings and general privacy concerns. All of these are legitimate concerns about Facebook and about other social media, e-commerce and financial sites too.

But how much does Facebook really know about a user? There is the information you willingly provided when you joined and all the things you have given them by posting and clicking Likes and other interactions. Though that volunteered data is often overlooked by users, there is more concern about data you have not knowingly given them but that they have access to anyway.

I do believe that Facebook is more focused now on privacy and user experience, but it needs that data to be a profitable public company. (Disclaimer: I am a Facebook stockholder - though not in a very big way.)

Facebook is free but, as Mark Zuckerberg had to explain to at least one clueless Senator this past summer, it sells advertising to make a profit. Ad sales are more valuable to companies when they know who they are advertising to, and the more granular that audience can be, the better it is for them and Facebook. It might even be better for you. That is something Google, Amazon, Facebook and many others have been saying for years: If you're going to get ads anyway, wouldn't you rather that they be relevant to your likes and interests?

According to one online post, if you total up what Facebook can know about a user, it comes to roughly 52,000 traits. That comes from three key algorithms. One is DeepText, which looks into data, much of it coming from commercial data brokers. They also use DeepFace, which can identify people in pictures and also suggest that you tag people in a photo.

The third algorithm is FBLearner Flow, which might be the most clever of all. It focuses on the decisions you have yet to make. Using predictive analytics, it decides which ads to show you based on what you would be likely to click and perhaps even buy.

Amazon will allow you to let it send out products before you order them based on your previous orders and usage. This is not difficult to predict. My pharmacy will tell me it is time to reorder a prescription and even process and deliver it without my input. That is not so predictive; my 30 daily pills will run out in 30 days.

When Amazon suggests that I might like a product similar to other things I have bought, it's not very creepy. When I see an ad or suggestion from them about a product or even a topic that I was just searching on Google, THAT is creepy.

Similarly, Facebook might give me an ad or a "sponsored" post at the top of my feed because of my recent activity and the activity of friends that I follow - especially those that I interact with frequently with Likes, shares and comments.

It would be interesting to see what the feeds look like for some friends of mine who are Facebook lurkers and who rarely post anything and seem to rarely even log into the site. What are they seeing when it comes to ads?

I'm Not a Bot

This early version of a CAPTCHA uses a nonsense word "smwm" and obscures it from computer interpretation by making it an image, twisting the letters and adding slight background color gradient.


CAPTCHA (/kæp.tʃə/) is an acronym for "Completely Automated Public Turing Test To Tell Computers and Humans Apart." It is the general name for a type of challenge–response test used in computing to determine whether or not the user is human.

You have encountered them when logging into sites. The early versions were scrambled words as images. But they have become more complex. 
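To make the "challenge–response" idea concrete, here is a minimal sketch in Python. It is not how any real CAPTCHA service is built; the class name `TextCaptcha` and its methods are made up for illustration, and the step that actually matters (rendering the word as a twisted image) is left out. It only shows the bookkeeping: hand out a random nonsense word, remember it, and accept one answer.

```python
import secrets
import string


class TextCaptcha:
    """Toy challenge-response store (hypothetical, for illustration only)."""

    def __init__(self, length: int = 4):
        self.length = length
        self._pending = {}  # challenge id -> expected answer

    def issue(self):
        """Return (challenge_id, word).

        A real site would send the word only as a distorted image,
        never as plain text.
        """
        word = "".join(
            secrets.choice(string.ascii_lowercase) for _ in range(self.length)
        )
        challenge_id = secrets.token_hex(8)
        self._pending[challenge_id] = word
        return challenge_id, word

    def verify(self, challenge_id: str, response: str) -> bool:
        """Check a reply. Each challenge is removed on its first attempt,
        so a wrong answer or a replayed answer both fail."""
        expected = self._pending.pop(challenge_id, None)
        if expected is None:
            return False
        answer = response.strip().lower()
        # Compare as bytes so non-ASCII input is rejected rather than raising.
        return secrets.compare_digest(expected.encode(), answer.encode())
```

Throwing the challenge away after one attempt is a design choice in this sketch: it means a bot cannot keep guessing against the same word, at the cost of giving a human a fresh puzzle after every mistake.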

I suspect that the acronym was formed with the idea of capture+gotcha. That is especially true of a newer form known as an image identification captcha which may be better at fooling robots, but is also better at fooling and frustrating me.

For example, you may encounter ones asking you to "select all the images with a fire hydrant" in them. (It could also be automobiles or road signs or...)

capcha

The problem with this type is that the images are small and low quality. On the example shown here I can't tell if there is a fire hydrant hiding in the image. And the captcha will keep giving me new ones if I'm not correct. The result? I give up trying to use the service.

This user identification procedure has received criticism since it was first introduced in 2003. It certainly has accessibility issues for disabled people. But everyday users also balk at having to use it.

We used a simple version on this blog to try to prevent bots from posting spam comments. That didn't work very well and we had to shut down commenting. We'll never know how many legitimate comments were never posted because the captcha stopped the commenter.

Do they work? I don't know their effectiveness score, but there are approaches to defeating CAPTCHAs. The simplest is to use cheap human labor to recognize them. There are many algorithms and types out there now and some have bugs that have been exploited to allow the attacker to completely bypass the CAPTCHA. Good old AI and machine learning have allowed people to build automated solvers.

Is there a need for this technology? Yes. Anyone with a blog knows that spam comments are a problem. 

The NoCAPTCHA reCAPTCHA

And then there is the "No CAPTCHA reCAPTCHA." In 2013, the updated reCAPTCHA began implementing behavioral analysis of the browser's interactions with the CAPTCHA to predict whether the user was a human or a bot before displaying the captcha, and presenting a "considerably more difficult" captcha in cases where it had reason to think the user might be a bot.

Public Google services started using it the following year. The first issue with its use was that because NoCAPTCHA relies on the use of Google cookies that are at least a few weeks old, reCAPTCHA has become nearly impossible to complete for people who frequently clear their cookies. An improved version introduced in 2017 by Google is called "invisible reCAPTCHA".
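The idea of deciding how hard a puzzle to show before showing it can be sketched in a few lines of Python. Everything here is an assumption made for illustration: Google does not publish its signals or thresholds, so the `Visit` fields, the `behavior_score` scale, the 21-day cookie cutoff and the challenge names are all invented. The sketch only shows the shape of the decision, including why a visitor who clears cookies ends up with more work.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    cookie_age_days: float  # how old the visitor's cookie is (assumed signal)
    behavior_score: float   # 0.0 = bot-like, 1.0 = human-like (hypothetical)


def pick_challenge(visit: Visit) -> str:
    """Map signals to a challenge level. All thresholds are made up."""
    # "A few weeks old" is read here as 21 days; the real cutoff is not known.
    trusted_cookie = visit.cookie_age_days >= 21
    if trusted_cookie and visit.behavior_score >= 0.7:
        return "checkbox"         # looks human: a single click
    if visit.behavior_score >= 0.3:
        return "image-grid"       # unsure: an ordinary image puzzle
    return "hard-image-grid"      # looks like a bot: a harder puzzle
```

With these invented numbers, `pick_challenge(Visit(0, 0.9))` returns `"image-grid"` rather than `"checkbox"`: a perfectly human-looking visitor with a brand-new cookie still gets a puzzle.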

We will continue to make ways to block bots and people will continue to make ways to defeat them. A new project, Mailhide, is being developed to protect email addresses on web pages from being harvested by spammers. It converts the address into a form that doesn't allow the bot to see the full email address, so "captcha@gmail.com" becomes "cap...@gmail.com". A human would have to click on it, and solve a CAPTCHA to see the full email address.
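The masking step is simple enough to sketch in Python. This is not Mailhide's actual code; the function name `mask_email` and the choice to keep three characters are assumptions taken from the single "cap...@gmail.com" example above, and the real rule may differ. The click-through and CAPTCHA steps are not shown.

```python
def mask_email(address: str, keep: int = 3) -> str:
    """Hide most of the part before the @ sign (hypothetical helper)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Show at most `keep` characters, and hold back at least one
    # whenever the local part is longer than a single character.
    visible = local[: min(keep, max(len(local) - 1, 1))]
    return f"{visible}...@{domain}"
```

Calling `mask_email("captcha@gmail.com")` gives `"cap...@gmail.com"`, matching the example. The domain stays visible, which is a trade-off: a human can still tell whose address it probably is, but a harvesting bot gets nothing it can send mail to.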

Can this be defeated by cheap human labor too? Yes. It's like putting a strong lock on your door. Someone can bust it if they are determined to get in, but you hope to discourage others.