CryptPad Analytics & Privacy - What we can't know, what we must know, what we want to know

CryptPad is a Zero Knowledge cloud application, which means we have designed it so that we do not have any access to the content hosted on our server. However, there are other things which we do collect, and it is important that privacy-minded users understand what we are collecting and why. There are four types of information:

  • What we can’t know: This is data that the CryptPad app encrypts, so we will never have access to it
  • What we must see but don’t collect: This is information which we don’t bother to store but because of how the technology works, we necessarily have access to it.
  • What we must know: This is metadata which we cannot help but see because of the way the technology works
  • What we want to know: This is information which we really want to know in order to make CryptPad better every day

We want to know everything about people, we want to know how people use CryptPad, why people use CryptPad and how we can make their experience easier. However, we don’t want to know anything at all about you.

This poses a challenge because we want to collect as much aggregate information as we can in order to make a great web service, but we don’t want to collect data that can be linked in order to tell a story about you.

What we can’t know

There are a few things which the Zero Knowledge design of CryptPad does not allow us to know at all. These include (obviously) your password and the content of your pads, but less obviously, the titles of your pads, the names of the contributors and your username (you can even have the same username as someone else on the system, we won’t know). The types of your pads are also unknown to us though we could make educated guesses by looking at the encrypted data.

It is our promise to you that we will never collect this information.

What we could know but don’t bother to collect

There are also some things which we don’t really want to know but cannot avoid seeing anyway. The most important of these is the IP addresses of people who edited a specific pad. Technically we know your IP address because it’s how you communicate with our server, but most of the actual operations are done using commands sent down a WebSocket. Once the WebSocket is established, we assign you a random ID and this is how you are referenced. What appears in our server logs looks like this:

198.167.222.70 - - [06/Jul/2017:20:47:45 +0200] "GET /pad/ HTTP/1.1"
304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36" "-"

Notice there is no pad ID in there. The pad ID is not in the part of the URL that is sent to the server, so it doesn’t go in the server logs by default.

Compare this with EtherPad:

(The IP address and the pad ID, UNWnpczTkq, both appear in this line.)
198.167.222.70 - - [06/Jul/2017:11:54:37 -0700] "GET /p/UNWnpczTkq HTTP/1.1"
200 8920 "https://pad.meshwith.me/" "Mozilla/5.0 (Macintosh; Intel Mac OS X
10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109
Safari/537.36"

You cannot verify that we’re not collecting this so best assume that we are.

What we must know

There are some things which we need to know in order for CryptPad to function properly. We need to know which pads are in your drive in order to impose storage limits on logged-in users and to expire pads which nobody cares about. However, we don’t know much about who you are. Since we don’t know your username, to us you are identified by a public signing key, something like this:

YIBzjPr3beuGgfHNglGfo3xq-dquxsj4Bst-ze7mL9A

We know that YIBzjPr3beuGgfHNglGfo3xq-dquxsj4Bst-ze7mL9A has 392 MB of data in their CryptDrive including a pad of some type which has the ID fe382219b10c0396de63d2bab7942390 and an uploaded file which we know as ff2fdf9bb99ecc89d29d780780de10efdac14ed15e93b235. One of the pads they have is actually their drive itself, but we don’t strictly know which one (again, we can take guesses based on the size of the patches). You can find out what your signing key is by looking in your settings page.

We also know when each pad was last accessed so that we can know to delete pads which are not in anybody’s CryptDrive and have not been opened in a long time.
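The expiry rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the shape of the pad records and the 90-day threshold are all hypothetical and are not taken from the CryptPad codebase.

```javascript
// Hypothetical sketch: every name and the 90-day threshold are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

// Return the IDs of pads that are in nobody's drive (not pinned) and have
// not been accessed for more than maxIdleDays.
function findExpirablePads(pads, pinnedIds, now, maxIdleDays) {
    return pads
        .filter((pad) => !pinnedIds.has(pad.id))
        .filter((pad) => now - pad.lastAccessed > maxIdleDays * DAY_MS)
        .map((pad) => pad.id);
}

const now = Date.UTC(2017, 6, 6);
const pads = [
    // old, but still in somebody's drive
    { id: 'fe382219b10c0396de63d2bab7942390', lastAccessed: now - 200 * DAY_MS },
    // old and in nobody's drive
    { id: 'aaaa', lastAccessed: now - 200 * DAY_MS },
    // in nobody's drive, but opened recently
    { id: 'bbbb', lastAccessed: now - 2 * DAY_MS },
];
const pinned = new Set(['fe382219b10c0396de63d2bab7942390']);

const expirable = findExpirablePads(pads, pinned, now, 90);
console.log(expirable); // [ 'aaaa' ]
```

Note that the rule needs only pad IDs and timestamps, which is exactly the metadata described in this section; nothing about the content is involved.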

Why we can’t avoid collecting IP addresses

Being able to know how many different people are using CryptPad is very important to us. One rather rude person decided to try to crash our server by creating 647,533 pads. They didn’t put much thought into their attack because what they were doing was not actually creating pads, but it illustrates the problem that if we don’t know how many different people are using the server, we don’t have any idea whether we are popular or under attack. Worse, we don’t know what features have widespread support vs. which ones are only popular with a few prolific users.

One obvious thought is to simply run the IP addresses through a hash function the way we traditionally hash passwords. Sadly, this cannot work because there are only about 4.3 billion IPv4 addresses, and constructing a rainbow table to get back the original IP addresses would take only about 1 day of computer time. So in the end we simply log the IP addresses and don’t worry about it.
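To see why hashing does not help, here is a minimal sketch using Node's built-in crypto module. The choice of SHA-256 and the function names are assumptions made for illustration. To keep it quick, the search is limited to one /16 block (65,536 addresses); covering all of IPv4 is the same loop run over a larger range.

```javascript
const crypto = require('crypto');

// Naively "anonymize" an IPv4 address by hashing it (SHA-256 is an assumption).
function hashIp(ip) {
    return crypto.createHash('sha256').update(ip).digest('hex');
}

// Recover an address by hashing every candidate in prefix.0.0/16 and comparing.
// Returns null if the address is not inside that block.
function recoverIp(targetHash, prefix) {
    for (let c = 0; c < 256; c++) {
        for (let d = 0; d < 256; d++) {
            const candidate = `${prefix}.${c}.${d}`;
            if (hashIp(candidate) === targetHash) { return candidate; }
        }
    }
    return null;
}

const logged = hashIp('198.167.222.70');
const recovered = recoverIp(logged, '198.167');
console.log(recovered); // 198.167.222.70
```

Because the input space is so small, the hash gives no real protection: anyone holding the hashed logs can enumerate the candidates.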

What a pad looks like to us

A pad is stored as a file which represents a sequence of encrypted patches. These patches change the content of the pad from nothing to whatever it becomes in the end. A typical message looks something like this:

[0,"69d46337f826c0ecd881be59c119a527","MSG","fe382219b10c0396de63d2bab7942390","51Q...."]

It starts with a zero and then your temporary random ID, then it contains the word MSG and the ID of the pad which it is sent to. This format is exactly the same as what is sent on the wire. Finally it contains the encrypted patch, which tells us essentially nothing except a rough idea of just how big the change was.
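As a rough illustration of how little the server can read from such a message, the sketch below splits the example into its fields. The field names are hypothetical; only the positions come from the description above, and the elided payload is kept exactly as shown.

```javascript
// Hypothetical field names; only the order of the fields comes from the example.
function parseMessage(line) {
    const [leading, senderId, type, padId, payload] = JSON.parse(line);
    // The payload is opaque ciphertext, so its length is all we can report.
    return { leading, senderId, type, padId, payloadLength: payload.length };
}

const example = '[0,"69d46337f826c0ecd881be59c119a527","MSG","fe382219b10c0396de63d2bab7942390","51Q...."]';
const msg = parseMessage(example);
console.log(msg.type, msg.padId, msg.payloadLength);
```

Everything except the last field is routing metadata. The last field is what the Zero Knowledge design keeps out of reach.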

Occasionally the client will send a checkpoint, a special patch which removes all of the content and then puts it all back again. To us, a checkpoint looks the same as anything else: it is a big ball of encrypted data, except that it is flagged as a checkpoint so the server knows it can send only part of the history of the pad instead of all of it. However, checkpoints do give us a good idea of how big the pad actually is at that time.

What we collect because we want to know

What we really want to understand is your experience with CryptPad and how we can make that experience better. So we collect quite a number of data points about where people click and what their browser supports. For example, we collect the dimensions of your browser window, not because we want to know who you are but because we want to know what types of browsers we need to support.

198.167.222.70 - - [06/Jul/2017:21:26:15 +0200]
"HEAD /common/feedback.html?DIMENSIONS:752x1440=1499369175085 HTTP/1.1" 200 0
"https://cryptpad.fr/settings/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36" "-"
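A feedback request like the one above can be picked apart with a few lines of code. This is a sketch, not the project's own tooling: the function name is hypothetical, and reading the number after the equals sign as a millisecond timestamp is an assumption (it does line up with the time in the log line).

```javascript
// Hypothetical helper: splits "KEY[:DETAIL]=NUMBER" out of a feedback request.
function parseFeedback(requestLine) {
    const match = /^\S+ \/common\/feedback\.html\?([^=\s]+)=(\d+) /.exec(requestLine);
    if (!match) { return null; }
    const [name, detail] = match[1].split(':');
    return {
        name,
        detail: detail || null,
        // Assumption: the value is a Unix timestamp in milliseconds.
        time: new Date(Number(match[2])),
    };
}

const fb = parseFeedback('HEAD /common/feedback.html?DIMENSIONS:752x1440=1499369175085 HTTP/1.1');
console.log(fb.name, fb.detail, fb.time.toISOString());
// DIMENSIONS 752x1440 2017-07-06T19:26:15.085Z
```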

You can see an exhaustive list of things that we collect by checking out the feedback functionality in the CryptPad source code but as of the time of this writing, we are collecting feedback about the following things (usually we just collect the fact that an event occurred, not more).

  • Clicking “upgrade account”
  • Clicking “support cryptpad”
  • Presentation: clicking on “print slides”
  • Registering and logging in
  • Opening your recent pads as an anonymous user
  • Clicking any CKEditor button such as “bold” or “italic”
  • Displaying the drive as icons or as a list
  • Creating and using templates
  • Showing and hiding the userlist or CKEditor menu bar
  • Whether your browser is missing certain important features like Proxy, isArray or localStorage
  • Which type of pad you are using
  • The dimensions of your browser window
  • When you have changed your display name
  • Whether you have migrated your CryptDrive from the legacy format

If you are worried about what we might do with this data, you can disable feedback collection in your settings page. But keep in mind that if you disable it we cannot help but know, because your IP address will be in the tiny minority of addresses which access the site but don’t send feedback messages.

What we can learn from the data

1. People mostly use CryptPad to make a plain old pad

But the code/markdown pad and the CryptDrive are catching up.

(Chart: Unique IPs per pad type)

2. Activity has been on a very slow rise but with a few spikes

This chart shows unique IPs per day hitting CryptPad. You can see that things are relatively flat over time except for a big day in June and then some increased activity in July after the UI improvements were rolled out.

(Chart: Unique IPs per day)

3. Browser window dimensions are all over the map

This chart shows bubbles which are bigger depending on how many different IPs report the same browser window dimensions. Tragically it seems that there is no way to predict what aspect ratio a device using CryptPad is going to have.

(Chart: Browser window dimensions)

4. Lots of pads are made and then abandoned

The first chart shows in blue the number of pads created each day and the number of pads which become “abandoned” (have not been touched in 2 weeks). This suggests that pads are perhaps considered ephemeral and not to be used for the long term.

(Chart: Created vs. abandoned pads)

Here we can see the evolution of pads which have been accessed within the last day, the last week and the last month. There is slow but steady growth in the pads active in the past month.

(Chart: Number of active pads)

5. People use CryptPad for a while, then leave

We measured 15,000 IP addresses which came to CryptPad just to look at one pad and then left, but for the 13,000 who stayed longer than that we analyzed the time when they first arrived and the time when they made their last visit. About 630 IP addresses have been continually using CryptPad for all 45 days.

(Chart: Number of IPs continuing to access CryptPad)

We want to make CryptPad a useful tool for helping people get organized and make their projects succeed. So whenever people decide that CryptPad is not the right answer for them, we care about what went wrong and how we can make it better.

How we analyze this data

We do all of our analysis ourselves, and we don’t share any of this data with Google or other data companies. We’re thankful to Kibana/ElasticSearch and LogStash for making it possible to do in depth analysis on our own computers without resorting to a cloud service.
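For readers who want to try a similar analysis on their own instance without a full logging stack, a count of unique IPs per day can be approximated with a short script. This is not the Kibana/ElasticSearch/LogStash setup mentioned above, just a minimal sketch. It assumes access log lines in the format shown earlier in this post, and apart from the first address the sample lines are made up.

```javascript
// Count distinct client IPs per calendar day from access log lines.
function uniqueIpsPerDay(lines) {
    const perDay = new Map();
    for (const line of lines) {
        // Captures the client IP and the dd/Mon/yyyy part of the timestamp.
        const match = /^(\S+) \S+ \S+ \[(\d{2}\/\w{3}\/\d{4}):/.exec(line);
        if (!match) { continue; }
        const [, ip, day] = match;
        if (!perDay.has(day)) { perDay.set(day, new Set()); }
        perDay.get(day).add(ip);
    }
    return new Map([...perDay].map(([day, ips]) => [day, ips.size]));
}

// Made-up sample lines; 192.0.2.10 is a reserved documentation address.
const sample = [
    '198.167.222.70 - - [06/Jul/2017:20:47:45 +0200] "GET /pad/ HTTP/1.1" 304 0',
    '198.167.222.70 - - [06/Jul/2017:21:26:15 +0200] "HEAD /common/feedback.html HTTP/1.1" 200 0',
    '192.0.2.10 - - [06/Jul/2017:22:00:00 +0200] "GET /code/ HTTP/1.1" 200 0',
    '192.0.2.10 - - [07/Jul/2017:09:00:00 +0200] "GET /code/ HTTP/1.1" 200 0',
];

const counts = uniqueIpsPerDay(sample);
console.log(counts.get('06/Jul/2017'), counts.get('07/Jul/2017')); // 2 1
```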

CryptPad Jackalope - File Upload, PDF and Pictures

Yesterday we released CryptPad v1.9.0 Jackalope, with some exciting new features which we’ve been working on for a long time. As part of the UCF project we have implemented a Zero Knowledge media-tag in CryptPad for displaying and downloading encrypted files stored in CryptPad. Starting now, you can upload files by clicking the upload button or dragging them into your CryptDrive. You can also view pictures and PDF files in CryptPad and you can drag-and-drop pictures directly into presentations. In the next release we will hopefully be adding drag-and-drop pictures into the pad.

(Image: CryptDrive Upload)

Filenames

We also made a significant but less visible improvement to the CryptDrive. When you make a new pad in CryptPad, it has a title, which anyone in the pad can change, and it has a filename, which is how the pad is shown in your CryptDrive. Because anyone at any time can change the title of a pad, the only way to know the titles of all the pads in your drive is to load each and every one of them, which would take a long time. But the filename is your unique way to refer to a pad: it lives only in your CryptDrive and it is the same no matter what title someone gives to the pad.

Now the CryptDrive UI shows only one name for a pad. This name is just the title of the pad at the last time you accessed it, unless you assign it your own filename.

Slide Preview

When you’re using the CryptPad slide app to make a quick presentation, now you can see your presentation in the righthand pane while you type. Since presentations are written in Markdown, this means you get a live action preview of what your presentation slides are going to look like.

(Image: Slide Preview and Drag & Drop)

Try it now

Head over to cryptpad.fr and give CryptPad a try!

Building mutually beneficial relationships

People hosting instances of CryptPad should read at least the Changes in CryptPad section.

Thanks to Scott Alexander for some of the ethical foundations of this post.


You ever wonder why Open Source software always seems to be slightly harder to use and slightly buggier and slightly less polished than proprietary competitors?

How about this: why is it that good people who want to make good things somehow end up making evil things for evil corporations, which sell them to other good people who would (presumably) rather buy good things?

It’s all about incentives

It’s hard to talk about incentives without sounding like a miserly tool, but if we’re going to hack ourselves out of a situation that nobody really wants to be in, we’re going to need to understand them pretty well.

  • Why is Open Source habitually 90% of the way there?
  • Why is Facebook more addictive than it is useful?
  • Why is it that when you get something for free, even from a well funded government program, it’s reliably worse than something you buy?

It’s all about incentives.

In a restaurant, you’re the customer

I love going to restaurants. I have no car and few possessions so restaurants are the way I spend my income. Not only do I love food but I love the relationship which I have with restaurateurs. When I walk into a restaurant, I want to be fed delicious food and they want to be paid, not only that, they want me to be happy so I will return many times and bring my friends. I want them to be happy so they will give me bigger portions and maybe a little dessert on the house. Our incentives are aligned perfectly. We are practically a team.

In a soup kitchen, you’re just a user

It is hard to deny the importance of soup kitchens to the fabric of society. Part of what makes us able to claim to be civilized is the fact that we don’t let people simply die if they’re down on their luck. Soup kitchens, however, are not restaurants. When you walk into a soup kitchen, you are generally greeted kindly, but there is a subtle distinction from a restaurant: at a restaurant you’re the customer and at a soup kitchen you’re just a user. Many soup kitchens are organized around religious groups and evangelizing their belief is a significant part of their motivation, but even secular organizations are motivated by some sort of a higher calling.

Open Source is a soup kitchen

I’ve been developing Open Source both professionally and personally for 7 years and I’m going to tell you something that many Open Source developers won’t admit. Open Source software is not made for you. Sometimes Open Source developers are motivated by the Free Software ideology and they imagine their code as transforming the world, sometimes they just want to solve some problem for themselves and they give away the resulting code. Open Source software is almost never developed for the simple purpose of making another person’s life a little easier.

If you aren’t the customer you’re the product

This aphorism has become popular with the rise of ad-tech and social network websites. The phrase invokes an image of free services coming like free grain because you are, in fact, the pig on his way to slaughter. In some way this is true, Silicon Valley business models are becoming disturbingly like human farming.

However, the phrase also invokes an image of an evil entrepreneur plotting to enslave humanity by creating a slick social network. If 1 in 1000 companies is successful then logic implies there must be thousands of evil entrepreneurs running around everywhere. If this is true then where are all of the failed evil plotters? I’ve never met an entrepreneur who was anything less than an aspiring saint.

I think the real reason why social networks become human farms is that people don’t want to pay for the development of web services and, stuck between a successful human farm and a failing soup kitchen, entrepreneurs begrudgingly choose to farm.

Breaking out

If we’re ever going to stop living in a world of farms and soup kitchens, we’re going to need to get serious about incentives. Part of my intention in starting the CryptPad project is to build something that is neither a farm nor a soup kitchen. I want to have a mutually beneficial relationship with every one of CryptPad’s users, including you. I don’t want to be a charity worker beholden to an NGO or a post office clerk drawing a paycheck from the state. I want you to be my boss, I want to obsess about making your life better, I want fair exchange of value and aligned incentives.

Changes in CryptPad

As you may already know, cryptpad.fr now limits your data storage and allows you to buy an account which will raise that limit. The code for limits and accounts is also in the CryptPad codebase and turned on by default. If you are installing CryptPad, you have three choices.

  1. Leave it exactly as it is: People will be limited to 50MB of storage and they will see a Support CryptPad button. In the development time this donated money buys, we will give special consideration to the needs of CryptPad admins like you.
  2. Share the revenue: If you specify some configuration parameters and send us an email, the donation button will become an Upgrade Account button, allowing your users to take a plan with additional storage quota. When people upgrade their account on your server, we will credit you 50% of the revenue earned. This helps us pay the cost of development and helps you pay the cost of hosting.
  3. Disable the donate button: If you do this, we hope you will help CryptPad in some other way such as by taking an on-premises support contract.

If you run a public CryptPad instance, please don’t increase the 50MB per user storage limit. This limit is what makes people subscribe and what pays for CryptPad development. Running a CryptPad instance which offers a “better deal” is effectively using the project against itself.

Finally, new versions of CryptPad always check for new or expired accounts from our account server. We have added a parameter called adminEmail which will be sent along with the domain and version of CryptPad you’re running. This way we can notify you if we’re aware of any serious problems with your CryptPad instance. We take your privacy seriously and will never sell your email or send you marketing spam. If, however, you want to keep your CryptPad instance completely hidden from us, you can set this parameter to false and it will never query the account server.

Coming next

Our objective is to help you collaborate, stay organized and get things done faster and easier. We want to provide maximum value to you and we want you to provide value to us so that we can continue doing it. As was said in the previous post, the big issues which we are planning to tackle soon are:

  • File upload for PDF and image embedding
  • Text coloring based on the authors of the document
  • Workgroups for team collaboration
  • Zero Knowledge spreadsheets

As always, we will be continuing to put great effort into understanding your problems, how you go about solving them, and how we can make little changes to make CryptPad fit your needs better.

Caleb

CryptPad - use it, love ❤️ it, support it

It’s been another release day in our little team. Today we released CryptPad v1.7.0 (Hodag). The biggest new feature in this version is that when you create a /code/ pad, the default highlighting is Markdown syntax and it is rendered in realtime while you type. Try it out by making a pad at cryptpad.fr/code.

In this release we also completed something much more important and central to the future of CryptPad. We finished our first version of the payment server which allows you to take a subscription and help support the work that we do.

Starting with this release we are now imposing a 50MB storage limit for our anonymous users and a 3 month expiration of pads which are not stored by a registered user.

Instant collaboration is the vision of CryptPad and we are committed to continuing to provide that and even providing 50MB of persistent storage for anyone who is willing to sign up.

For people who are ready to take the next step, we are now providing subscriptions which will improve how you organize your information while helping CryptPad to grow and improve.

Plans

  1. Personal (5GB storage, 5€/month ex. VAT)
    • This is best for an individual using lots of pads for collaboration and note taking. For the price of a sandwich you can stay organized on all of your devices while also keeping your privacy private.
  2. Standard (20GB storage, 10€/month ex. VAT)
    • For the price of lunch, you can have 20GB of storage, enough for not only pads but also for the soon to launch File Upload which will allow Zero Knowledge storage of files such as pictures and PDF documents. With the Standard plan you can add one more friend for free.
  3. Team (50GB storage, 15€/month ex. VAT)
    • If you’re ready to extend your usage of CryptPad to an entire team, we are ready to help you succeed. With a Team plan you get 50GB of data storage in CryptPad, plenty for files and pads. You also get to add five people to your plan and you get professional support available in English and French.

Our goal is to make the best collaboration tool available while still being unable to sell or leak your content. Help us succeed in helping you stay organized, and help show the world that Zero Knowledge Cloud is possible.

For Admins

If you’re hosting your own instance of CryptPad, there are a few things you’ll need to do when you upgrade to Hodag. The limits code is still somewhat of a mess and while we get it tied down, you’ll need to do a bit of work to disable it.

First, there is a server-side per-user storage limit defined in config.js. You’ll want to set this to a big number like so:

defaultStorageLimit: Number.MAX_SAFE_INTEGER

Then there is customize/application_config.js. If you’re not familiar with /customize/, you can create this directory and then copy application_config.js over from the /customize.dist/ directory so it will not be overwritten. The server will try looking in /customize/ first.

Inside of application_config.js you’ll need to update the enablePinLimit line like so:

enablePinLimit = false;

If you’re using your own CryptPad installation in a business context, please consider contacting sales@cryptpad.fr for an on-premises support contract. You’ll get help with upgrades and early information about security issues.

What’s Next

In the coming months, we’re hoping to roll out text coloring based on the authors of the document as well as file upload for PDF and image embedding. Eventually we plan to add Zero Knowledge spreadsheets and workgroups for team collaboration.

You gotta log in

It’s been two and a half years since the first commit to CryptPad. We no longer have the hideous white and green color-scheme and we’re on our third URL format. More importantly, we now have a CryptDrive with folders instead of just remembering a few recent pads in the browser’s local storage.

The success of CryptPad as a tool for organizing and collaboration makes us glad to be working on the technology, but our desire to avoid collecting metadata has led to an unsustainable situation.

We can’t store data we don’t understand for people we don’t know

There has been a proliferation of pads which are not accessed after a while and we don’t know who made them or even what type of pad they are. We know that a great many of them are “test pads”, if for no other reason, because we made a lot of them. Eventually we will be forced to delete old data but we don’t want to delete anything important.

Starting a few weeks ago we implemented a system called pinning. When you are logged in, your browser tells CryptPad all of the things in your drive. We don’t know what’s in them but we know they’re important so we shouldn’t delete them. Right now you can log in to CryptPad, go to your Settings Page and click the Usage button to see how much data you are pinning.

We recognize many users of CryptPad would like to use it anonymously and we will continue to support anonymous pads, but soon they will begin to be removed from storage after 3 months of inactivity. We’ve also simplified the anonymous CryptDrive because we want to send the message loud and clear that pads in the anonymous drive are not safe from deletion.

So please register and log in: you’ll get 50MB of pinning quota with the full features of CryptDrive and you can be sure that none of your pads will ever be removed from the server.
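For the curious, the kind of bookkeeping a pinning quota implies can be sketched as below. The function name, the data shapes and the reading of 50MB as 50 × 1024 × 1024 bytes are all assumptions for illustration, not CryptPad's actual implementation.

```javascript
// Assumption: "50MB" is read as 50 * 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Hypothetical helper: sum the stored size of everything a user has pinned
// and compare it with their quota. Unknown IDs count as zero bytes.
function pinnedUsage(pinnedIds, sizes, limitBytes) {
    let used = 0;
    for (const id of pinnedIds) {
        used += sizes.get(id) || 0;
    }
    return {
        used,
        remaining: Math.max(0, limitBytes - used),
        overLimit: used > limitBytes,
    };
}

// Sizes of the encrypted files as the server might see them (made-up values).
const sizes = new Map([['pad-a', 30 * MB], ['pad-b', 15 * MB], ['file-c', 10 * MB]]);

const usage = pinnedUsage(['pad-a', 'pad-b'], sizes, 50 * MB);
console.log(usage.used / MB, usage.remaining / MB, usage.overLimit); // 45 5 false
```

The server needs only the list of pinned IDs and the size of each encrypted file, so the quota can be enforced without learning anything about what the pads contain.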