Yesterday I made a mess

Normally when I write a blog post it’s because I have exciting news to share. This time it’s not a fun occasion because the only good news I have is that the bad news isn’t permanent.

The bad news is that during some database maintenance yesterday (June 13th) I accidentally removed some of the data from users’ encrypted drives. The good news is that these files were archived, not deleted, and that everything can be recovered.

Before I get into the details of why this happened I’d like to clarify which user data was archived and how to check if your account was one of those affected.

How to tell if you were affected

First off, this only concerns my administration of the CryptPad.fr database. Users of other instances have nothing to worry about unless their administrator did the exact same thing as I did, which is unlikely.

Secondly, the issue is limited to shared folders and non-owned files contained within them. If you don’t use shared folders you won’t be affected.

Thirdly, as far as we can tell you need to have visited CryptPad.fr between May 28th and June 13th in order to have run some incorrect code.

Finally, nothing was archived unless it had not been active within the preceding 90 days. In the case of shared folders, this would mean any change to the content or structure of the shared folder, such as adding or removing a document or renaming or moving any of its contents. In the case of pads, if a user with the rights to edit the document loaded it without making any changes, that would classify it as active.

To summarise:

Some of your data could have been archived if you visited https://CryptPad.fr between May 28th and June 13th (2019) and have one or more shared folders in your CryptDrive which have not been modified within the last 90 days.
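For readers who think in code, the conditions above can be sketched roughly as follows. This is only an illustration, not our server's actual logic: the function and parameter names are hypothetical, and it assumes that an item is only eligible for archival when it is both unpinned (pinning is explained in the post-mortem below) and has been inactive for more than 90 days.

```python
from datetime import datetime, timedelta

# The 90-day window comes from this post; every name here is hypothetical.
INACTIVITY_WINDOW = timedelta(days=90)


def is_archive_candidate(last_active: datetime, pinned: bool, now: datetime) -> bool:
    """Return True if an item is unpinned and has been inactive for over 90 days."""
    if pinned:
        # Assumption: pinned data is never considered for archival.
        return False
    return now - last_active > INACTIVITY_WINDOW
```

Under these assumptions, a shared folder last modified in early March 2019 that failed to get pinned would have qualified on June 13th, while one modified in early June would not.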

Checking if you were affected

It should be fairly easy to tell if your account was affected by opening your CryptDrive. Affected shared folders will be visible in the tree on the left of your drive because they’ll have lost their titles, as highlighted in red below:

archived shared folders

How we’re going to handle this

As I said, none of the data was deleted, just archived. It’s still on the same server that hosts the rest of our database; it’s just been moved to a different location to make it inaccessible.

I’ve already restored all of those files which were archived except for 237 cases. Affected users who visited CryptPad.fr between the removal and restoration of the data would have automatically created a new folder in the same location as the old one, and that complicates things for us. Since we don’t know whether they might have decided to put new documents in that folder in the meantime, it’s dangerous for us to overwrite the new data with the old.

It’s going to take us a few days to figure out if we can use some fancier methodology to identify what data we can safely reinstate. In the meantime, we’ve already fixed the underlying issues that caused this data to be miscategorized, and developed some new tooling for safely diagnosing and restoring archived data.

Since we know that those affected by this error visited since our last release day and that they had content older than 90 days, we assume they’re going to come back to the platform. If you do come back and see something resembling the image above, please do let us know by emailing us at contact@cryptpad.fr. We can manually restore any files that haven’t already been restored.

I’m very sorry for any inconvenience this might have caused and I’m grateful that the damage wasn’t worse. I’ll take this as an opportunity to prove my commitment to protecting user data, whether it be from surveillance or from my own mistakes.

Post-mortem

With all the practical details addressed for those who only have the time to make sure their own data is safe, I’ll go further into the specifics for anyone who might be interested.

The pinning race condition (May 28th)

On May 28th we released CryptPad Xenops. It introduced notifications for users through the use of something we’ve been calling “encrypted mailboxes”. Each registered user now has a mailbox through which any other user can send messages, currently for friend requests, and soon for other features.

While we were implementing the function which loads new messages from this mailbox we introduced a bug which caused some other functions to be executed in the wrong order. I personally reviewed the code but didn’t see the bug.

Registered users are able to send instructions to the server not to delete data that is relevant to them. We call this process “pinning” and it’s done every time a user uses the service.

What should have happened is that users should have loaded their drive, then loaded their shared folders, then pinned all the contained files. Instead, they loaded their drive and then started loading their shared folders and started the pinning process in parallel. This caused what’s called a race condition, which means that two things happen at the same time, and sometimes they happen in different orders.

Race conditions are especially annoying because sometimes they only occur under certain circumstances, so these bugs tend to slip past basic testing unless you already know what you’re looking for. In our case, losing the race meant that files weren’t pinned and consequently the server didn’t have an accurate notion of which data was worth keeping.
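To make the ordering problem concrete, here is a minimal sketch in Python's asyncio rather than our actual client-side code. All of the function names and the shape of the drive are invented for illustration, and the delay is a stand-in for network latency. In this toy version the pinning step always loses the race, whereas in reality it only lost some of the time.

```python
import asyncio


async def load_shared_folders(drive: dict) -> None:
    # Stand-in for the network round trip that fetches shared folder contents.
    await asyncio.sleep(0.01)
    drive["files"].extend(["shared-doc-1", "shared-doc-2"])


async def pin_known_files(drive: dict) -> list:
    # Pins whatever the client knows about at the moment this runs.
    return sorted(drive["files"])


async def buggy_startup() -> list:
    drive = {"files": ["own-doc"]}
    # Bug: loading and pinning start in parallel, so pinning can run
    # before the shared folders' contents are known.
    _, pinned = await asyncio.gather(load_shared_folders(drive), pin_known_files(drive))
    return pinned


async def fixed_startup() -> list:
    drive = {"files": ["own-doc"]}
    # Fix: wait for the shared folders to load before pinning anything.
    await load_shared_folders(drive)
    return await pin_known_files(drive)
```

Running `asyncio.run(buggy_startup())` pins only the user's own document, while `asyncio.run(fixed_startup())` also pins the two documents inside the shared folder.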

Running out of space (June 3rd)

Several months ago a user contacted us saying that data had disappeared from their drive. This was quite scary from our perspective as for every user that contacts us about something we can generally assume that there are many more that had the same issue, but didn’t say anything.

We spent several days debugging their problem and developing tools which would analyze the history of their drive without exposing any of their encrypted content to us. In the end, it turned out that the files didn’t ever exist in their history, so it wasn’t a matter of us losing that data. Nevertheless, the situation was stressful enough that we turned off all of our scripts for deleting inactive data until we could sort out a more reliable methodology for handling data.

With that regular process not in place, and with increasingly more users visiting our service, our database continued to grow at an accelerating pace. On June 3rd we started receiving automated emails from XWiki’s infrastructure services that we were down to 20% of our disk space. We had been meaning to handle this problem for some time but with 33 emails arriving in our inboxes each day we finally decided to prioritize it.

Replaced the race condition (June 6th)

After the Xenops release we noticed an error that was occurring in our browser consoles fairly regularly and decided to debug it. We tracked it down and fixed it, but since we weren’t looking for the other race condition described above, we managed to change the code in such a way that a functionally identical race condition was still present. We fixed one issue, but pads still weren’t being pinned reliably.

Incorrect data archival (June 13th)

Having proceeded with fixing a variety of other bugs, I turned my attention back to solving our storage issue. Deleting data hadn’t become any less scary than it had always been so I proceeded with caution, implementing an archival system that would move inactive data to what we termed “cold storage” for a set period before removing it permanently.

I implemented some code for iterating over our complete database and used that to create a script for checking the most recent modifications to user data. I read through it a number of times, tested it on my local database and had my colleague review it and test it on his machine. Before using it on our production database I made sure to also write and test a script that would restore archived files in case anything went wrong.
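The general shape of such an archive-and-restore pair might look something like the sketch below. It is not the script I ran: it assumes, purely for illustration, that each piece of data is a file under a "live" directory and that cold storage is a sibling directory with the same layout, and every name in it is hypothetical. The one property worth noticing is that restoring refuses to overwrite anything that already exists in the live location.

```python
import shutil
from pathlib import Path


def archive_file(path: Path, live_root: Path, archive_root: Path) -> Path:
    """Move a file from the live store into cold storage, keeping its relative path."""
    target = archive_root / path.relative_to(live_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(target))
    return target


def restore_file(archived: Path, live_root: Path, archive_root: Path) -> bool:
    """Move an archived file back, refusing to overwrite live data.

    Returns True if the file was restored and False if a live file was in the way.
    """
    target = live_root / archived.relative_to(archive_root)
    if target.exists():
        # Conflict: leave both copies untouched for a human to look at.
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(archived), str(target))
    return True
```

Moving rather than deleting is what makes the mistake recoverable: nothing is lost until a separate step removes the cold storage copy.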

I think I must have sat in front of my laptop and stared at my screen for between five and ten minutes before I hit enter on the command to run the script. I had the code for the script on another monitor, and I double-checked it before deciding to proceed. I reloaded my drive to make sure everything was still there once it finished running, and it was. After twenty minutes or so of testing everything seemed alright, so I went on with my day.

Later on we finally noticed that there was a problem with one of our user accounts, specifically with a shared folder having disappeared. We stayed at the office late into the evening to figure out what had happened, and ended up tracking the problem to the pinning logic before deciding to follow up on it in the morning.

Final debugging and restoration (June 14th)

With as restful a night as I could manage under the circumstances, I came back to the office this morning with a bit of perspective on the issue. I wrote up a pad which collected all the information we had into one place, identifying the circumstances under which we believed the problem could occur.

I reviewed the script which restored archived files, making sure that it would not overwrite any user data if utilized. My colleague implemented a fix for the race condition which contributed to the pinning issue, which I deployed as soon as I could review it.

After writing a few more scripts I was able to determine the number of shared folders which had been replaced with conflicting entries with the same identifiers (237). Knowing this number allowed me to determine how to handle the issue. If the number was significantly smaller it might have been easier to handle, but the order of magnitude is such that we’ll have to figure out an automated way to deal with the issue or else spend the next few weeks responding to emails and manually recovering files.

With a better grasp on the situation and with some confidence that it wasn’t the database processing scripts which were incorrect, I restored the archived files with the exception of those which conflicted with the production database.
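The split between what could be restored immediately and what had to wait boils down to comparing two sets of identifiers. The sketch below shows the idea, assuming each archived item and each live item can be reduced to a string identifier; the function name and the example identifiers are made up.

```python
from typing import Iterable, List, Tuple


def split_restorable(
    archived_ids: Iterable[str], live_ids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Separate archived items into safe-to-restore and conflicting ones."""
    archived = set(archived_ids)
    live = set(live_ids)
    # An identifier present in both places means a new entry was created
    # after archival, so restoring would overwrite it.
    conflicts = sorted(archived & live)
    safe = sorted(archived - live)
    return safe, conflicts
```

For example, `split_restorable(["a", "b", "c"], ["b", "d"])` reports `a` and `c` as safe to restore and `b` as a conflict. In our case the conflicting group contained 237 shared folders.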

Conclusion

If I’ve learned anything in my time working on CryptPad it’s that I should appreciate the reasons why the majority of the software industry doesn’t work with encrypted databases as we do. Even on a good day it can be a harder job than it would otherwise be. On a day like today we end up having to reason about what the clientside code would have done under various circumstances and think about what information we can access.

In any case, I’m very happy that we decided to turn off our deletion scripts months ago. Had they still been active, this relatively mild pinning and archival bug would have resulted in data loss.

While we can tell that 237 shared folders were affected, we still have to think about how the absence of that data would be handled by the code for users’ CryptDrives. To further complicate things, we have to think beyond what our code would do and into what users might have done in reaction to what they saw. If they saw and removed the now-empty shared folders in their drive, they no longer have the encryption keys to decrypt them even though we’ve now restored the underlying data. Because we’ve spent so much time trying to protect our users’ privacy we can’t actually ascertain if they’ve interacted with this part of their drive at all.

On one hand, it makes my life that much more stressful to have to figure out the answers to these problems. On the other, I’m hopeful that by doing this work now I’ll help pave the way for more developers to create services which offer similar protection for their users’ data.

As stated above, if this particular mistake affected you, don’t hesitate to contact us. Otherwise, I can only hope that the way we handle it ensures that you continue to trust us with your data.

Our future is collaborative

For anyone that doesn’t have the time or interest to read the rest of this article, the short version is that the CryptPad team has received a 50000 Euro grant from NLnet foundation. This funding will be directed towards the design and development of team-centric features in a project we’re calling CryptPad Teams.

If you’re still reading, I assume you want to know more about our plans and our relationship with NLnet.

Some backstory…

Up until the end of March 2019 our team’s work was funded by the OpenPaaS project, a four-year French research project in which CryptPad was only a minor component. Our role was to produce a set of collaborative editors for the open-source platform. It was never stated that our contributions should be delivered as a standalone platform, but having a self-contained code-base that we could easily update and deploy simplified our job.

CryptPad had already been prototyped as a part of a previous research project, though its scope was considerably smaller than what would be required by OpenPaaS. Since the platform was being developed with businesses and other large institutions in mind, confidentiality was a concern and a stated requirement of the project. Even so, I think it’s fair to say nobody expected us to make privacy such a central part of our design.

In many organizations these design choices might have been seen as digressions. We’ve been fortunate to have had a lot of support from our employer (XWiki SAS). Consequently, we were able to nurture a prototype such that it grew into a platform, a product, and a community. Still, we knew all along that our role in the OpenPaaS project would come to an end, and that without external funding it would be difficult to continue with the momentum we’d established.

Support from our community

As an active member of the community concerned about privacy issues, I know there are a lot of people that are suspicious of government money. While I understand that this distrust is justified by a lot of history, I’m very satisfied with what I consider the European software model of funding public work with public money, keeping in mind that I’m a Canadian that’s lived in France for the past few years.

Without the social investment we’ve received so far it would have been very difficult to create a product of sufficient quality that anyone would pay for it. I often hear people rebut this point by saying that a lot of free software is produced for free by volunteers. Personally I’m in the camp that believes that the people writing that software deserve the same financial stability that is enjoyed by those producing software with proprietary or extractive business models, but that’s a bit beyond the scope of this article.

In any case, before going on to talk about the very generous contribution we’ve received, I wanted to acknowledge the support up until now from individuals and organizations that use CryptPad. Since the end of our last project and the beginning of this new one, we’ve been sustained by a mix of the revenue generated by subscriptions to CryptPad.fr and donations to our OpenCollective campaign. These contributions help to keep CryptPad going in such brief periods when we haven’t secured larger sources of funding as well as providing alternatives should such opportunities cease to be available.

You can expect another post in the near future where we’ll go over the status of our crowdfunding campaign in more depth.

NLnet and the Next Generation Internet

You might recall that we recently visited Barcelona to receive an NGI award for privacy and trust-enhanced technologies. Those awards were organized as a part of the NGI initiative, funded by the European Union’s Horizon 2020 research program.

As a part of the initiative, NLnet has been made responsible for distributing a rather large sum of money to smaller projects through the administration of Search and Discovery and Privacy (and trust) Enhancing Technologies. By delegating these enormous tasks to NLnet the EU has recognized their excellent track record for supporting projects that actively contribute towards an open information society.

Naturally we’re very happy to receive the financial support, but beyond that the foundation has offered a variety of other resources which they have at their disposal by way of having played a strong role in the European free software community. They’ve offered expertise in accessibility, documentation, security auditing, internationalization, and legal matters surrounding software licensing, among other things.

Finally, it’s worth mentioning that for all of this support that we’ll receive, the amount of time we’ve spent writing the initial proposal and following up until the point of signing a contract has been remarkably brief. Whether considering the delay between submission and acceptance or the actual time spent on documents and correspondence, they’ve kept the bureaucracy to an absolute minimum. For a small team like ours, this makes a massive difference in our ability to access such funding and to put more of our time towards the activities the money is meant to support.

What CryptPad Teams will entail

The purpose of this grant is to develop technologies which enhance the public’s ability to preserve their privacy. Our contract defines the milestones which we must reach in order to get paid. I voluntarily included a stipulation that we would not consider a goal complete until its components were publicly accessible as source code and in our hosted platform. This was meant to ensure that the outcomes benefit our community of users and developers alike.

Starting with CryptPad 2.23.0 we’ll introduce support for personal encrypted mailboxes for registered users. We’re not looking to replace e-mail or the other platforms which are focused on encrypted messaging; this will just be a simple feature which will allow users to interact with each other more effectively whether or not they are online at the same time.

Our first use-case for this is an improved version of our “friend request” which currently requires that both users be online. You’ll be able to send friend requests from users’ profile pages and they’ll see a notification the next time they visit CryptPad. Going forward we’ll use the same system to offer friends access to documents directly through the sharing menu, instead of having to send URLs over potentially insecure mediums like unencrypted email or messengers. Similarly, friends will be able to request the ability to edit documents that they can view, as well as to request “ownership” over documents which they should be able to delete.

As minor as some of this functionality might sound, we believe it will make a positive and significant impact on users’ privacy. We want to minimize how often they have to directly handle the encryption keys which protect the contents of their documents.

After these initial steps we’ll begin offering first-class support for teams within CryptPad, allowing users to define groups of friends so that they can delegate access quickly and effectively. Teams will integrate with shared folders and will eventually offer features targeting various types of groups, whether hierarchical as is customary in many businesses or on a more ad-hoc basis as might be expected with friends or other self-organizing groups. Team members will benefit from better oversight as to who can access particular documents, reducing the likelihood that they’ll accidentally leak private information. We want to offer users better oversight into the activity of documents in their CryptDrives, both to make it easier to quickly join editing sessions with friends, as well as to make it noticeable when access to a document has leaked outside of its intended audience.

The hard part

Different groups have different levels of trust among their members. It’s difficult to build these features in a manner that’s fast to use with friends while still preventing your boss from spying on you. We’re committed to thinking through all of these cases to keep our users safe, and to acting on users’ concerns if we don’t get it right the first time.

We’re excited to begin this project and grateful to everyone supporting our efforts, financially or otherwise. Teams is the first grant we’ve received explicitly for the development of CryptPad, and we couldn’t have gotten here without help. As always, if you have ideas, concerns, or questions feel free to contact us.

Join the team

We’ve been making a big deal of our funding status for the last while, and for good reason. CryptPad has largely been funded by the OpenPaaS research and development project, funded by BPIFrance. We’re very happy with the results of the past four years of work, but this support will terminate at the end of March 2019.

While this change is a bit scary for us, it also means that we’ll be free to pursue new research projects. Europe is investing in technologies that promote human-centric values, so there are many opportunities that align with our goals. We have been actively seeking funding from a variety of sources, and though things are currently uncertain for us, it’s quite likely that our team will need to expand to prepare for upcoming obligations.

The skills we want

We’re looking for web technologists and product designers with experience in privacy engineering. If you already use CryptPad, encrypted messengers, or other similar communication systems to protect your personal data, that knowledge will be an asset. If you use unencrypted platforms and have a good understanding of the personal and societal trade-offs, that will count in your favour as well.

This field is fairly young, so we’re open to any experience you have, not just what you’ve learned in a professional or academic context.

In terms of technical skills, our daily work typically includes:

  • Clientside Javascript (ES5) and cross-platform browser APIs
  • Nodejs
  • CSS3 and LESS
  • HTML5
  • BASH
  • GIT
  • SSH, information security, and basic system administration

We’re interested in incorporating skills we don’t already have, so don’t panic if you’re unfamiliar with anything listed above.

Perhaps more important than the technical skills are the so-called soft skills:

  • Empathizing with users and prioritizing improvements based on their impact
  • Communicating well within a team (including asking for clarification if your goals are ever unclear)
  • Managing your time well (we avoid micro-managing and working overtime)
  • Reasoning about pragmatic security
  • Consideration of both immediate tasks and long-term goals

What we offer

XWiki SAS has been developing open-source software for the last 15 years, and we rely on open-source tooling internally. Joining our team means learning how to run a sustainable business while giving away our product for free (without selling user data).

Otherwise you can expect:

  • A relaxed work environment (in Paris, France or Iasi, Romania) with part-time remote work
    • or negotiable full-time if relocation is not possible or desirable
  • To develop portable skills using open-source software
  • International travel (at our expense) when promoting the company or our projects
  • Opportunities for advancement, training, and other benefits
  • The chance to shape the future of an exciting project with your personal view of responsible data handling
  • To become an expert in privacy-enhancing technologies (we’re literally an award-winning team)

A special note to researchers

We’re very interested in distributed systems, data science (as an adversary against privacy), and human-computer interaction. If you are knowledgeable about any of these, some intersection, or anything else that might be relevant, that’s great!

If you have recently attained a PhD from an institution recognized by the EU, there are subsidies which can help us pay your salary. We have authored two peer-reviewed papers to date, so we can offer continued involvement in the research community if you desire.

Caveats

Sorting through CVs can be a lot of work, though a little transparency on some issues might help lighten the burden on our side. Below are some things to consider before contacting us.

As stated above, our ability to hire will be based on the status of some pending proposals. We don’t currently know how many positions will be available, and our timeline on when we could hire is fuzzy at best. We’d like to have your profile ready so we can act quickly once we know more.

We can’t compete with the salaries offered by companies in Silicon Valley, though ours are comparable to those of other European businesses. As a consolation, you’ll be directly involved in determining how we move forward, and you’ll gain insight into the exciting European research ecosystem.

Our funding sources tend to place restrictions limiting those funds to residents of European member states. I moved to France from Canada to work on CryptPad several years ago, but things are generally simpler if you’re already here. Don’t let that stop you from contacting us, though!

We understand that talent comes in many forms, and we welcome new ideas. We’re willing to make exceptions for promising candidates, but we’d like to know that you care about the topic. There are probably better options available if you just want a job.

If you are interested…

Contact us at jobs@cryptpad.fr with a recent CV and a brief introduction explaining what you’d bring to the team.

CryptPad funding status March 2019 - Thanks to our 100 backers!

The beginning of the year has been busy. We traveled to Barcelona to officially receive our NGI Award. Spreadsheet functionality was officially released in CryptPad, and our funding has been progressing since our last status update in November.

We have seen a spike of both new subscriptions and growing usage of CryptPad.fr in the last few months. You can see the numbers in the new spreadsheet function launched in January (though this functionality is still restricted to registered users).

CryptPad funding details

We now have more than 100 backers from 23 countries around the world.

CryptPad funding by country

While this cannot yet fund our two developers, we are happy that the funding is progressing. We will reach a first goal of 6k by the end of the OpenPaaS-NG project ending this month. We have also applied to the NLnet Privacy and Trust Enhancing Technology funding call.

Now we need more help! First we need to be able to sustain our team, but we also need to be able to expand, especially if we want to have Open Source software in the Zero-Knowledge space. Proprietary software providing some form of Zero-Knowledge will probably start growing with extensive funding. If we want independent free-software alternatives for this type of software, then we need to pool our efforts and fund open source solutions like CryptPad. We have ambitious objectives for CryptPad and we cannot achieve these with only two developers.

On the subject of funding, CryptPad will be presented at the Fund the Code event on March 19th in Paris, with XWiki SAS (CryptPad’s parent company) sponsoring the event.

Since the launch of the campaign we have published a roadmap for CryptPad of what we would like to achieve with the funding. It’s also available on the OpenCollective web site. Check it out and see our ambitious objectives for this project. We are already making progress on this roadmap.

To finish, I’d like to report on CryptPad’s usage. We are now reaching close to 250 instances of CryptPad running around the world, and the official cryptpad.fr instance is growing regularly. The growth of CryptPad noticed in November has not stopped: we now have more than 1500 weekly drive users (up from 1000 in November) and more than 7000 weekly pad users (up from 6000 in November).

cryptpad drive stats cryptpad pad stats

A special note to our German users, among whom CryptPad is growing quickly. We have noticed on Twitter that teachers are promoting CryptPad in their community, and there are now more users from Germany than from the US, making it the top country, representing 25% of the CryptPad.fr users and also of the CryptPad hosted instances.

cryptpad countries

Try CryptPad, love it, take care of it, and even better come help!

Ludovic Dubost & the CryptPad Team

Looking for translators

Our mission is to make privacy-enhancing technologies accessible to people from all over the world. We get a lot of attention for the technology that we build, but that technology may be of limited use to those who can’t understand what it’s doing. I’m not talking about how the cryptography in CryptPad works, but the simple matter of reading the text displayed on the screen.

CryptPad has been translated into nine different languages, but only a third of those translations are complete. Members of the development team are fluent in English and French, so those are easily maintained, but the rest are beyond our ability.

translations status

The other translations were written by contributors, but our required format made them difficult to maintain, so we understand why so many have become out of date. With that in mind, we’ve decided to adopt the open-source Weblate translation platform for our project to make the process more manageable. If you’re familiar with CryptPad and fluent in any language other than English or French, we’d love your help translating the project.

You don’t have to do it alone, and it doesn’t need to be done all at once. Weblate allows translators to change one string at a time in a nice web interface, with issues sorted by type.

Weblate translation interface

Anyone can register an account on our instance (weblate.cryptpad.fr). From there, we can appoint reviewers for each language who will receive notifications any time their language receives a suggested update.

If you’d like to translate CryptPad into a new language, that will require a little more involvement on our part, but we’d be very happy to help. Our translation guide has more detailed information, but you can always contact us if you’d like to help.

Even if your preferred language is already translated, we still welcome improvements to the existing translations. Feel free to sign up and make suggestions, or stop by our chat room if you find any part of the platform difficult to understand.