Decentralizing KUserFeedback funded by Prototype Fund

I am happy to announce that today we start working on a project funded by the Prototype Fund. The goal of this project is to make KUserFeedback store all the data it collects only on users’ devices. KDE will be able to perform distributed analysis of this data while maintaining user privacy.

This will be very helpful for KDE, as they will be able to get answers to questions like who their users are and how they use KDE products, and thus e.g. optimize the user experience.

For KDE users, it will improve the privacy situation: they can contribute their personal data to the project without compromising their privacy.

For privact, this project serves as a first proof of concept for distributed data analysis.

We will keep you informed about the progress in this thread. The project itself is hosted on GitLab.


Very hyped to get started on this!


This is really good news and congratulations to everyone who made this possible and who did a lot of preparatory work. Great job. :+1:


We’ve decided to do ~weekly reports here on what we’ve been up to. Might be on a bit of a detailed level, but that’s the cost of transparency in my experience :smiley:

I’ll make the start for this week, since it was the first week of the project. For some background on how it works:

  1. We have six months (so from March 1st until end of August) to get a working prototype / proof of concept for this whole thing.
  2. During this time, @RSto and I will both work ~8 hours per week on this, with @bjoern supporting us with direction, UX and most things unrelated to code, I suppose.
  3. Our general approach is to have something working at all times. At first this will only very vaguely resemble the vision we have for all this, but it’s something.

As for my first report:

  1. I’m gonna start with something I didn’t do: I wanted to make sure we have the plan and roadmap documented in more detail than what we currently have in the project wiki. Didn’t get around to that just yet :sweat_smile:
  2. We set up the repository with an initial version of the client and server. I personally mostly worked on the client, the thing that’s gonna store personal data. It’s just a few lines of code so far, but it’s gonna reach out to the server to get a list of surveys (i.e. data the server is interested in), and then just prints that. So mostly project setup / hello world kind of stuff so far.

But hey, still exciting to start moving this along! I think in the next 1-2 weeks (i.e. 1-2 days for me and @RSto) we’ll have a more meaningful initial version working. And then we can start to dig into the juicy bits.


While client and server are concepts that don’t perfectly match what we’re developing, this naming is good enough for now.

I focused on the server this week, and we got to a point where server and client can communicate. I’m also building a basic interface to create surveys in a rudimentary way. Nothing more to add to Felix’s report in that regard.

Looking forward to finishing the setup / hello world part, as @fhd called it, and starting on a central piece: the federation.


To give the reports more structure, we decided to add three categories: done, doing and challenges. And since @fhd and I are working together quite closely, we also decided to take turns doing the update to avoid redundancy.

Done:

  1. We added a first naive storage to the client; for now it stores arbitrary data like timestamps.
  2. We added the counterparts for the server objects: surveys, queries (basically a question as part of a survey) and responses to these, as well as tests for all of them.
  3. @fhd took it upon himself and blessed us with some quality of life changes to the build process of the client.
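
To make the survey / query / response relationship a bit more tangible, here is a minimal Python sketch. It is not the project's actual code: all class and field names (`Survey`, `Query`, `data_key`, and so on) are assumptions made up for illustration, and the real objects may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """One question within a survey (hypothetical fields)."""
    id: int
    data_key: str  # assumed: names the locally stored datum being asked about


@dataclass
class Survey:
    """A set of queries the server is interested in (hypothetical fields)."""
    id: int
    name: str
    queries: list[Query] = field(default_factory=list)


@dataclass
class QueryResponse:
    """The answer to a single query."""
    query_id: int
    values: list[str]


@dataclass
class SurveyResponse:
    """All query responses belonging to one survey."""
    survey_id: int
    query_responses: list[QueryResponse] = field(default_factory=list)


# Build an (empty) response skeleton that mirrors the survey's queries.
survey = Survey(id=1, name="usage", queries=[Query(id=10, data_key="timestamp")])
response = SurveyResponse(
    survey_id=survey.id,
    query_responses=[QueryResponse(query_id=q.id, values=[]) for q in survey.queries],
)
```

The point is only the shape: a response mirrors its survey query by query, which keeps it easy to check what was answered.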

Doing:

  1. Finish the communication flow: the client asks for surveys, receives them and answers those it is interested in (no federation yet).
  2. Add commissioners to the surveys on the client and as part of the communication (only answering KDE surveys).
  3. Separate the client into daemon and UI

Challenges:

For the federated data aggregation, we need to work out a concept for how this can be achieved. We will do some research on existing solutions and implement a first naive one ourselves.


Pretty decent progress this week as well!

Quick reminder that you can get a more detailed view on what we’re up to by looking at the repository and the issues, since it’s all public.

Done:

  1. We now have the first round trip working: Clients pull surveys from server, fill out a response with the local data (for the time being only timestamps), and send it back to the server, which stores the results. For now, the clients will respond to all surveys from the commissioner “KDE”, and ignore all others.
  2. The client now runs as a daemon (i.e. consistently in the background) rather than just once.
  3. We’ve got an initial UI (even though all it does is say hello world, but that’s how things start :stuck_out_tongue:)
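
For readers who prefer code over prose, the round trip above can be sketched roughly like this in Python. This is only an illustration under assumptions: the dictionary layout, the function names and the placeholder timestamp are all made up, and the network calls to the server are replaced by plain in-memory data.

```python
# Assumption: the client answers only surveys from this commissioner.
TRUSTED_COMMISSIONERS = {"KDE"}


def select_surveys(surveys):
    """Keep only the surveys whose commissioner the client answers."""
    return [s for s in surveys if s["commissioner"] in TRUSTED_COMMISSIONERS]


def build_response(survey, stored_timestamps):
    """Fill every query of the survey with the locally stored timestamps."""
    return {
        "survey_id": survey["id"],
        "answers": [
            {"query_id": q["id"], "values": list(stored_timestamps)}
            for q in survey["queries"]
        ],
    }


# Stand-in for what the client would pull from the server (made-up data).
surveys = [
    {"id": 1, "commissioner": "KDE", "queries": [{"id": 10}]},
    {"id": 2, "commissioner": "Someone Else", "queries": [{"id": 20}]},
]
stored = ["2000-01-01T00:00:00+00:00"]  # placeholder for local data

# In the real flow these would be sent back to the server and stored there.
responses = [build_response(s, stored) for s in select_surveys(surveys)]
```

Here only the survey with `"commissioner": "KDE"` ends up in `responses`; the other one is ignored.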

Doing:

  1. Log all responses a client sends to a server and show them in the UI. For this we will need some initial inter-process communication between the client’s daemon and UI. That’ll give the user an overview of what data has been sent to whom.
  2. A whole lot of research. We want to get on top of the current state of Solid, we want to investigate Flower, SecAgg, and a bunch of other relevant projects. As long as it’s open source and feasible, we want to “steal” what we can :slight_smile: And also think about how we can collaborate with these projects that have similar goals to us.

Challenges:

  1. We’re still not entirely sure how the federation mechanism is going to work, so we’ll deep dive into research soon and try some stuff out. If none of that turns out to be promising enough, we do have a pretty good idea for an initial mechanism that we could implement ourselves, but we hope we can save a bit of time by basing it on existing tech.
  2. Not a particularly unexpected challenge, but since it’s the local Easter vacation starting next week, all three of us are on vacation for one week in this time frame. They don’t overlap, so we’ll make some progress each week, but a bit less probably. From the second week of April on, we’re back in full force.

Done:

  1. Ongoing research on Flower and how we could use it to aggregate data
  2. Ongoing research on the Flower C++ SDK
  3. Ongoing research on secure aggregation

For the ongoing federated aggregation research, I made a thread here: “How do we want to aggregate the federated data?”

Doing:

  1. Trying to fully understand the mechanism behind secure aggregation (plus), with its benefits and potential drawbacks
  2. Make a demo federated aggregation with Flower
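
As a rough intuition for the core idea behind secure aggregation, here is a toy Python sketch of pairwise masking: every pair of clients agrees on a random mask, one adds it and the other subtracts it, so the masks cancel in the sum while each individual value stays hidden from the server. This is a simplification under assumptions and not the SecAgg protocol itself: the masks here come from one shared, non-cryptographic random generator, whereas real protocols derive them via key agreement between clients and also handle clients dropping out. The names `MODULUS` and `masked_inputs` are made up.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value


def masked_inputs(values, seed=0):
    """Return each client's value hidden behind pairwise masks.

    Illustration only: random.Random is not cryptographically secure, and a
    single generator stands in for per-pair shared secrets.
    """
    rng = random.Random(seed)
    masked = [v % MODULUS for v in values]
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            mask = rng.randrange(MODULUS)
            masked[i] = (masked[i] + mask) % MODULUS  # client i adds the mask
            masked[j] = (masked[j] - mask) % MODULUS  # client j subtracts it
    return masked


values = [3, 5, 9]  # one private value per client
masked = masked_inputs(values)

# The server only sees the masked values, yet their sum is the true sum.
total = sum(masked) % MODULUS
```

Every mask is added once and subtracted once, so `total` equals `sum(values)`, which is 17 here.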

Challenges:

  1. Getting a working implementation of secure aggregation might be tricky with a C++ client and a Python backend

Last week I was out, this week it’s Richard. But there’s been a little progress:

Done:

  1. Various code quality improvements for the client, most notably the addition of clang-tidy, which does some static analysis to help us avoid introducing code quality issues (or memory issues even).
  2. Automated building, linting and testing in GitLab CI for both client and server. This’ll help us keep everything working, and also make it easier for us to find out which particular commit broke the build or a test.
  3. A first crude version of the client-side UI showing all the survey responses the daemon sent to the server so far. Still needs to be displayed in a nicer way, and UI/daemon should communicate via proper IPC instead of accessing the same database, but it’s a first step.

Doing:

  1. Improve how the client-side UI retrieves and shows survey responses.
  2. Get a demo with federated aggregation using Flower working.

Challenges:

  1. Figuring out how to best do federated data aggregation is going to need a whole lot more research and experimentation. We’re trying to use an existing approach, and there are a few out there, but they’re all pretty fresh and thus rough around the edges and non-trivial to integrate/implement.

Skipped last week as we were experimenting and had no real accomplishments at that point.

Done:

  1. Conceptualised our federated standard deviation calculation protocol. We made a wiki page on how we imagine it here.
  2. Implemented the conceptualised protocol as an experiment with Flower and SecAgg. This was more tricky than I’d have hoped, like it often is with research code rather than production-ready software. But it works and could be the basis of our client/server aggregation communication in the future.
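
To illustrate why a standard deviation can be computed in a federated way at all, here is a small Python sketch of one common approach. It does not reproduce the protocol from the wiki page, which may well differ. The assumption is that each client only reports three sums (count, sum of values, sum of squared values), and the server combines them. Because these are plain sums, they are exactly the kind of thing secure aggregation can add up without revealing individual contributions. The function names are made up.

```python
import math


def local_summary(values):
    """Computed on the client: count, sum and sum of squares of local values."""
    return (len(values), sum(values), sum(v * v for v in values))


def aggregate_std(summaries):
    """Computed on the server: population standard deviation over all clients."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # E[x^2] - (E[x])^2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


# Three clients with made-up local data; raw values never leave the client.
clients = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]
std = aggregate_std([local_summary(c) for c in clients])
```

For this example data the overall mean is 5 and the population standard deviation is 2. Note that the `E[x^2] - (E[x])^2` formula can lose precision for large values with small variance, so a real implementation might prefer a more numerically stable variant.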

Doing:

  1. Try other analyses in the experiment project with Flower and SecAgg.
  2. Improve the client-side UI to be less crude in showing the answered surveys.
  3. Implement the Flower/SecAgg experiments in the codebase.

Challenges:

  1. While the experiments yielded a result, it is still unclear how well this works with the actual C++ client (we got it working with a dummy Python client).