Decentralized risks: Hosting information for others comes at a cost

hugi · May 22, 2019 19:01

I was just in Berlin for the Data Terra Nemo conference and also ended up attending a DGOV meet-up. Data Terra Nemo is a conference that brings together the community developing open source protocols and clients for decentralized peer-to-peer applications. A very simplified introduction for those who are unfamiliar with the technology: Decentralized web applications are hosted without traditional servers. Instead, every participant is both a host and a client. You can often still access these sites from traditional browsers, because many of them are implemented in JavaScript. This shift allows for a lot of interesting applications, and this is the scope of Data Terra Nemo.

Many of the sessions touched upon the current challenges of decentralized technologies.

Blindly hosting information for others comes at a cost

‘Gossip’ protocols like Scuttlebutt allow applications to share a pool of messages between users that are connected to each other. These messages are then interpreted by client applications, which can be social media applications, book recommendation services, chess games or private messaging. I wrote a separate post about Scuttlebutt, and I am quite inspired by the community which has grown up around that technology. Largely because of the human-centric values of the core developers, it has a unique feel and has attracted a big community for such a bleeding-edge technology. My reading of the role that Scuttlebutt plays in this space is that of the experimental avant-garde, the playground where new and radical ideas can be tested and implemented. Dominic Tarr, the original developer of Scuttlebutt, had an interesting reflection on his own modus operandi that I think has translated into Scuttlebutt itself, which is to “not build the next big thing, but rather build the thing that inspires the next big thing, that way you don’t have to maintain it”. And it turns out that the developers of Scuttlebutt are working on problems that affect distributed technology at its core.

One of the core elements of Scuttlebutt is that I can host data for other people on the network without being connected to them directly. I can also host data on my own computer that is encrypted communication between users that I am connected to. This feature is very useful in situations where Bob sends a message to Alice, who is in a country where traffic to the outside internet is highly restricted. If Cindy, who has the privilege of a VPN connection, is connected to both Bob and Alice, then Alice will receive the message from Bob as soon as Alice and Cindy connect to the same network. Cindy doesn’t even know that she is carrying a message between Bob and Alice, as both the message and the information about who is talking are encrypted. This is possible because, by default, everyone replicates the entire message stream of their entire network. In fact, many people replicate every message in their networks 2-3 hops away from themselves.
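
To make the hop idea concrete, here is a minimal sketch in Python. It assumes the follow graph is a plain dictionary, and the names `follows` and `accounts_within_hops` are made up for illustration; this is not how any Scuttlebutt client is actually implemented. It only shows how a hop limit decides whose feeds end up on your disk.

```python
from collections import deque

def accounts_within_hops(follows, me, max_hops):
    """Return {account: hop distance} for everyone reachable from `me`
    in at most `max_hops` follow steps (breadth-first search)."""
    distance = {me: 0}
    queue = deque([me])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand past the hop limit
        for followed in follows.get(current, ()):
            if followed not in distance:
                distance[followed] = distance[current] + 1
                queue.append(followed)
    return distance

# Hypothetical follow graph: who follows whom.
follows = {
    "me": ["hans"],
    "hans": ["lars"],
    "lars": ["lars_friend"],
}

# With a 2-hop limit I replicate hans and lars, but not lars_friend.
replicated = accounts_within_hops(follows, "me", max_hops=2)
```

Raising `max_hops` to 3 would pull `lars_friend` in as well, without me ever having followed anyone new.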

This creates some challenges. If Hans is in your network and connects to Lars who is a neo-nazi connected to a group of other neo-nazis, you might actually be acting as a server for information that you would really rather not propagate. This is not a hypothetical case. There are already known instances of the Norwegian alt-right using Scuttlebutt as their preferred means of communication. Luckily for people that don’t want to deal with them or host their information, they are still quite an isolated part of the network and prefer to keep it that way. Nevertheless, it only takes one well-connected person on their network island connecting to another well-connected person in the Scuttlebutt mainstream for that isolation to be broken.

Scuttlebutt has tried to solve this by giving users the ability to block accounts they find abusive, but those replicating data from a large network might never actually know who is posting that content. One solution that has been talked about is subscribable block-lists. This way, I could outsource blocking to a person or group I trust to maintain a block-list.
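
As a rough illustration of what a subscribable block-list could look like, here is a small Python sketch. Everything in it (`effective_blocks`, `feeds_to_replicate`, the shape of `published_lists`) is a hypothetical design for the sake of discussion, not an existing Scuttlebutt feature or API.

```python
def effective_blocks(own_blocks, subscriptions, published_lists):
    """Union of my own blocks and every block-list I subscribe to."""
    blocked = set(own_blocks)
    for curator in subscriptions:
        # A curator with no published list simply contributes nothing.
        blocked |= set(published_lists.get(curator, ()))
    return blocked

def feeds_to_replicate(candidates, blocked):
    """Keep only the feeds that are not blocked, preserving order."""
    return [feed for feed in candidates if feed not in blocked]

# Hypothetical block-lists published by curators on the network.
published_lists = {
    "trusted_collective": ["lars", "lars_friend"],
    "someone_else": ["spammer"],
}

blocked = effective_blocks(
    own_blocks=["annoying_bot"],
    subscriptions=["trusted_collective"],  # I only trust this one curator
    published_lists=published_lists,
)
to_replicate = feeds_to_replicate(["hans", "lars", "annoying_bot", "cindy"], blocked)
```

The design choice here is that a subscription only ever adds to my own blocks, so dropping a curator I no longer trust takes one change on my side.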

Luckily, the most worrisome case of abusive photographs and other image content is not as big of a problem. Images are only downloaded to your computer when your client actually sees them, making it clear to you that foul content has made it into your stream. You might still host links to those images without knowing, but you are not likely to have images on your hard drive that you have never seen.

Bottom line is that the community of Scuttlebutt is charging head first into the future, working on challenges that many more will be facing as the benefits of these technologies reach more hands. Distributed systems are both democratizing and empowering, and they come with a whole new set of possibilities.

I’d love to hear perspectives from @elch, @zelf and @hendrikpeter on this. Do you worry about this when using Scuttlebutt? How can we make these peculiarities of decentralized technologies visible without scaring people off?

zaunders · May 22, 2019 21:41

I hope you don’t mind if I butt in just a little (pun intended)

I feel a little wary about being an SSB conduit and also a little reckless for not paying more attention to whom I am propagating. The all-or-nothing block is also a little rough (I know it needs to be rough since it’s trailblazing!). It feels a little inhuman to be either fully accepting and propagating of a signal or not at all receptive to anything a person is trying to communicate. I hope the future can hold more ambiguity. Attempting to create circles of trust is also something I’ve been thinking a fair bit about in this space.

I don’t want to feed any type of Scuttlebutt/Holochain rivalry hanging around the decentralized communities, but I hope it could be interesting to hear a little of how these problems are considered in the HC space (AFAIK). There are two main parts to handling this kind of stuff there: peer validation and warrants.

Peer validation is basically an infrastructure to ensure that peers do not publish things to the app (the shared DHT) that break the rules the users of the app have agreed on. It might take a toll on encryption though; I’m not sure how that is going to pan out. I suppose only people who have the keys to decrypt can act as replicators of data in those cases.

Warrants are similar to the concept of blacklists. When one actor deems another malicious in some way, a warrant is issued which includes the piece of malicious data signed by the actor that sent it (there could be multiple ways of being malicious, with multiple corresponding warrants). As warrants circulate, the app logic states how to resolve them. Banning? Flagging? Starting a process? All that is also meant to be dependent on the reputations and prior histories of the actors issuing warrants as well as those on the receiving end.

Also, this is theoretical and only partially implemented at this stage (again AFAIK).

I don’t know how all that jibes with what is cooking in the SSB cauldrons, but my purpose in life is cross-pollination, so I hope it might help in these early days of evolution.

hugi · May 22, 2019 23:39

How is this supposed to work? I’m curious. What does “validation” mean in this context?

zaunders · May 23, 2019 20:21

Validation means making sure that entries are entered into the shared space only so long as they follow the rules specified in the application. When somebody adds a public entry to their local chain, it is also published to the DHT. Peers receive the entry and pass it through the validation functions declared in the app to make sure it follows the rules before adding it to their part of the DHT.

Validation could consist of simple stuff like no images larger than 2 MB or the oft-quoted no messages longer than 140 characters. But it could also make sure that you are not editing material that you don’t have authorship over, or posting in a space where you only have read access.
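
For readers who think better in code, the rules above could be sketched roughly like this in Python. This is only an illustration under assumptions: the entry layout, the `authors` lookup and the function name `validate_entry` are invented here and are not the actual Holochain validation API.

```python
MAX_MESSAGE_CHARS = 140
MAX_IMAGE_BYTES = 2 * 1024 * 1024  # 2 MB

def validate_entry(entry, authors):
    """Return a list of rule violations; an empty list means the entry is valid.

    `authors` maps an existing entry id to the account that wrote it.
    """
    problems = []
    if entry["type"] == "message" and len(entry["text"]) > MAX_MESSAGE_CHARS:
        problems.append("message longer than 140 characters")
    if entry["type"] == "image" and entry["size_bytes"] > MAX_IMAGE_BYTES:
        problems.append("image larger than 2 MB")
    if entry["type"] == "edit" and authors.get(entry["target"]) != entry["author"]:
        problems.append("author does not own the edited material")
    return problems

authors = {"post-1": "alice"}

ok = validate_entry({"type": "message", "author": "bob", "text": "hello"}, authors)
too_long = validate_entry({"type": "message", "author": "bob", "text": "x" * 141}, authors)
bad_edit = validate_entry({"type": "edit", "author": "bob", "target": "post-1"}, authors)
```

Every peer would run the same function on an incoming entry and only store it when the returned list is empty.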

For mutual credit currency situations, validation in apps will ensure you stay within your credit limits and not allow you to go further positive or negative than your calculated limit allows (the limit could be based on other things like prior transactions or integrated activity from other app spaces).
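
A mutual credit check of this kind might look something like the Python sketch below. It assumes a single fixed, symmetric limit and a simple balance dictionary, both of which are simplifications made up for this example; a real app could calculate the limit per user from their history.

```python
def validate_transfer(balances, sender, receiver, amount, limit):
    """Accept a transfer only if both balances stay within [-limit, +limit]."""
    if amount <= 0:
        return False  # transfers must move a positive amount
    new_sender = balances.get(sender, 0) - amount
    new_receiver = balances.get(receiver, 0) + amount
    return new_sender >= -limit and new_receiver <= limit

# Hypothetical balances: alice is 80 in debt, bob is 40 in credit.
balances = {"alice": -80, "bob": 40}

# alice would end at -100, exactly on the limit, so this is accepted.
within_limit = validate_transfer(balances, "alice", "bob", 20, limit=100)

# alice would end at -101, past the limit, so this is rejected.
over_limit = validate_transfer(balances, "alice", "bob", 21, limit=100)
```

Because every validating peer applies the same check, nobody has to trust the sender’s own client to enforce the limit.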

Potentially it could also be things like checking links against an external list of websites in order to flag them as inflammatory, or blocking users from spamming until they have achieved some amount of credibility in the community of users for that app. It is all written on a per-app basis and DHTs are shared only within an app. An app in this context is just the data layer, since UIs can bridge a whole bunch of back-end “apps” and read/write to many DHTs for rich user experiences.

alberto · May 24, 2019 09:10

Thanks for the clear explanation, @zaunders. And thanks @hugi for explaining what really is in the SSB folder on my laptop.

I was thinking about this, and it dawned on me that the European Union’s General Data Protection Regulation was definitely not written with SSB in mind. The GDPR seems to have been written with centralization in mind. For example, if you process data, you should have a data protection officer. This is someone who stands watch on the data; she is in charge of enforcing the user’s rights, like having their data deleted and knowing what uses are being made of said data.

But with SSB, this model glitches. In fact, there is no SSB. My right to have my data deleted from SSB is unenforceable, because the data are sitting on people’s hard drives. These people do not even know they have my data on their hard drive! Most of them, like me, will also have an automated backup procedure.

The GDPR gives people a right to sue for misuse of their data. But in the case of something like SSB, there is no one to sue. This might be a feature and not a bug: people are expected to think before they post, and when they post they can never un-post. I do not know, it is so orthogonal to the regulatory frame of mind that I struggle to transpose the model to SSB and its federated ilk.

Don’t get me wrong, I am a big fan of the GDPR. It was designed to limit the power of large centralized corps, and I am already seeing beneficial effects in this sense. But I don’t see how it can provide cover against Norwegian nazis using my laptop as a communication device, or help me keep control of data I share on SSB. Does anyone have any ideas?

zelf · May 28, 2019 22:29

hahahahha this had me laughing out loud, I’d missed that he said this! He might be onto something considering Beaker Browser’s new development… (wanted to link but can’t find posts about it at the moment) I’m gonna keep reading now though. I think this might tie into more thoughts down the line…

zelf · May 28, 2019 22:45

This touches upon an aspect I worry about in regards to Scuttlebutt as well, yet it is of a different kind. In your example you mention being faced with data you would rather not partake in; I have found that blocking does indeed hinder most forms of data sharing which I would like to avoid. The one case I have witnessed a lot of discussion around is the same as with Facebook/Instagram: the “annoying uncle/aunt syndrome”, in which one has a social obligation to not block the individual yet would rather not see the unfiltered spouting on their feed. Facebook/Instagram has solved this by simply letting you opt out of seeing their content in your feed. This would be simple to implement on the application layer of SSB, for example.
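
A sketch of what such an application-layer “mute” could look like, in Python. The message layout and the name `visible_feed` are assumptions for illustration only and are not taken from any existing SSB client.

```python
def visible_feed(messages, muted):
    """Hide messages from muted authors in the UI only.

    Nothing is deleted or blocked: the full message list is left untouched,
    so replication and the social connection stay exactly as they were.
    """
    return [m for m in messages if m["author"] not in muted]

messages = [
    {"author": "uncle", "text": "unfiltered opinion"},
    {"author": "cindy", "text": "hello"},
]

shown = visible_feed(messages, muted={"uncle"})
```

The difference from a block is that `messages` still contains the uncle’s posts; only the rendered view changes.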

On the other hand, speaking of hops and the spread of data: in terms of privacy (in the sense of who has access to your data), SSB is by far more private than current internet standards and platforms. This ties into the core issue of all kinds of flat-structure tools/platforms used for private communication: they can easily be used for hiding shady information as well, something one inevitably has to take a stance on.

In regards to designing for communicating the privacy, lack thereof and quirks of the new protocols, security expert Eileen Wagner had a great workshop about this at Radical Networks last year.

zelf · May 28, 2019 23:09

This is actually interesting to discuss though! I had a conversation with an IT guy who had done a lot of research into GDPR lately to ensure his company fit the regulations. What he said he’d found was that GDPR did initially serve its purpose of ensuring that European data stayed within Europe, but it had simultaneously opened up a new market for American companies to make money by charging the companies that used their services (such as GDrive) for ensuring that their data would be stored on the American companies’ European servers.

Inherently I don’t think GDPR suits its purpose of ensuring data privacy for its “citizens”, as this is an issue rooted in the infrastructure of the default web rather than in how the already faulty system is implemented.

In general, spot on regarding GDPR’s effect on SSB though @alberto

johncoate · May 29, 2019 04:39

Could you say more about this issue rooted in the infrastructure of the default web? How would you describe it?

alberto · May 29, 2019 08:03

That seems a very partial take on the GDPR. Its main effect is that people are waking up to the fact that the cowboy era of data hoarding is over. “Data minimalism” has become a thing (for example, it is a tenet of the City of Amsterdam’s digital strategy: a far cry from the halcyon days of the “smart city”). IT folks focus on the costs of compliance, but the bite of the GDPR is that it creates digital rights; puts the liability for infringing those rights on the entities that collect data; and then steps aside and lets the courts do their job. The GDPR has inherently more bite for large corps than for small ones, because class actions are much more of a real risk for them. No one is going to go through the trouble of suing Edgeryders. Facebook, though… that’s another matter.

zelf · May 29, 2019 15:28

Yess! I completely agree, it’s a much needed statement indeed, setting a precedent for the future and targeting the big companies. In reality though it makes it difficult for smaller companies to continue their work as they are reliant on the bigger companies which in turn can profit from this reliance with the rules of GDPR as a backing.

But yes, it’s a much needed statement. Whether it is executed in a proper manner is disputable, as is whether it’s even possible to take action in a positive form when the infrastructure itself directly contradicts personal ownership of data.

This leads into @johncoate’s question:

Could you say more about this issue rooted in the infrastructure of the default web? How would you describe it?

The infrastructure of the default web is

Centralized, as seen in this image
Inherently distributes the ownership of data away from the users
Relies on middlemen that deliver the data and can see all metadata

With the structure above as the basic foundation of how the HTTPS protocol works, it is practically impossible to organize private data so that the individual has ownership over how the data is used, since the user can’t control who has access to the data or how the data is stored.

The distributed / decentralized web movements are all centered around reorganizing this foundational infrastructure, and more, as in the case of mesh networks, which go even further and look at the hardware infrastructure of the internet.

johncoate · May 29, 2019 18:12

It looks like a movement that is gaining in numbers, energy and power.

JollyOrc · May 30, 2019 09:01

I think that is why Tim Berners-Lee is doing the Solid thing now. It’s not quite there yet, but if we get this right, it’ll be a nice middle ground where everyone stays in control of their data, but we will still have sort of centralized app & service providers. (Because, let us be frank, no one wants to have to worry about the uptime or safety of their data storage.)

hugi · May 30, 2019 09:19

And another similar initiative is Wireline. I saw their demo recently, and it seemed pretty stable. Apparently they are very close to releasing.

Maybe @leobard has some updates? Last time we talked he was hanging out in a chat channel with Tim Berners-Lee.

mattias · May 30, 2019 21:23

Hosting info for others definitely comes at a cost, both for the host and the guest, considering “there’s no free lunch”. I’m quite interested in Solid, as it feels like a “middle-way” from the mainstream Internet as we know it but with the capacity to give users more power and give them control over their data - especially if self-hosting data. There is also the possibility of doing it in an association, co-op or company they own or are a member of. It works well with the https://mydata.org framework; I would love to see a combination.

alberto · June 11, 2019 09:30

I looked at the blurbs of both Solid and Wireline. The idea has been floating around for quite some time: I remember hearing about it for the first time at an event called Public Services 2.0 in 2009. So, I guess my questions would be:

In a world that normally moves quite fast, what is delaying deployment? Maybe @RobvanKranenburg has some answers here.
What is keeping entities that access your “pod” or “data wallet” or whatever from saving a copy of your data, and then cross-referencing it with whatever else? Technically, of course, they have to copy your data. Legally (at least under the scenario of restrictive data protection regulations) they are supposed to delete them, but… will they? Facebook is rumored to have a “you account” even if you yourself do not have a Facebook account, and never had one. Would this kind of scenario be prevented by Solid/Wireline? Because if it would not, we go back to good old antitrust policy: forget about the tech, just never allow companies to grow too big, break them up, nationalize them, whatever.

Edit: Cory Doctorow seems to share this point of view.

johncoate · June 13, 2019 02:05

Or maybe regulating them in certain ways makes them stronger.

inge · July 11, 2019 14:24

so, just read this in email from the alt-right “social network” Gab:

After three years of work and after being banned multiple times by both App Stores, Gab finally has dozens of mobile apps for our users to choose from. Recently we moved to an open source and decentralized version of Gab that makes your Gab account compatible with a variety of different apps.

You can search both app stores for “Mastodon” “ActivityPub” and “Fediverse.”

Anyway, thought I’d throw it out there. I wonder if they know they’re being used by the alt-right and if there’s something they can do against it.

johncoate · July 11, 2019 16:48

I would be surprised if they did not, although I have to admit to the possibility. I remember years ago when so many of us were reveling in our newfound and newly named online communities as if they were known only to those of us who wanted to use these tools for planetary enlightenment and cooperation. I soon learned that in fact online bulletin boards and, yes, communities had already been going on with real sophistication in the world of hard-core survivalists and white supremacists. And when I dug deeper, I saw that in fact they were out in front of us on using the technologies. Did the makers of the software know about it? Did Ward Christensen, inventor of the first reliable downloading utility, Xmodem, know his work was being used by the KKK? I don’t know. But I can’t see how they could have then, or could now, do something to prevent it being used for what they might see as dark purposes.

inge · July 12, 2019 06:56

Prevent, maybe not, but this perhaps takes the discussion in a different direction: should we (as a society, a community) do something about hate speech online? It is a very thin line of course, but this Gizmodo article does show how the internet can be a rabbit hole:

America is a country without hate speech laws, one built on the premise that it’s not the government’s job to decide what types of speech should be prohibited. In the internet era, that sort of governance is largely left up to the private companies responsible for the technology powering all our digital communications. As spectacular incidents of hate-based violence draw headlines and the web is flooded with extremist content, there’s been an increasing public pressure for companies to take that responsibility more seriously.

And I did some reporting about this and Dylann Roof in this newsletter for Coda Story:

And while the cyber warriors find their way into our hearts and minds, so have they been able to spread conspiracy theories and cults. Although the response to Donald Trump’s claim that windmills cause cancer has been met with derision, the spread of conspiracy theories online is more worrying. WhatsApp conspiracy theories leading to murder in India, the “Pizzagate” conspiracy, and Q-Anon supporters believing all of Trump’s enemies will be arrested and executed for being murderous child-eating pedophiles.

How it works: Dylann Roof, who murdered nine people, said that after hearing about Trayvon Martin’s death he decided to Google him, finding an abundance of links to “black on white crime.” Radicalization in today’s world often starts with a simple question online. Our story on HIV denialists in Russia shows how one simple search online can drag people into online groups and forums in which they are bombarded with the conspiracies, finding “like-minded” new friends and alienating themselves from friends and family - just as the old cults did before.

But of course, censorship is difficult as well. Mainly because it gives governments a tool to censor anything critical of them.