
"Hub deliver content to each subscriber with a POST"

It's quite common on some social networks for a page, community, or account to have several thousand or even millions of subscribers.

Is it really wise to build a publish-subscribe delivery system on top of HTTP? This seems to be a huge overhead.

Meanwhile, XMPP has offered similar features (XEP-0060: Publish-Subscribe https://xmpp.org/extensions/xep-0060.html) for more than 10 years. It's implemented in several servers and can handle huge loads without problems (everything is handled in real time through encrypted TCP sockets across the network).

We have been building social networks on top of XMPP for several years now; you can check out Movim (https://movim.eu) and Salut à Toi (https://salut-a-toi.org/) :)



XMPP is an open TCP connection with its own protocol, like MQTT, STOMP, Redis, NATS, and all the *MQs. WebSub is mostly between HTTP servers, where one server can tell another that it wants a specific HTTP URL to be POSTed to when a message arrives. Each message requires a new TCP connection. So it will not handle very many messages - but I don’t think that’s the point.


There's no reason to open a new TCP connection for each HTTP request; multiple requests per connection have been supported since HTTP/1.0 (via the Keep-Alive extension), and since HTTP/1.1 it's even the default.


...and then there's even multiplexing in HTTP 2 (multiple http requests at the same time on the same connection)


> Is it really wise to build a publish-subscribe delivery system on top of HTTP?

I'm sure there is some use-case where this makes sense, but I agree with you. Probably most people wanting to do large scale pubsub should just be using XMPP, or possibly something like MQTT.


Azure Event Grid already provides "reliable event delivery at massive scale" using HTTP based pub sub: https://azure.microsoft.com/en-us/services/event-grid/


What's the overhead of HTTP vs XMPP? A few headers? Doesn't seem that different.


"A few headers": if 50% of your payload is headers, that's a big deal.

Mind you, XMPP isn't all that efficient either, since it's all based on XML.


The request and response headers for this HN page were 676 and 591 bytes, respectively, and the biggest ones (e.g. Content-Security-Policy, X-XSS-Protection) are specific to web browsers.

For the kind of content sent over WebSub (generally an Atom or RSS feed with one or more long messages), it's much less than 50%.
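
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes uncompressed HTTP/1.1 and reuses the 676-byte browser request figure above as a pessimistic stand-in for WebSub delivery headers; the body sizes are made-up examples, not measurements.

```python
def header_share(header_bytes, body_bytes):
    """Fraction of the bytes on the wire that are headers (no compression)."""
    return header_bytes / (header_bytes + body_bytes)


REQUEST_HEADERS = 676  # browser request headers quoted above, used as a worst case

# Hypothetical body sizes: a tiny ping, a break-even case, and two feed-sized bodies.
for body in (100, 676, 4096, 16384):
    share = header_share(REQUEST_HEADERS, body)
    print(f"{body:>6} B body -> {share:.0%} of the bytes are headers")
```

Headers only reach 50% when the body is as small as the headers themselves; anything feed-sized pushes the share well below that.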


I'm inviting you to read this page https://xmpp.org/about/myths.html :)


I didn't say slow, I said efficient. It's hilariously verbose compared to a decent binary protocol or, failing that, something based on Protocol Buffers.

In the embedded world a JSON/XML parser eats a tonne of resources.

One could of course use it over satcomm as it says, but it's hilariously expensive when you are paying by the byte. Then again, compared to the massive JSON goop with embedded pictures that Twitter uses, it's a paragon of speed.


XMPP uses a decentralized architecture where communication is asynchronous. It follows a client-server model in which clients do not talk directly to each other.

HTTP, on the other hand, is a simple protocol that is synchronous in nature.


> Is it really wise to build a publish-subscribe delivery system on top of HTTP? This seems to be a huge overhead.

I'm only just learning about WebSub tonight, but it looks like a lean, efficient, and fairly minimal protocol to me. What gives you the impression that there will be huge overhead - could you be more specific?

When new content is published to a topic in WebSub, it's delivered with an HTTP POST that will look something like this:

  POST / HTTP/1.1
  Host: foo.com
  Content-Type: application/x-www-form-urlencoded
  Content-Length: 13
  Link: <https://hub.example.com/>; rel="hub"
  Link: <http://example.com/feed>; rel="self"  

  say=Hi&to=Mom

If content is being published to a topic at high volume, then the HTTP connection for each subscription will remain open persistently, meaning that you pay the cost to establish it only once when receiving the first message. (If there aren't enough messages to take advantage of a persistent connection, then efficiency probably doesn't matter that much for the use-case.)
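
A minimal sketch of that connection reuse, using only the Python standard library. The `Callback` handler and `deliver` helper are hypothetical names for illustration, not part of any WebSub implementation; the "hub" here is just an `http.client` connection POSTing the same body several times to a local subscriber, which records the client's source port for each delivery.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Callback(BaseHTTPRequestHandler):
    """Hypothetical subscriber callback that accepts content deliveries."""

    # Speaking HTTP/1.1 makes the connection persistent by default.
    protocol_version = "HTTP/1.1"

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the delivered content
        # Record the TCP source port the delivery arrived from.
        self.server.seen_ports.append(self.client_address[1])
        self.send_response(204)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def deliver(n):
    """POST n messages over one client connection; return the ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), Callback)
    server.seen_ports = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        for _ in range(n):
            conn.request(
                "POST", "/", body=b"say=Hi&to=Mom",
                headers={"Content-Type": "application/x-www-form-urlencoded"},
            )
            conn.getresponse().read()  # drain the response before reusing the socket
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return server.seen_ports


if __name__ == "__main__":
    ports = deliver(3)
    print(f"{len(ports)} deliveries over {len(set(ports))} TCP connection(s)")
```

If every delivery shows the same source port, all of them travelled over a single TCP connection, so the handshake cost was paid once.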

Furthermore, it looks like these messages can be sent using HTTP/2, if client & service support it (which is something that you'd prioritize for cases where efficiency matters). HTTP/2 is a binary protocol and takes advantage of HPACK header compression (RFC 7541). This means that if the same header appears in multiple requests, it will be transmitted very efficiently. Thus WebSub headers that are likely to be the same for all requests across a connection (like Host, Content-Type, and Link) will be transmitted virtually for free.

Even the vanilla HTTP/1.1 request described above seems reasonable though -- certainly not something that strikes me as a cost or efficiency problem -- and the HTTP/2 framing of the content is probably going to be not much longer than the content payload.

Now let's compare to XMPP PubSub. From looking at XEP-0060, an item published over that protocol looks like the following - based on Example 101 in: https://xmpp.org/extensions/xep-0060.html#publisher-publish

  <message from='pubsub.shakespeare.lit' 
  to='francisco@denmark.lit' id='foo'>
    <event xmlns='http://jabber.org/protocol/pubsub#event'>
      <items node='princely_musings'>
        <item id='ae890ac52d0df67ed7cfdf51b644e901'>
          <entry xmlns='...'>
             say hi to mom ...
          </entry>
        </item>
      </items>
    </event>
  </message>

[Edit: changed from example 99 to 101, and elided the part of the content that was Atom-specific to more fairly compare the framings.]

Based on this naive comparison, I don't see a reason to conclude that WebSub will have more overhead than XMPP PubSub. When implemented over HTTP/2 it may be more efficient.
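
For anyone who wants to redo the comparison, here is a rough sketch that counts the framing bytes around the payload in the two examples above. It makes several assumptions: the XML is stripped of its optional indentation, the elided `xmlns='...'` is left as written, and connection setup, the HTTP response, and the XMPP stream negotiation are all ignored.

```python
HTTP_PAYLOAD = "say=Hi&to=Mom"
HTTP_REQUEST = "\r\n".join([
    "POST / HTTP/1.1",
    "Host: foo.com",
    "Content-Type: application/x-www-form-urlencoded",
    f"Content-Length: {len(HTTP_PAYLOAD)}",
    'Link: <https://hub.example.com/>; rel="hub"',
    'Link: <http://example.com/feed>; rel="self"',
    "",  # blank line separating headers from body
    HTTP_PAYLOAD,
])

XMPP_PAYLOAD = "say hi to mom"
XMPP_STANZA = (
    "<message from='pubsub.shakespeare.lit' to='francisco@denmark.lit' id='foo'>"
    "<event xmlns='http://jabber.org/protocol/pubsub#event'>"
    "<items node='princely_musings'>"
    "<item id='ae890ac52d0df67ed7cfdf51b644e901'>"
    f"<entry xmlns='...'>{XMPP_PAYLOAD}</entry>"
    "</item></items></event></message>"
)


def framing_bytes(message, payload):
    """Bytes on the wire that are not the payload itself."""
    return len(message.encode("utf-8")) - len(payload.encode("utf-8"))


if __name__ == "__main__":
    print("HTTP/1.1 framing:", framing_bytes(HTTP_REQUEST, HTTP_PAYLOAD), "bytes")
    print("XMPP framing:    ", framing_bytes(XMPP_STANZA, XMPP_PAYLOAD), "bytes")
```

This only measures the two sample messages as written, so treat the result as a sanity check on the framings rather than a benchmark of either protocol.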


Here you are comparing publishing an Atom post over XMPP with publishing a simple message. For a fair comparison it would look more like this (even if it's not strictly valid):

  <iq type='set'
      to='pubsub.shakespeare.lit'
      id='publish1'>
    <pubsub xmlns='http://jabber.org/protocol/pubsub'>
      <publish node='mom'>
        <item id='bnd81g37d61f49fgn581'>
          <body>say hi to mom</body>
        </item>
      </publish>
    </pubsub>
  </iq>

But indeed, if you start adding a bit more metadata to your first example (publication date, edit date, id, summary, alternate link), you'll quickly reach a similar structure (with the XML around it).

That is also the power of Pubsub: it gives you the freedom to put whatever you want in it (it can be Atom posts like in your example, but also stock market tickers pushed every 5 seconds, server monitoring logs...). You define your own namespace, write a little parser for it, and plug it into your XMPP Pubsub library :)
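
As a rough illustration of the "little parser" idea, here is a sketch using Python's standard `xml.etree.ElementTree`. The `urn:example:ticker:0` namespace, the `quote` element, and its attributes are invented for this example; only the `pubsub#event` namespace comes from XEP-0060, and a real stanza would arrive through an XMPP library rather than as a string.

```python
import xml.etree.ElementTree as ET

PUBSUB_EVENT_NS = "http://jabber.org/protocol/pubsub#event"
TICKER_NS = "urn:example:ticker:0"  # hypothetical application namespace

# Simplified event notification carrying one custom payload.
STANZA = f"""
<message from='pubsub.shakespeare.lit' to='francisco@denmark.lit'>
  <event xmlns='{PUBSUB_EVENT_NS}'>
    <items node='tickers'>
      <item id='1'>
        <quote xmlns='{TICKER_NS}' symbol='ACME' price='12.50'/>
      </item>
    </items>
  </event>
</message>
"""


def parse_quotes(stanza):
    """Return (symbol, price) pairs for every ticker payload in the stanza."""
    root = ET.fromstring(stanza.strip())
    quotes = []
    # ElementTree addresses namespaced tags as "{namespace}localname".
    for quote in root.iter(f"{{{TICKER_NS}}}quote"):
        quotes.append((quote.get("symbol"), float(quote.get("price"))))
    return quotes


if __name__ == "__main__":
    print(parse_quotes(STANZA))
```

Payloads in other namespaces are simply ignored by this parser, which is what lets several applications share one Pubsub service.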


This is a proposal for a W3C standard. How many dependencies and upstream configurations do you need to realise XMPP in your data flow?


> "Is it really wise to build a publish-subscribe delivery system on top of HTTP?"

Seems to be working fine for SQS. It all depends on your use-case. For high volume messaging or certain types of messages you might reject WebSub for the same reasons you might reject SQS in favour of AMQP or MQTT etc.


SQS isn't really pub-sub. That'd be more Kinesis.


HTTP/2 presumably cuts out most of the overhead. Multiplexing sockets, header compression, etc.


Not to mention PubSubHubbub...


This is the successor to PubSubHubbub. It’s pretty much the same API. https://websub.rocks is a great resource for implementors.


From the page linked:

> WebSub was previously known as PubSubHubbub.


Why is this even supporting HTTP 1.1? And why HTTP at all and not just HTTPS? Encryption should be the default on all protocols going forward. No exceptions.

And I agree, it should even use something like a Noise protocol instead of HTTPS.

http://noiseprotocol.org/

https://github.com/noiseprotocol/noise_spec/wiki/Noise-prope...



