(context post by jgarzik)

4 messages · BitcoinTalk · Jeff Garzik, lachesis, andrew, Satoshi Nakamoto · July 30, 2010 to August 2, 2010

FYI, it is pointless to make a packet smaller than 60 bytes, the minimum size of an Ethernet frame (not counting the 4-byte frame check sequence). Smaller frames are padded up to 60 bytes.
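
Since the 60-byte figure applies to the whole Ethernet frame, it helps to add up the headers that sit in front of any Bitcoin message. A minimal sketch, assuming a 14-byte Ethernet II header and IPv4 and TCP headers without options (20 bytes each; real stacks often add TCP options such as timestamps). The constants and the `ethernet_frame_size` helper are illustrative, not from the thread.

```python
# Frame size for a TCP segment carrying `payload_len` bytes of application data.
# Assumptions: Ethernet II header (14 bytes), IPv4 and TCP headers without
# options (20 bytes each); the 4-byte FCS and the preamble are not counted.
ETH_HEADER = 14
IPV4_HEADER = 20
TCP_HEADER = 20
ETH_MIN_FRAME = 60  # minimum frame size, excluding the FCS


def ethernet_frame_size(payload_len: int) -> int:
    """Return the frame size in bytes, including padding up to the minimum."""
    if payload_len < 0:
        raise ValueError("payload length must be non-negative")
    unpadded = ETH_HEADER + IPV4_HEADER + TCP_HEADER + payload_len
    return max(unpadded, ETH_MIN_FRAME)


if __name__ == "__main__":
    for n in (0, 6, 55, 107):
        print(n, ethernet_frame_size(n))
```

Under these assumptions the headers alone take 54 bytes, so only segments carrying fewer than 6 bytes of data are actually padded; a 55-byte payload already produces a 109-byte frame.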

lachesis July 31, 2010

[Deleted] Quote from: martin on July 30, 2010, 11:37:59 AM

The encoded protocol buffer is just 55 bytes, whereas the bitcoin version is 85 0x00 sets (each one representing 2 bytes, I assume). This means that my badly designed protocol buffer is over half the size of the hand-built layout!

The “0x00” groups each represent one byte. The length of the standard version packet is 87 bytes plus 20 for the header. The header could be massively optimized as well:

Code:
message start (“magic bytes”) - 0xF9 0xBE 0xB4 0xD9
command - name of command, 0-padded to 12 bytes: “version\0\0\0\0\0”
size - 4-byte int
checksum - 4 bytes (absent for messages without data and for version messages)

Obviously using protocol buffers here, while absolutely a breaking change, would save a fair bit of space, especially because the “I’ve created a transaction” packet has the name “tx”, meaning that there are at least 10 bytes of padding overhead in every one of those packets.
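
The header layout listed in this post can be sketched with Python's `struct` module to check the 20-byte figure and the 10 bytes of padding after “tx”. Assumptions not stated in the thread: `size` is a little-endian uint32 and the checksum is the first 4 bytes of SHA-256(SHA-256(payload)); `build_header` and its `with_checksum` flag are hypothetical, the flag standing in for the rule that version messages (and messages without data) omit the checksum.

```python
import hashlib
import struct

MAGIC = bytes([0xF9, 0xBE, 0xB4, 0xD9])  # message start bytes from the post


def build_header(command: str, payload: bytes, with_checksum: bool) -> bytes:
    """Build magic + 12-byte zero-padded command + 4-byte size (+ checksum)."""
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command name longer than 12 bytes")
    header = MAGIC + name.ljust(12, b"\x00") + struct.pack("<I", len(payload))
    if with_checksum:
        # Assumed checksum: first 4 bytes of a double SHA-256 of the payload.
        header += hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return header


if __name__ == "__main__":
    version = build_header("version", b"\x00" * 87, with_checksum=False)
    print(len(version), version[4:16])  # 20-byte header, zero-padded name
    tx = build_header("tx", b"\x01", with_checksum=True)
    print(len(tx), tx[4:16].count(0))   # 24-byte header, 10 padding bytes
```

The fixed 12-byte command field is what makes short names like “tx” carry 10 bytes of zero padding.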

andrew August 2, 2010

Why do you consider it a breaking change? There’s no reason you couldn’t first try with the new protocol and then retry using the old bitcoin serialization technique. Also, I think this is a change that should be made sooner rather than later while the Bitcoin community is still small. It’s already been a major blocker in making new clients, and delaying it is going to hamper bitcoin’s adoption.

Satoshi Nakamoto

The reason I didn’t use protocol buffers or boost serialization is because they looked too complex to make absolutely airtight and secure.  Their code is too large to read and be sure that there’s no way to form an input that would do something unexpected.

I hate reinventing the wheel and only resorted to writing my own serialization routines reluctantly.  The serialization format we have is as dead simple and flat as possible.  There is no extra freedom in the way the input stream is formed.  At each point, the next field in the data structure is expected.  The only choices given are those that the receiver is expecting.  There is versioning so upgrades are possible.
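
The “next field is expected” discipline can be sketched as a reader that consumes fields in a fixed order and rejects any short read or leftover byte. This is an illustration, not the client’s actual code: `StrictReader` and `parse_header` are hypothetical names, the fields parsed are the 20-byte header described earlier in the thread, and the size field is assumed to be little-endian.

```python
import struct


class StrictReader:
    """Reads fields strictly in order; short reads and trailing bytes are errors."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def read(self, n: int) -> bytes:
        if n < 0 or self._pos + n > len(self._data):
            raise ValueError(f"short read: wanted {n} bytes at offset {self._pos}")
        chunk = self._data[self._pos:self._pos + n]
        self._pos += n
        return chunk

    def read_u32le(self) -> int:
        return struct.unpack("<I", self.read(4))[0]

    def finish(self) -> None:
        # Nothing may follow the last expected field.
        if self._pos != len(self._data):
            raise ValueError(f"{len(self._data) - self._pos} unexpected trailing bytes")


def parse_header(data: bytes):
    """Parse magic, 12-byte command and 4-byte size; return (command, size)."""
    r = StrictReader(data)
    if r.read(4) != b"\xf9\xbe\xb4\xd9":
        raise ValueError("bad magic")
    name = r.read(12).rstrip(b"\x00")
    if b"\x00" in name:
        raise ValueError("command name has an embedded NUL")
    size = r.read_u32le()
    r.finish()
    return name.decode("ascii"), size
```

Because the receiver only ever asks for the next expected field, there is no optional or reorderable structure for a malformed input to exploit.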

CAddress is about the only object with significant reserved space in it (about 7 bytes for flags and 12 bytes for possible future IPv6 expansion).
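
To make that reserved space concrete, here is a minimal sketch of a 26-byte network address as carried in the version message (no timestamp field). Assumptions: the services field is a little-endian uint64 with only bit 0 (`NODE_NETWORK`) set, the 12 reserved bytes hold the IPv4-mapped IPv6 prefix that later clients use, and the port is big-endian; `pack_address` is a hypothetical helper.

```python
import socket
import struct

# 10 zero bytes followed by 0xFF 0xFF: the IPv4-mapped IPv6 prefix (assumed fill).
IPV4_MAPPED_PREFIX = b"\x00" * 10 + b"\xff\xff"
NODE_NETWORK = 1  # the only service bit used in this sketch


def pack_address(ipv4: str, port: int, services: int = NODE_NETWORK) -> bytes:
    """8-byte services + 12 reserved bytes + 4-byte IPv4 + 2-byte port."""
    return (struct.pack("<Q", services)
            + IPV4_MAPPED_PREFIX
            + socket.inet_aton(ipv4)
            + struct.pack(">H", port))


if __name__ == "__main__":
    addr = pack_address("10.0.0.1", 8333)
    print(len(addr))
    # With one service bit set, the upper 7 bytes of the flags field stay zero.
    print(addr[1:8] == b"\x00" * 7)
```

With only one flag bit in use, 7 of the 8 services bytes are spare, which lines up with the “about 7 bytes for flags” estimate, and the 12-byte prefix is the room left for IPv6.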

The larger things we have like blocks and transactions can’t be optimized much more for size.  The bulk of their data is hashes and keys and signatures, which are uncompressible.  The serialization overhead is very small, usually 1 byte for size fields.
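
The “usually 1 byte for size fields” remark refers to Bitcoin’s variable-length size prefix (often called CompactSize): values below 0xFD take one byte, larger values get a marker byte followed by 2, 4 or 8 little-endian bytes. A minimal sketch; the function names are hypothetical, and the rejection of non-canonical encodings is an extra strictness check added here in the spirit of “no extra freedom”, not necessarily what the 2010 client did.

```python
import struct


def encode_compact_size(n: int) -> bytes:
    """Encode n using the shortest CompactSize form."""
    if n < 0 or n > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("size out of range")
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def decode_compact_size(data: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed); reject truncated or non-shortest forms."""
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first < 0xFD:
        return first, 1
    width, fmt, minimum = {
        0xFD: (2, "<H", 0xFD),
        0xFE: (4, "<I", 0x10000),
        0xFF: (8, "<Q", 0x100000000),
    }[first]
    if len(data) < 1 + width:
        raise ValueError("truncated size field")
    value = struct.unpack(fmt, data[1:1 + width])[0]
    if value < minimum:
        raise ValueError("non-canonical size encoding")
    return value, 1 + width
```

For typical transactions, input and output counts and script lengths stay below 253, so each prefix costs a single byte.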

On Gavin’s idea about an existing P2P broadcast infrastructure, I doubt one exists.  There are few P2P systems that only need broadcast.  There are some libraries like Chord that try to provide a distributed hash table infrastructure, but that’s a huge difficult problem that we don’t need or want.  Those libraries are also much harder to install than ourselves.
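
The contrast drawn here is that Bitcoin only needs broadcast, not a distributed hash table. Plain flooding, where each node relays a message it has not seen before to all of its peers, is enough for that. A minimal sketch over a hypothetical peer graph; the `flood` function is illustrative, and a real node would key its seen-set on a message hash rather than on the message object.

```python
from collections import deque


def flood(peers: dict[str, list[str]], origin: str) -> list[str]:
    """Relay a message from `origin`; each node forwards it once to its peers.

    Returns the nodes in the order they first receive the message.
    """
    seen = {origin}
    order = [origin]
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in peers.get(node, []):
            if peer not in seen:  # drop duplicates instead of relaying again
                seen.add(peer)
                order.append(peer)
                queue.append(peer)
    return order


if __name__ == "__main__":
    topology = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
    print(flood(topology, "a"))
```

No routing table or key placement is involved: every connected node receives the message once, which is all a broadcast-only network requires.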