I’m trying to understand the way Bitcoin stores block data - among other things I want to run some statistics on the block chain / transaction history and check just how anonymous Bitcoin really is. So I went to the source to see how Bitcoin reads/writes block data to file.
(ETA: this is in 0.3.2)
In main.h we have:
Code: (CBlock::WriteToDisk)
bool WriteToDisk(bool fWriteTransactions, unsigned int& nFileRet, unsigned int& nBlockPosRet)
{
    // Open history file to append
    CAutoFile fileout = AppendBlockFile(nFileRet);
    if (!fileout)
        return error("CBlock::WriteToDisk() : AppendBlockFile failed");
    if (!fWriteTransactions)
        fileout.nType |= SER_BLOCKHEADERONLY;

    // Write index header
    unsigned int nSize = fileout.GetSerializeSize(*this);
    fileout << FLATDATA(pchMessageStart) << nSize;

    // Write block
    nBlockPosRet = ftell(fileout);
    if (nBlockPosRet == -1)
        return error("CBlock::WriteToDisk() : ftell failed");
    fileout << *this;

    // Flush stdio buffers and commit to disk before returning
    fflush(fileout);
#ifdef __WXMSW__
    _commit(_fileno(fileout));
#else
    fsync(fileno(fileout));
#endif

    return true;
}
and
Code: (CBlock::ReadFromDisk)
bool ReadFromDisk(unsigned int nFile, unsigned int nBlockPos, bool fReadTransactions=true)
{
    SetNull();

    // Open history file to read
    CAutoFile filein = OpenBlockFile(nFile, nBlockPos, "rb");
    if (!filein)
        return error("CBlock::ReadFromDisk() : OpenBlockFile failed");
    if (!fReadTransactions)
        filein.nType |= SER_BLOCKHEADERONLY;

    // Read block
    filein >> *this;

    // Check the header
    if (CBigNum().SetCompact(nBits) > bnProofOfWorkLimit)
        return error("CBlock::ReadFromDisk() : nBits errors in block header");
    if (GetHash() > CBigNum().SetCompact(nBits).getuint256())
        return error("CBlock::ReadFromDisk() : GetHash() errors in block header");

    return true;
}
FLATDATA is defined in serialize.h like so:
Code: (FLATDATA)
//
// Wrapper for serializing arrays and POD
// There's a clever template way to make arrays serialize normally, but MSVC6 doesn't support it
//
#define FLATDATA(obj)   REF(CFlatData((char*)&(obj), (char*)&(obj) + sizeof(obj)))
class CFlatData
{
protected:
    char* pbegin;
    char* pend;
public:
    CFlatData(void* pbeginIn, void* pendIn) : pbegin((char*)pbeginIn), pend((char*)pendIn) { }
    char* begin() { return pbegin; }
    const char* begin() const { return pbegin; }
    char* end() { return pend; }
    const char* end() const { return pend; }

    unsigned int GetSerializeSize(int, int=0) const
    {
        return pend - pbegin;
    }

    template<typename Stream>
    void Serialize(Stream& s, int, int=0) const
    {
        s.write(pbegin, pend - pbegin);
    }

    template<typename Stream>
    void Unserialize(Stream& s, int, int=0)
    {
        s.read(pbegin, pend - pbegin);
    }
};
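To make the wrapper above concrete, here is a minimal standalone sketch of the same idea: remember a byte range covering exactly sizeof(obj) bytes starting at &obj, and copy it to a stream unchanged. FlatBytes, MakeFlat and FlatDump are names I made up for illustration; they are not part of the Bitcoin source, and REF and the nType/nVersion arguments are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for CFlatData: remembers the byte range [pbegin, pend).
class FlatBytes
{
    const char* pbegin;
    const char* pend;
public:
    FlatBytes(const void* b, const void* e)
        : pbegin(static_cast<const char*>(b)), pend(static_cast<const char*>(e))
    {
        assert(pbegin <= pend);
    }

    std::size_t size() const { return static_cast<std::size_t>(pend - pbegin); }

    // Copy the bytes to the stream unchanged, like CFlatData::Serialize's s.write().
    void WriteTo(std::ostream& s) const { s.write(pbegin, pend - pbegin); }
};

// Hypothetical helper: wrap an object the way the FLATDATA macro does,
// covering exactly sizeof(obj) bytes starting at &obj.
template <typename T>
FlatBytes MakeFlat(const T& obj)
{
    const char* p = reinterpret_cast<const char*>(&obj);
    return FlatBytes(p, p + sizeof(obj));
}

// Write an object's raw bytes into a std::string for inspection.
template <typename T>
std::string FlatDump(const T& obj)
{
    std::ostringstream out;
    MakeFlat(obj).WriteTo(out);
    return out.str();
}
```

Calling FlatDump on a 4-byte char array gives back those same 4 bytes. Calling it on a struct gives whatever the compiler put in memory, padding included.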
Now - and I apologize if I’m reading this wrong, this is a little more advanced C/C++ code than I’m used to - as I understand it, the FLATDATA call interprets the raw bytes of a CBlock object as an array (stream?) of characters. CBlock::WriteToDisk writes the constant 4-byte message header (0xf9, 0xbe, 0xb4, 0xd9), the size of the CBlock object in bytes, and then the FLATDATA of the CBlock it’s writing to disk, which is just the raw bytes of the CBlock object. So after the header, the data written to file is byte-for-byte the same as the CBlock object represented in memory. Also, if I’m reading it correctly, CBlock::ReadFromDisk copies those bytes directly into the space allocated for a CBlock object in memory to re-create the block. Is this correct?
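For reference, going only by the WriteToDisk listing above, each record in the file seems to be framed as: the 4 marker bytes, then nSize, then whatever `fileout << *this` produces. In the line that writes the index header, FLATDATA wraps pchMessageStart rather than the block itself. Below is a rough sketch of a reader for that framing. ReadBlockRecord, BlockRecord and kMaxRecordSize are hypothetical names of mine, and I am assuming nSize is stored as a 4-byte little-endian unsigned int, as it would be on x86. The payload is returned as opaque bytes; decoding it is a separate problem.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Start-of-record marker bytes, as listed in the post (pchMessageStart).
static const unsigned char kMessageStart[4] = { 0xf9, 0xbe, 0xb4, 0xd9 };

// Hypothetical sanity limit so a corrupt size field cannot force a huge allocation.
static const std::uint32_t kMaxRecordSize = 32u * 1024u * 1024u;

// One framed record: the stored size field plus the serialized block bytes.
struct BlockRecord
{
    std::uint32_t size;
    std::vector<char> payload;
};

// Read one record laid out as [4-byte marker][4-byte size][size bytes].
// Assumes the size was written as a 4-byte little-endian unsigned int.
// Returns false on end of stream, marker mismatch, oversized record or short read.
bool ReadBlockRecord(std::istream& in, BlockRecord& rec)
{
    unsigned char head[8];
    if (!in.read(reinterpret_cast<char*>(head), sizeof(head)))
        return false;
    for (int i = 0; i < 4; ++i)
        if (head[i] != kMessageStart[i])
            return false;

    // Assemble the size byte by byte so the host's byte order does not matter.
    rec.size = static_cast<std::uint32_t>(head[4])
             | static_cast<std::uint32_t>(head[5]) << 8
             | static_cast<std::uint32_t>(head[6]) << 16
             | static_cast<std::uint32_t>(head[7]) << 24;
    if (rec.size > kMaxRecordSize)
        return false;

    rec.payload.resize(rec.size);
    assert(rec.payload.size() == rec.size);
    if (rec.size != 0 && !in.read(rec.payload.data(), rec.size))
        return false;
    return true;
}
```

Calling ReadBlockRecord in a loop over an std::ifstream opened in binary mode would walk the file record by record, which is about all that is needed to count blocks or measure their sizes.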
Related question - I am under the impression that the exact way an instance of a C++ class is represented internally is not guaranteed by the standard; compiling a program with different compilers or different optimization flags can change the order in which member variables are stored in memory, and some debug-mode compilers even add a few bytes between member variables to make memory inspection easier. I’m not positive about this; it’s just something I’ve picked up and never seriously questioned.
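A small way to test this on a given compiler is to compare sizeof and offsetof against the sum of the member sizes. As far as I know, members declared under the same access specifier keep their declaration order, while the amount of padding between them is up to the implementation. The struct below is a made-up example, not Bitcoin's CBlock, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical header-like struct, NOT Bitcoin's CBlock.
struct Padded
{
    std::uint8_t  flag;   // 1 byte
    std::uint32_t value;  // 4 bytes, usually aligned to a 4-byte boundary
};

// Sum of the member sizes: what a field-by-field serializer would write.
constexpr std::size_t kFieldBytes = sizeof(std::uint8_t) + sizeof(std::uint32_t);

// Bytes the compiler added as padding on this platform (may be 0).
constexpr std::size_t PaddingBytes()
{
    return sizeof(Padded) - kFieldBytes;
}

// Offset of `value` inside the struct; it is 1 only if no padding follows `flag`.
constexpr std::size_t ValueOffset()
{
    return offsetof(Padded, value);
}

// Sanity check that should hold on any conforming compiler.
inline void CheckLayout()
{
    assert(sizeof(Padded) >= kFieldBytes);
    assert(ValueOffset() + sizeof(std::uint32_t) <= sizeof(Padded));
}
```

If PaddingBytes() is nonzero, dumping the raw bytes of the object writes more than the 5 bytes of actual data, and the exact count can differ between compilers and settings. That is the portability concern in a nutshell.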