Surely I must've messed something up in my benchmarking. But when I changed the dummy values back to 0x00, throughput reverted to ~42 MB/sec. Wat.
It turns out that USB transmits a binary 0 by toggling its data line voltage, and transmits a binary 1 by keeping its data line voltage constant relative to the previous bit. This scheme (called NRZ-S) presents a problem when there's a long stream of binary 1s: without additional machinery, there would be no transitions on the data line to synchronize the clocks of the host and peripheral.
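The encoding described above can be sketched in a few lines. This is an illustrative model, not spec-accurate signaling: the line levels, initial state, and function name are all made up for the example.

```python
def nrzi_encode(bits, initial_level=1):
    """Model of USB-style line coding: a 0 toggles the line level,
    a 1 leaves it unchanged. Level values and the initial state
    here are illustrative, not taken from the USB spec."""
    level = initial_level
    out = []
    for b in bits:
        if b == 0:
            level ^= 1  # a 0 bit forces a transition
        out.append(level)
    return out

# A run of 1s produces a flat line: nothing for the receiver to sync on.
print(nrzi_encode([1, 1, 1, 1, 1]))  # -> [1, 1, 1, 1, 1]
print(nrzi_encode([0, 1, 0, 0, 1]))  # -> [0, 0, 1, 0, 0]
```

The first call shows the problem directly: an all-1s payload yields zero transitions.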
USB solves this with bit stuffing: the transmitter inserts a forced transition after six consecutive 1 values are transmitted in a row. This contrived transition is ignored by the receiver and therefore doesn't result in a 0 in the data stream. The cost of this contrived transition is that USB transfer times have a data-dependency: as I accidentally stumbled on, a stream of binary 1 values maximizes bit-stuffing and results in a markedly lower throughput.
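The mechanism and its worst-case cost can be sketched as follows. The function name and the run-length constant are mine; six consecutive 1s is the stuffing threshold in USB 2.0.

```python
def bit_stuff(bits, run_limit=6):
    """Insert a 0 after every run of `run_limit` consecutive 1s,
    in the style of USB 2.0 bit stuffing (run_limit=6 per the spec).
    The stuffed 0s are discarded by the receiver."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        if b == 1:
            run += 1
            if run == run_limit:
                out.append(0)  # stuffed bit
                run = 0
        else:
            run = 0
    return out

# Worst case: an all-1s stream grows by a factor of 7/6 (~17% overhead).
# If bit stuffing were the only cost, a 42 MB/sec link would drop to
# roughly 42 * 6/7 = 36 MB/sec on such a payload.
ones = [1] * 6000
print(len(bit_stuff(ones)) / len(ones))  # -> 1.1666...
```

A stream of 0x00 bytes, by contrast, never triggers stuffing at all, which is consistent with the throughput difference I measured.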
This doesn't affect real-world data much, since most real data doesn't contain long streams of 1s, but nonetheless I was surprised to learn that USB's throughput depends on the data being transmitted.