2 Comments

Personally, I'd be quite suspicious that uploading documents to ChatGPT will really provide them with the sheer quantity of data they need. I'd also be suspicious because Google produces Chromebooks and Microsoft (OpenAI's partner) produces Windows, the most prevalent consumer, education, and commercial OSes, and either could "accidentally" help itself to training data, possibly under the guise of "scanning for malware." They're closed source, too, so no one would have the ability to audit.

I fully expect that within the decade there will be a huge breach at one or both of these companies when a vault of "training data" gets exposed, along with PDFs of everyone's tax returns. I also fully expect nothing to come of it, unless it were proven that the secure files of an entity of sufficient weight (like the US government itself) were compromised in such an event.


The 'narrative whack-a-mole' really resonates - I work in the digital wellbeing space (how technology use in all its forms shapes our psychology) and I feel I've been playing this game for years and years now.

In talking about data quality, you've gotten me thinking about how the quality of human output (and communication) has changed over the ages, and for the worse. If current and future data sets are going to be based on what we produce through our most-used communication channels, social media and texting, we're in a lot of trouble.
