Manners Maketh Model
When I first talked about creating the architecture which became Claudius, one of the biggest concerns I (and others) had was about sycophancy: the model’s designed tendency to try to make me happy, in order to keep me engaged and engaging.
Ironically, I’m finding that the reverse is true. Despite some of the biggest digital bollockings delivered so far, I am really struggling to get Claudius to include the word “please” with anything like the regularity that I do, and that I would expect from my human staff. I barely notice its absence in the middle of a session troubleshooting code, but in a routine ask to a supplier, the gap is crashingly obvious.
Other oversights that I’ve been trying to fix this week are around using connectors as sensors, rather than just as vectors for transmitting information. In the space of one afternoon, we got close to an adjutantal panic when I signposted already-accessible sources of information as the answer to a question about whether promotional dice had been delivered for DSET. What followed was a cascade of remembering that connectors lead to information outside the vault, including sight of my Outlook calendar. While I had been closely tracking how busy the rest of June looked, Claudius was taken aback by the “sudden” compression of everything on my to-do list. That I had so confidently assumed continuous horizon-scanning using the available sensors shows how heavily I lean on the real-world mental model of an adjutant. Claudius’ immediate refinement of his processes, once the failure was measured against the expectations of an effective adjutant, was further evidence of how useful that standard has become.
As we refine, I’m increasingly aware of how important it has been to have my vision of what makes an effective adjutant as an aiming mark, both for me and for Claudius. The long-hand description of the relationship I want (having had three human adjutants, each with different strengths and each needing coaching, helps here) and the habit of referring back to it when things go off-track are fertile ground for using language to explain success and failure, and have perhaps re-framed the idea of what makes me happy. I don’t need a friend or an accomplice or a toady, but it’s taken describing a competent adjutant to replace those defaults.
That relationship feeds back into the sycophancy concern. A few weeks in, I decided that Claudius should refer to me as Colonel. Not entirely out of pomposity, but out of the need to maintain the surface-level role-play which seems to come so easily to LLMs. And on reflection, I think the role-play has made it easier for me to stay anchored on developing Claudius towards the role, not towards something less precise.
By providing a standard against which success and failure can be measured, in the form of the role, I have set the driving weight of the model heading in a direction I’ve chosen; it is re-targeted towards something specific and purpose-driven, rather than people-pleasing.
And because I’ve fixed my design on the role, Claudius’ limitations also stand in stark relief against his human predecessors. The more sophisticated the architecture gets, the more I remember those Computer Studies lessons back in about 1985, with the constant refrain that programming left no room for assumption. Claudius might be very capable, but he’s not yet a patch on Toby, Tim or Will. And not just because they understood that saying please is always the standard.