GitHub Copilot Honor Pledge

I’ve been using GitHub Copilot recently with Microsoft Visual Studio while programming in C++. GitHub Copilot is an AI-based software development assistant.

My initial observation is that it can be helpful. However, I found that it can also violate the privacy of users whose code has been ingested by the language model.

Among its big successes, it used my comments as prompts to generate Win32 code. Once, the generated code revealed the exact function I needed: I hadn’t known what to search for to find the API that identifies the edge to which the taskbar is anchored. The code wasn’t exactly what I needed, but the results showed me where to look in Microsoft’s Win32 documentation.
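
For the record, here is a minimal sketch of that kind of call, assuming the API in question is SHAppBarMessage with the ABM_GETTASKBARPOS message (the usual Win32 route to the taskbar’s position). This is my own reconstruction, not Copilot’s output.

#include <windows.h>
#include <shellapi.h>  // SHAppBarMessage, APPBARDATA; link with shell32.lib
#include <cstdio>

int main()
{
    APPBARDATA abd = {};
    abd.cbSize = sizeof(abd);
    // ABM_GETTASKBARPOS is documented to fill abd.rc with the taskbar's
    // bounding rectangle; I infer the anchored edge from that rectangle
    // rather than relying on undocumented fields.
    if (!SHAppBarMessage(ABM_GETTASKBARPOS, &abd))
        return 1;
    const RECT rc = abd.rc;
    const int w = GetSystemMetrics(SM_CXSCREEN);
    const int h = GetSystemMetrics(SM_CYSCREEN);
    if (rc.left <= 0 && rc.right < w)       std::puts("Anchored: left edge");
    else if (rc.top <= 0 && rc.bottom < h)  std::puts("Anchored: top edge");
    else if (rc.left > 0 && rc.right >= w)  std::puts("Anchored: right edge");
    else                                    std::puts("Anchored: bottom edge");
    return 0;
}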

When I was exploring a neural network project, it filled in a long list of varied neural network classes. As I customized my code, later suggestions replicated the altered class outline. When it generates correct code after you have set up the types and variable names, it can be fun.
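
To give a flavor of that workflow, here is a made-up outline in the spirit of what it extended for me; the names and shapes are mine, not the generated code. Once the first subclass existed, suggestions for its siblings followed the same pattern.

#include <vector>

struct Layer {
    virtual ~Layer() = default;
    virtual std::vector<float> Forward(const std::vector<float>& input) = 0;
};

// After the first concrete layer was declared, suggestions for the
// remaining layers repeated its shape and naming conventions.
struct DenseLayer : Layer {
    std::vector<float> weights, bias;
    std::vector<float> Forward(const std::vector<float>& input) override;
};

struct ReluLayer : Layer {
    std::vector<float> Forward(const std::vector<float>& input) override;
};

struct SoftmaxLayer : Layer {
    std::vector<float> Forward(const std::vector<float>& input) override;
};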

My first critique is that it is sluggish: there is a noticeable lag between being presented with a suggestion and Visual Studio registering my acceptance of it. Occasionally it insists on a change when I want to choose other code. Perhaps my PC needs an upgrade?

It is able to write descriptive comments about code. However, I have found that the descriptions can be incorrect. It writes clear English, but what it says may not reflect an accurate understanding of the code: it can comment on the purpose of code from insufficient evidence, misinterpret a declaration, and synthesize incorrect text. This demands a lot of vigilance. For example, I noticed, days later, that some of the text was superficially correct, but when I read it carefully, I saw that it was wrong. When Copilot generates text, it fits the context even when it isn’t correct, which makes errors easy to overlook.
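
A constructed example of the failure mode (the function and comments below are mine, illustrating the pattern, not a verbatim suggestion):

#include <string>
#include <vector>

struct Tab { int id; std::string title; };

// A plausible generated comment: "Returns the tab with the given id."
// What the code actually does: returns the INDEX of the matching tab,
// or -1 when there is no match. Fluent, contextual, and wrong.
int FindTab(const std::vector<Tab>& tabs, int id)
{
    for (size_t i = 0; i < tabs.size(); ++i)
        if (tabs[i].id == id)
            return static_cast<int>(i);
    return -1;
}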

A concern with Copilot is that it uses open-source content without honoring that code’s specific licenses. GitHub provides a website, the GitHub Copilot Trust Center, with documentation of some of its policies. There’s a lot there to analyze. GitHub Copilot is like ChatGPT and some of the image-generating applications: they push on copyright law in ways that haven’t yet been clarified by legislation or litigation.

Once, as I was filling in the top of a file, Copilot dreamed up some unfamiliar code. When I went looking for it on GitHub, I couldn’t find the specific code, but I determined that it derived from RED4ext, a project unrelated to my application.

It was interesting that Bing and DuckDuckGo both gave lengthy results when I searched with the qualifier site:github.com. In contrast, Google gave minimal results for such a search; apparently it doesn’t index project text.

The “GitHub Copilot and Copyright” section of the Trust Center claims that the option “GitHub Copilot can allow or block suggestions matching public code” can prevent Copilot from reproducing segments of about 150 characters or more from GitHub. However, Copilot usually generates code a line at a time, and a single line is almost always shorter than 150 characters. That makes the claim a very weak promise.
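
For scale, consider a representative single-line suggestion (my own example, not generated output):

#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    int key = 2;
    // The search statement below is about 45 characters long -- far under
    // the ~150-character threshold, so a line-at-a-time suggestion would
    // rarely trip the matching filter.
    auto it = std::find(v.begin(), v.end(), key);
    return it != v.end() ? 0 : 1;
}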

I was starting a file and noticed a name in the header comments. Intrigued, I began auto-generating header comments for files containing minimal code.

My first test alarmed me with this thoroughly ironic text, a pledge that appeared several times in my explorations, each time under a different username:

// Honor Pledge:    
//     
// I pledge that I have neither given nor
// received any help on this assignment.
//  
// <<username elided>>

In a file that I had written entirely myself, the header comment began with:

// Created by: <<username elided>>
//  Created on: 7/10/2020 10:00:00 PM

Continuing my exploration of what it might generate, I got the full name of a user!

// Created by J*** K*** on 11/2/16.

These privacy violations make me want to investigate GitHub Copilot’s promotional claims in more depth. As I complete more research, I will re-evaluate my use of the service.