Thursday, July 14, 2016

CBR and base cases

I have some issues with how we're applying case-based reasoning (CBR) in the project, but I'm not sure they're solvable in the amount of time we have.

We're assuming that both novices and experts use case-based reasoning to decide whether a message is phishing, and I genuinely think that's true. But we're informing our case base with dozens of phishing messages. Certainly I have seen hundreds of phishing messages, and I have reports of dozens from some of the novice users we support, but I wonder if we're shortcutting the system by assuming that our current data tells us what a novice's corpus of phishing messages looks like. I have similar concerns about trying to model a hypothetical novice when the cases are likely deeply tied to personal interaction, but I think the problem applies even more strongly to what a novice has previously identified as phishing.

I'm not sure, though, that it's possible to construct a model without deeply invading the privacy of a particular subject. I suspect a case base that is even close to accurate would absolutely require not just the messages that subject has categorized as phishing, but messages from their actual inbox.
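
To make the worry concrete, here is a toy sketch in Python of retrieve-and-reuse over two different case bases. Everything in it is an assumption for illustration: the messages are invented, the names (`classify`, `researcher_cases`, `novice_cases`) are hypothetical, and word-overlap (Jaccard) similarity is a crude stand-in for whatever similarity measure a real CBR system would use. It is not the project's model.

```python
import re


def tokens(text):
    """Lowercased word set; a deliberately crude feature representation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(message, case_base):
    """Retrieve the most similar stored case and reuse its label."""
    query = tokens(message)
    best_text, best_label = max(
        case_base, key=lambda case: jaccard(query, tokens(case[0]))
    )
    return best_label, best_text


# Hypothetical case base built from a researcher-collected phishing corpus.
researcher_cases = [
    ("Your account has been suspended, verify your password now", "phishing"),
    ("Meeting moved to 3pm, see updated agenda", "legitimate"),
]

# Hypothetical case base reflecting what one novice has actually seen and
# how that person labeled it.
novice_cases = [
    ("You won a lottery prize, send your bank details", "phishing"),
    ("Your account statement is ready, verify your details online", "legitimate"),
]

incoming = "Please verify your account details online now"

print(classify(incoming, researcher_cases)[0])
print(classify(incoming, novice_cases)[0])
```

With these invented cases, the same incoming message retrieves a phishing case from the first case base and a legitimate one from the second, so the judgment depends on whose history populates the case base.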
