Working with one of our software engineering teams today, I was reminded of some modeling principles that I have come to take for granted. The topic of this post took me a while to learn, and my hope is that at least one other person will find it useful.
When designing a domain model, data model, or any other data structure that represents real-world information, there are countless possible structures, and it is up to the software designer to choose one. I’ll show two ways to model the same data in a real scenario.
Maiden Name Modeling
My nickname for this technique is Maiden Name Modeling, after the example that best illustrates it. Here is the requirement:
A congressional legislator needs a way to track contacts. These contacts are typically constituents, but sometimes they are donors, judges, etc. An application built on this data model will allow office clerks to maintain contacts in the legislator’s jurisdiction. It will also allow the lookup and updating of information and notes on the contact. Many times, a person will be a contact for many legislators, but the information differs a bit from legislator to legislator. For instance, the contact may be a business, but each legislator may have a different business location or phone number on file for it.
Sometimes a client won’t know how to describe the data characteristics. And in an age where there are many, many database tables containing information about “people”, we modelers need some tools to decide which structure to use in which scenario.
Question to ask: Here is a scenario: Amy Smith is a contact for legislator Bob Parker. She gets married and becomes Amy Pumpels. She then reaches out to another legislator Sammy Berkins and gets entered into the database as one of his contacts. Should her name and other information automatically be overwritten in the record for Bob Parker?
If the answer is “no”, then the maiden name model is the most appropriate for the scenario. Even though the same person is represented as a contact for the two legislators, it is appropriate to use two independent records. This is because there is no business relationship between the two records. They are completely independent. In other words, if “Amy Smith” disappeared from Bob Parker’s contact list, Bob would be upset. He would search for this person in vain, because the record for Amy Pumpels would quietly hide the fact that “Smith” had been removed from the database.
Here is a diagram of this model.
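The same structure can be sketched in code. This is only a minimal illustration, assuming Python dataclasses; the class and field names (Contact, Legislator, phone, contact_type, notes) and the phone number are my own placeholders rather than anything taken from a real system. Each legislator owns its own contact records, so nothing is shared between offices.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    """A contact record owned by exactly one legislator (hypothetical fields)."""
    name: str
    phone: str = ""
    contact_type: str = "Constituent"
    notes: str = ""


@dataclass
class Legislator:
    name: str
    contacts: List[Contact] = field(default_factory=list)

    def add_contact(self, contact: Contact) -> Contact:
        # The record belongs to this legislator only; no other office sees it.
        self.contacts.append(contact)
        return contact


bob = Legislator("Bob Parker")
sammy = Legislator("Sammy Berkins")

# Bob's office records Amy under her maiden name.
bob.add_contact(Contact("Amy Smith", phone="555-0100"))

# Later, Sammy's office enters her independently under her married name.
sammy.add_contact(Contact("Amy Pumpels", phone="555-0100"))

# Bob's record is untouched by anything Sammy's office does.
print([c.name for c in bob.contacts])
print([c.name for c in sammy.contacts])
```

Because the two records are separate objects, Bob's clerk can still find "Amy Smith" after Sammy's clerk enters "Amy Pumpels".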
Master Name Model
Another way to represent the same type of data is with a master name model. You might have heard of master name indexes that seek to de-duplicate data for people of all sorts so that there is one place in the company to keep track of names, addresses, phone numbers, etc. This is useful in many scenarios. Here is a way to tell whether this structure is more appropriate to the situation.
Question to ask: Here is a scenario: Amy Smith is a contact for legislator Bob Parker. She gets married and becomes Amy Pumpels. She then reaches out to another legislator Sammy Berkins and gets entered into the database as one of his contacts. Should her name and other information automatically be overwritten in the record for Bob Parker?
If the answer is that Amy Smith should no longer exist in any legislator’s contact list, then this is a tip-off that the master name model fits. A UI feature that might accompany this model is a screen that selects an existing contact and adds a Type and Notes. In this scenario, the user will maintain a shared group of Contacts, and each one will be attached to a Legislator along with a Type and Notes specific to the relationship. Here is what it looks like.
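As a rough sketch of the master name model, again assuming Python dataclasses: Contact is a single shared record, and LegislatorContact carries the Type and Notes for one legislator's relationship with it. The attach method, the phone field, and the sample note are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    """Shared master record: one per real-world person or business."""
    name: str
    phone: str = ""


@dataclass
class LegislatorContact:
    """Relationship-specific data that points at the shared Contact."""
    contact: Contact
    contact_type: str
    notes: str = ""


@dataclass
class Legislator:
    name: str
    contacts: List[LegislatorContact] = field(default_factory=list)

    def attach(self, contact: Contact, contact_type: str,
               notes: str = "") -> LegislatorContact:
        # Attach an existing shared Contact with a Type and Notes.
        link = LegislatorContact(contact, contact_type, notes)
        self.contacts.append(link)
        return link


amy = Contact("Amy Smith", phone="555-0100")
bob = Legislator("Bob Parker")
sammy = Legislator("Sammy Berkins")

bob.attach(amy, "Constituent", notes="Met at town hall")

# Amy marries; the single master record is updated once.
amy.name = "Amy Pumpels"
sammy.attach(amy, "Donor")

# Both legislators now see the new name, each with their own Type and Notes.
print(bob.contacts[0].contact.name, bob.contacts[0].contact_type)
print(sammy.contacts[0].contact.name, sammy.contacts[0].contact_type)
```

Here the rename reaches Bob's list automatically, which is exactly the behavior the question above is meant to confirm or rule out.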
Danger of many-to-many relationships
Many-to-many relationships have always been hard to manage because of the ownership issue: what object owns the relationship? For the database, there is no concept of ownership. In the database, we just store the current state and structure of the data – there are no hints around how it is used. Any application using and modifying the data must establish usage constraints in order to present an understandable records-management paradigm.
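To make the "no ownership in the database" point concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names are assumptions of mine, not a schema from the original project. The link table can be queried equally well from either side, and nothing in the schema says which side owns it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legislator (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Hypothetical link table holding the relationship-specific data.
    CREATE TABLE legislator_contact (
        legislator_id INTEGER NOT NULL REFERENCES legislator(id),
        contact_id INTEGER NOT NULL REFERENCES contact(id),
        type TEXT NOT NULL,
        notes TEXT,
        PRIMARY KEY (legislator_id, contact_id)
    );
    INSERT INTO legislator VALUES (1, 'Bob Parker'), (2, 'Sammy Berkins');
    INSERT INTO contact VALUES (1, 'Amy Pumpels');
    INSERT INTO legislator_contact VALUES
        (1, 1, 'Constituent', NULL),
        (2, 1, 'Donor', NULL);
""")

# Navigating from the legislator side...
contacts_for_bob = conn.execute(
    "SELECT c.name FROM contact c "
    "JOIN legislator_contact lc ON lc.contact_id = c.id "
    "WHERE lc.legislator_id = ?", (1,)
).fetchall()

# ...is just as valid to the database as navigating from the contact side.
legislators_for_amy = conn.execute(
    "SELECT l.name FROM legislator l "
    "JOIN legislator_contact lc ON lc.legislator_id = l.id "
    "WHERE lc.contact_id = ? ORDER BY l.id", (1,)
).fetchall()

print(contacts_for_bob)
print(legislators_for_amy)
```

Both queries are equally natural to the database, so any notion of direction or ownership has to come from the application.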
We do this by eliminating many-to-many scenarios in the application, that is, in the object model. In the above diagram, you see that Legislator has a one-to-many relationship with LegislatorContact. Then LegislatorContact has a many-to-one relationship with Contact. This is important: Contact holds no reference to Legislator or LegislatorContact, and LegislatorContact holds no reference back to Legislator. In the object model, we deliberately leave out these possible relationships in order to keep the application code simple and consistent. Through this modeling, we ensure that application code uses these objects in only one manner.
In domain-driven design terms, Legislator and Contact are aggregate roots, and LegislatorContact is a type belonging to the Legislator aggregate that can only be accessed through a Legislator. With domain-driven design, we constrain the model with rules that make things simpler by taking away possible usage scenarios. For instance, it’s ok for a subordinate member of an aggregate to depend on another aggregate root, but not on classes owned by that aggregate root. And it’s ok for an aggregate root to depend directly on another aggregate root, but it is not ok for an aggregate root like Contact to depend directly on a subordinate type of the Legislator aggregate.
With these modeling constraints, we eliminate from the application the many-to-many concept that the data makes possible, so that application code can be drastically simpler and navigate in one direction only.
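The aggregate rules described above might look something like the following in code. This is a hedged sketch, not a prescribed implementation: the method add_contact, the read-only contacts property, and the helper legislators_for are hypothetical names. The point is that references run in one direction only, from Legislator to LegislatorContact to Contact.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Contact:
    """Aggregate root. Knows nothing about Legislator or LegislatorContact."""
    name: str


@dataclass
class LegislatorContact:
    """Subordinate member of the Legislator aggregate.

    It may depend on the Contact root, but holds no reference back
    to the Legislator that owns it.
    """
    contact: Contact
    contact_type: str
    notes: str = ""


class Legislator:
    """Aggregate root; the only way to reach a LegislatorContact."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._contacts: List[LegislatorContact] = []

    def add_contact(self, contact: Contact, contact_type: str,
                    notes: str = "") -> None:
        # LegislatorContact instances are created only inside the aggregate.
        self._contacts.append(LegislatorContact(contact, contact_type, notes))

    @property
    def contacts(self) -> Tuple[LegislatorContact, ...]:
        # Read-only view so callers cannot bypass the root to modify the list.
        return tuple(self._contacts)


def legislators_for(contact: Contact,
                    legislators: Iterable[Legislator]) -> List[Legislator]:
    """Answer the reverse question by walking the one-way references.

    In a real application this would more likely be a query, not a
    property on Contact.
    """
    return [leg for leg in legislators
            if any(lc.contact is contact for lc in leg.contacts)]


amy = Contact("Amy Pumpels")
bob = Legislator("Bob Parker")
sammy = Legislator("Sammy Berkins")
bob.add_contact(amy, "Constituent")
sammy.add_contact(amy, "Donor")

print([leg.name for leg in legislators_for(amy, [bob, sammy])])
```

Contact never gains a list of legislators. When the reverse lookup is needed, it is expressed as a query over the roots, which keeps day-to-day application code moving in a single direction.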
Conclusion
There is no “one way” to model data or objects. I hope that this post has helped with one common decision point that has occurred over and over in my career. I would love to have your comments. Have you encountered a decision point similar to this?