The fallacy of the always-valid entity

I use domain-driven design, and one of the core patterns in DDD is the entity.  I won’t go into a description of aggregates or aggregate roots, but the entity is a central pattern when implementing domain-driven design.

I often encounter the desire by some developers to create an entity that guards itself against ever becoming invalid.

Let’s consider the following example of a UserProfile class:

public class UserProfile
{
    public string Name { get; set; }
    public Gender? Gender { get; set; }
    public DateTime? JoinedDate { get; set; }
    public DateTime? LastLogin { get; set; }
}

User profiles are initially created with a name and a gender.  Because we never want this entity to be invalid, we may write some guard clauses in the setters:
 
private string _name;
private Gender? _gender;
public DateTime? JoinedDate { get; set; }
public DateTime? LastLogin { get; set; }

public string Name
{
    get { return _name; }
    set
    {
        if (string.IsNullOrEmpty(value))
        {
            throw new Exception("Name is required");
        }
        _name = value;
    }
}

public Gender? Gender
{
    get { return _gender; }
    set
    {
        // Check the incoming value, not the current field.
        if (value == null)
        {
            throw new Exception("Gender is required");
        }
        _gender = value;
    }
}


This will ensure that the properties cannot be set to null at any time.  In fact, some UI frameworks will catch these exceptions and turn the exception message into an error message for the user.  Required property validation and the accompanying error messages are misplaced here inside the entity.

  • The fact that name is required needs to be context-bound.  When is it invalid?
  • The message should be the responsibility of the presentation layer. 

These are simple things and don’t illustrate my point very strongly.  In real systems, there is complex logic that decides when an entity is valid for particular operations.  There is seldom only one context for being valid or invalid.  For instance, when loading historical data, some genders may be missing.  Should the application blow up when loading data?  What page would get the error message?  When loading historical data, perhaps the user needs to enter a gender when he edits his profile the next time.  The answer is certainly not to fail the query operation.

The dates in the UserProfile class present the opportunity for some more complex business rules.  For instance, should the JoinedDate ever be greater than the LastLogin date?  Probably not.  If this rule is applied, which setter should contain the validation?  Neither.  Even with this simple validation rule, the always-valid entity notion already falls down.
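To make the problem concrete, here is a minimal sketch of what can happen if the date rule is pushed into both setters.  GuardedUserProfile, SetterOrderDemo and the choice of InvalidOperationException are assumptions made up for illustration, not code from a real system.  The update moves both dates forward to a pair that is valid as a whole, yet whether it succeeds depends on the order in which the setters are called.

```csharp
using System;

// Hypothetical variant of UserProfile that enforces
// "JoinedDate must not be after LastLogin" inside both setters.
public class GuardedUserProfile
{
    private DateTime? _joinedDate;
    private DateTime? _lastLogin;

    public DateTime? JoinedDate
    {
        get { return _joinedDate; }
        set
        {
            // Lifted comparison: evaluates to false when either side is null.
            if (value > _lastLogin)
            {
                throw new InvalidOperationException("JoinedDate is after LastLogin");
            }
            _joinedDate = value;
        }
    }

    public DateTime? LastLogin
    {
        get { return _lastLogin; }
        set
        {
            if (_joinedDate > value)
            {
                throw new InvalidOperationException("JoinedDate is after LastLogin");
            }
            _lastLogin = value;
        }
    }
}

public static class SetterOrderDemo
{
    // Tries to move both dates forward. Returns true if the update
    // succeeded, false if one of the setters rejected it.
    public static bool TryShiftForward(bool setLastLoginFirst)
    {
        var profile = new GuardedUserProfile
        {
            JoinedDate = new DateTime(2009, 1, 1),
            LastLogin = new DateTime(2009, 2, 1)
        };
        var newJoined = new DateTime(2009, 3, 1);
        var newLastLogin = new DateTime(2009, 4, 1);
        try
        {
            if (setLastLoginFirst)
            {
                profile.LastLogin = newLastLogin;
                profile.JoinedDate = newJoined;
            }
            else
            {
                // The new JoinedDate is after the old LastLogin, so this throws.
                profile.JoinedDate = newJoined;
                profile.LastLogin = newLastLogin;
            }
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }
}
```

Calling SetterOrderDemo.TryShiftForward(true) succeeds, while TryShiftForward(false) fails, even though the final state would be the same.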

This type of scenario runs rampant in any non-trivial business application.  This type of validation makes up much of the business logic used to consume and validate user input into the system.  This business logic needs to be separated out into other classes.  Let’s consider what it might look like to factor this out.

public interface IValidator<T>
{
    ValidationResult Validation(T obj);
}

public class ValidationResult
{
    List<string> _errorCodes = new List<string>();

    public void AddError(string errorCode)
    {
        _errorCodes.Add(errorCode);
    }

    public string[] GetErrors()
    {
        return _errorCodes.ToArray();
    }

    public bool IsValid()
    {
        // Valid means no errors were recorded.
        return _errorCodes.Count == 0;
    }
}

First we’ve defined an interface to represent the concept of a validator.  It returns a result that contains zero or more error codes.  It’s the job of the UI to print the message that goes with the particular error.  Below are two rules that implement this interface.

public class NameRequiredRule : IValidator<UserProfile>
{
    public ValidationResult Validation(UserProfile obj)
    {
        var validation = new ValidationResult();
        if (string.IsNullOrEmpty(obj.Name))
        {
            validation.AddError("NAME_REQUIRED");
        }

        return validation;
    }
}

public class LastLoginAfterJoinedDateRule : IValidator<UserProfile>
{
    public ValidationResult Validation(UserProfile obj)
    {
        var validation = new ValidationResult();
        if (obj.JoinedDate.GetValueOrDefault() > obj.LastLogin.GetValueOrDefault())
        {
            validation.AddError("JOINED_DATE_AFTER_LAST_LOGIN");
        }

        return validation;
    }
}

These two rules encapsulate the logic, and further refactoring could have them apply to multiple types with similar rules.  This topic can become quite large, but the lesson to take home is that these business rules (validation rules) should be external to the entity.  Some of the logic might become methods on the entity, with a boolean return type, but in a medium to large application, there will be so many business rules that factoring them into independent classes becomes a maintainability necessity.
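One way the rules might be combined per context is sketched below.  CompositeValidator and UserProfileValidators are hypothetical names introduced only for this illustration, and the types from the post are repeated in condensed form so that the sketch compiles on its own.  Which rules belong to which operation is an assumption; a real application would decide that from its own requirements.

```csharp
using System;
using System.Collections.Generic;

// Condensed copies of the types from the post (Gender omitted for brevity).
public class UserProfile
{
    public string Name { get; set; }
    public DateTime? JoinedDate { get; set; }
    public DateTime? LastLogin { get; set; }
}

public class ValidationResult
{
    private readonly List<string> _errorCodes = new List<string>();
    public void AddError(string errorCode) { _errorCodes.Add(errorCode); }
    public string[] GetErrors() { return _errorCodes.ToArray(); }
    public bool IsValid() { return _errorCodes.Count == 0; }
}

public interface IValidator<T>
{
    ValidationResult Validation(T obj);
}

public class NameRequiredRule : IValidator<UserProfile>
{
    public ValidationResult Validation(UserProfile obj)
    {
        var validation = new ValidationResult();
        if (string.IsNullOrEmpty(obj.Name)) validation.AddError("NAME_REQUIRED");
        return validation;
    }
}

public class LastLoginAfterJoinedDateRule : IValidator<UserProfile>
{
    public ValidationResult Validation(UserProfile obj)
    {
        var validation = new ValidationResult();
        if (obj.JoinedDate.GetValueOrDefault() > obj.LastLogin.GetValueOrDefault())
            validation.AddError("JOINED_DATE_AFTER_LAST_LOGIN");
        return validation;
    }
}

// Hypothetical helper: runs a fixed set of rules and merges their error codes.
public class CompositeValidator<T> : IValidator<T>
{
    private readonly IValidator<T>[] _rules;

    public CompositeValidator(params IValidator<T>[] rules) { _rules = rules; }

    public ValidationResult Validation(T obj)
    {
        var combined = new ValidationResult();
        foreach (var rule in _rules)
            foreach (var code in rule.Validation(obj).GetErrors())
                combined.AddError(code);
        return combined;
    }
}

// Hypothetical contexts: each operation picks the rules that apply to it.
public static class UserProfileValidators
{
    public static IValidator<UserProfile> ForSave()
    {
        return new CompositeValidator<UserProfile>(
            new NameRequiredRule(), new LastLoginAfterJoinedDateRule());
    }

    public static IValidator<UserProfile> ForHistoricalImport()
    {
        // Old records may be incomplete, so only the date ordering is checked.
        return new CompositeValidator<UserProfile>(new LastLoginAfterJoinedDateRule());
    }
}
```

The caller asks for the validator that matches the operation at hand, for example UserProfileValidators.ForSave().Validation(profile), and receives every broken rule at once instead of the first exception.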

It is futile to attempt to keep entities always-valid.  Let bad things happen to them, and then validate.



Comments

Peter Gfader said on 5.13.2009 at 10:02 PM

I like that.

How would you implement function dependent validation logic?

Like:

If we are deleting the entity --> don't validate

If we are Saving the entity --> validate

OR things like

NameRequiredRule should always be validated, but LastLoginAfterJoinedDateRule only on creation of objects?

thx

peter

John Kennison said on 5.13.2009 at 10:28 PM

The CSLA framework has an excellent validation engine built in that provides similar functionality to that described above.

www.lhotka.net/cslanet/

Andrei Rinea said on 5.14.2009 at 1:22 AM

I made the same mistake (forcing entities to always be valid via the constructor and giving the (automatic) properties private/protected setters), but I've seen the error of my ways.

For example when using the ModelBinder in ASP.NET MVC.

Regarding the "should the JoinedDate ever be greater than the LastLogin date" problem I was doing something awkward like having a setter method for both JoinedDate and LastLogin so I could ensure proper timing for validating these two values.

Jeffrey, what's your opinion on IDataErrorInfo?

Andy Hitchman said on 5.14.2009 at 1:56 AM

I feel the best way to encapsulate context-specific validation is within an action method on the entity, or a value object that represents the action.

In this way you are effectively validating an atomic state change. Properties should ideally not have setters at all.

Paul Batum said on 5.14.2009 at 3:01 AM

+1 for Andy's suggestion. The solution that feels right to me is to avoid writing setters and instead expose behaviors via methods that enforce the appropriate validation rules.

Ian Nelson said on 5.14.2009 at 3:50 AM

Jeffrey, what's your opinion on using a validation block to do some of this grunt work, such as NHibernate Validator or the MS P&P Validation Block?

AlexanderZZ said on 5.14.2009 at 3:55 AM

Not sure if I agree that you should be allowed to create 'invalid' entities in the first place. If an entity is not valid - is it really that type of entity? Is a square with 5 sides still a square?

Entities should be created valid. All change methods should validate that the change is indeed valid.

If this seems too strict, think about what the entity is representing. Does a null Gender mean that the person does not have a Gender (not a valid state) or that the Gender is unknown (perfectly valid)?

By separating out validation, entities become purely data structures (anaemic model). That's OK, if all you want to do is hold data - but a data structure is not an entity/class!

Think Before Coding said on 5.14.2009 at 6:32 AM

I'm on AlexanderZZ 's side.

Your entity should be valid... there are things that you allow like having an unknown gender, and things that you don't.

In this case the entity is not invalid since your UI will handle this case by prompting your user to select a gender on the next edit.

But there are things that should never happen in your entity:

If there is no meaning for JoinedDate and LastLoginDate to be in reverse order, it's probably not a good idea to make them two independent properties. Perhaps you should not change the JoinedDate and make it readonly, then you only have to check LastLoginDate. Or you can create a method that takes both and validates... There are plenty of options to make it possible and clean.

In this case, your UI validation layer is responsible for submitting valid data that will be accepted by your entity.

Often you also have rules to determine what operation can be done according to the entity state [can I do this if the user has no gender?]. But these are not validation rules even if they follow the same Specification pattern.

Rob Scott said on 5.14.2009 at 6:54 AM

Pretty weak arguments for abandoning the underpinnings of two decades of object-oriented programming. The notions that objects have state and behavior and that classes have invariants are key to making large complex systems work reliably. I spend way too much time helping people fix bugs where the root cause ends up being "gee, that object should never have had those values".

What you're arguing here is that "it's too hard to figure out what the invariants are for my class, so I'll use data structures and call them classes". You argue that in some contexts an object might not have the same invariant as it does in other contexts and propose that the solution is to remove all invariants. I'd argue that what's happened here is that you've failed to recognize different states that your object can be in, and as a result have failed to incorporate them into the invariant definition for the class. Either that, or, as AlexanderZZ suggests, you actually have different classes that you're trying to model using a single class (ShoppingCart versus Order anyone?).

I most often see the type of approach you're recommending when people have been led down a stepwise illogical path (a seemingly logical set of decisions leading to an illogical result) by their tools. They bind directly to the domain object from the user interface and need to make all of the properties public and settable, or they have their ORM configured to load the object through public setters, or they made wide use of object initializers on their domain objects. The fact that these options were chosen doesn't make valid entities a fallacy, it just makes them hard to do given the other set of design choices (none of which, btw, are usually the best way to use tools like data binding and ORMs).

Greg Young said on 5.14.2009 at 8:29 AM

@Rob

The problem is that he is writing procedural code (note the usage of setters). I guess we can all agree that always-valid objects are less valuable in procedural code.

@Jeff it's pretty obvious that you have not understood the discussions of always-valid objects. I will give you a hint: your examples are essentially property buckets, whereas the people saying that objects should always be valid only expose behavioral interfaces, which pretty much nullifies your objects (and allows things like the notification pattern --that thing you use and don't bother naming-- to return multiple errors to a UI).

The other question I would pose to you is how do you rationalize that your entity is valid? Let's propose we now have a SendUserCreationEmailService that takes a UserProfile ... how can we rationalize in that service that Name is not null? Do we check it again? Or more likely ... you just don't bother to check and "hope for the best" you hope that someone bothered to validate it before sending it to you.

Of course using TDD one of the first tests we should be writing is that if I send a customer with a null name that it should raise an error. But once we start writing these kinds of tests over and over again we realize ... "wait if we never allowed name to become null we wouldn't have all of these tests"

You also mention the problem of historical data but you fail to comprehend that there is more than one kind of validity (to start with there is validity for an action vs invariants of the object itself). Invariants are what always need to be valid and if you have historical data that is breaking your invariants then you need to fix it or as we used to say, garbage in garbage out. The general strategy to fixing it is to first fix the data then place the rule on the objects. Of course with this kind of data failure (likely due to lack of rigidity in your system) you are likely left in a GIGO scenario.

I could keep going on with arguments but I can summarize them all with "you just have not understood where the viewpoint comes from".

Cheers,

Greg

Nicholas Piasecki said on 5.14.2009 at 8:32 AM

I don't think your entities used in the UI need to be the same as your model objects. The model objects don't let themselves become invalid. The UI entities that are used for capturing data and building an appropriate model object do let their properties get set in any order and implement IDataErrorInfo in order to not fight databinding. The best way that I heard it described is that validation is like a continuum--on the one hand you have simple field-level validation ("must not be null", "less than 30 characters") and then you have more complex business logic validation ("customers after 2005 can't have preferred status") somewhere in the object model. No matter what you do, the validation isn't going to exist all in one place--it'll get repeated in JavaScript, it'll get repeated in some other UI, it'll appear in the model objects, and it might even appear in the stored procedure--and that's okay, because writing software is hard.

Just my two cents.

Jeffrey Palermo said on 5.14.2009 at 9:29 AM

@Andrei

>> IDataErrorInfo

I don't use this interface. I like to pull lots of my validation out as separate responsibilities. Also, I don't like using exceptions for validation.

Jeffrey Palermo said on 5.14.2009 at 9:29 AM

@Ian,

Using a validation block might be very useful. I haven't in the past, though.

Jeffrey Palermo said on 5.14.2009 at 9:31 AM

@ AlexanderZZ

In my simple example, the entity became just a container, but these entities end up with lots of small methods that do things, especially when they are aggregate roots. Heaping all the validation in them as well just bloats them.

Jeffrey Palermo said on 5.14.2009 at 9:39 AM

@Rob Scott

I'm not sure that throwing exceptions in setters is an underpinning of the industry. I can see how the example is a little small. I could write a book chapter on a full DDD approach to this.

I end up agreeing with the rest of your comment but don't see how it serves as a rebuttal to my post. I'll chalk it up to being a small example that only covers one case. I hope to find some time to write more about this. There are many ways to implement it, but the architectural principle is the same.

Jeffrey Palermo said on 5.14.2009 at 9:45 AM

@Greg,

I'm sure I do not understand _everyone's_ viewpoint. And I never will. And neither will you, but here is mine. It's pretty easy to rip apart this small bit of code. Full implementations in blog posts are incomprehensible. Good arguments. I hope to be able to address them.

Greg Young said on 5.14.2009 at 1:40 PM

@jeffrey

`I don't use this interface. I like to pull lots of my validation out as separate responsibilities..'

Can you explain what you consider the responsibilities of a domain object are if not to maintain its own validity?

Greg

Greg Young said on 5.14.2009 at 2:09 PM

@Jeffrey also let's be clear they aren't *everyone's* viewpoints they are the viewpoints that you are arguing against. My statement was that you obviously don't understand what you are arguing against.

Justin Chase said on 5.14.2009 at 3:05 PM

Thanks Jeffrey. Of course this isn't a new idea but it's good to see someone independently come to the same conclusions.

And for the nay-sayers out there here's a little question:

Suppose the validity of the state of an object is dependent on the result of a long running process? Such as "is this object unique" where you have to look into a database. Also, suppose that said process is asynchronous?

Furthermore, what if there are multiple rules broken by the same value? Additionally, what if the values in the database are already invalid? You can't NOT create the object, otherwise you won't be able to fix it.

Check out CSLA, being "dirty" and "invalid" are two core aspects of business objects.

Rob Scott said on 5.15.2009 at 6:49 AM

@Justin,

I think Jeffrey is the nay-sayer here :) He is afterall saying that keeping objects in a valid state is a fallacy.

On to your little question (or is it 4 or 5?). Invariants are required to be preserved on exit from operations on the object. So, if the object itself is running the operation, it is allowed to be invalid while the operation is running.

However, I don't think the example you point out is this type of situation. In the case of "is this object unique", you aren't asking if the object is valid in the sense of "is the invariant of this object preserved", you're really asking about the invariant of another object, namely the collection of objects. The invariant of that collection may be "doesn't contain duplicate objects". Part of the confusion here is using the word "valid" to mean two different things (one is strictly about whether the internal state of an object is valid and the other is about whether two objects with the same internal state should exist in the system).

As for asynchronous operations, if the object itself is performing the asynchronous operation, the invariant is required to be preserved at the end of the method call that kicks off the operation, and at the end of the "callback" method that processes the results of the async call. If it's some other object performing the operation (e.g., the collection or repository in your unique object example), then it's that object's responsibility to preserve its own invariant.

I'm not sure what you mean by "multiple rules broken by the same value", so I can't comment.

What if values in the database are already invalid? Woe unto us :) In the scenario you describe (e.g., the object has to be loaded and fixed by a user), the problem is that we need to recognize that part of the state of the "unreviewed" object is that it is indeed "unreviewed". So the invariant is going to look something like if(IsUnreviewed()) then { Rule1(); Rule2(); } else { Rule1(); Rule2(); ... RuleN(); } This is very similar to something like a PurchaseOrder object that could be in an "InProgress" state or an "Approved" state and have different validation rules depending upon which state it is in.

Dirty and invalid are certainly aspects of business objects and can have many different definitions depending upon the architecture of the system. In many systems, dirty means has been modified but not saved (but has its invariant preserved). In systems that use "objects" that don't maintain their own invariants but rely on other objects to do that for them, there is the additional concept of a "modified but unvalidated object", that must be validated at some point. In systems where objects maintain their own invariants, there is no such thing as an invalid (in the invariant violating sense) object.

I put object in quotes above to denote that these "objects" are what many people would call data structures and the paradigm that uses them that way is procedural programming. Not that there's anything wrong with that :), but it does represent a different view on what design trade-offs are important. My problem is with calling a different design approach based on a different evaluation of the trade-offs a fallacy.

Rob Scott said on 5.15.2009 at 6:54 AM

Sure wish that the whitespace between paragraphs was honored. Sorry about the run-on appearance of the previous response! It makes my eyes hurt.

Greg Young said on 5.15.2009 at 7:47 AM

@Justin Chase:

To answer your question re: uniqueness that is not a responsibility of the object. That would be a difference between the object being in a transient and a non-transient state.

Per the rest of your questions (like the fact that you can still use the notification pattern for multiple errors), I believe I already answered them but maybe I will put up a blog post. In general the problem is that you are thinking about your problems in procedural ways (not necessarily a bad thing, but different rules apply when you think about them in an object-oriented way).

p.s. citing CSLA in a discussion about "good object oriented programming" is ridiculous.

Neil Mosafi said on 5.17.2009 at 2:36 PM

Disagree. You could do something like this.

1. Remove the setters

2. Create an immutable value object to store the Gender and Name, and make sure they can't ever be null

3. Add a constructor which takes an instance of the value object, and a setting method if you want

4. Add a method like RecordJoining(DateTime) which sets the joined date and the login date to the same thing. I presume they can only join once so you will probably want to check they haven't already joined in this method too.

5. Add a method like RecordLogin(DateTime) which sets the login to the value of the parameter and ensures the date is not greater than join date

Now your object has behaviour and it's clearer from the interface what someone can do with it.

As regards the historical data import, I would probably create a separate cleanup and import routine for that, I wouldn't go directly to my domain model. It's a tricky one, but I don't think you want to corrupt your domain model's integrity just to satisfy a possible requirement that data might be invalid somewhere.
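The five steps listed in this comment could be sketched roughly as follows.  This is only one reading of the suggestion: PersonalDetails, ChangeDetails, the Gender members and the exception types are all assumptions, and step 5 is interpreted as "a login may not be earlier than the joined date".

```csharp
using System;

// Hypothetical enum members; the post never shows the Gender type.
public enum Gender { Unspecified, Female, Male }

// Step 2: immutable value object. Name can never be null or empty,
// and Gender is a non-nullable enum.
public sealed class PersonalDetails
{
    public string Name { get; private set; }
    public Gender Gender { get; private set; }

    public PersonalDetails(string name, Gender gender)
    {
        if (string.IsNullOrEmpty(name))
        {
            throw new ArgumentException("Name is required", "name");
        }
        Name = name;
        Gender = gender;
    }
}

public class UserProfile
{
    // Step 1: no public setters.
    public PersonalDetails Details { get; private set; }
    public DateTime? JoinedDate { get; private set; }
    public DateTime? LastLogin { get; private set; }

    // Step 3: the constructor takes the value object.
    public UserProfile(PersonalDetails details)
    {
        ChangeDetails(details);
    }

    // Step 3 (optional): a method for replacing the details as a whole.
    public void ChangeDetails(PersonalDetails details)
    {
        if (details == null)
        {
            throw new ArgumentNullException("details");
        }
        Details = details;
    }

    // Step 4: joining sets both dates and can only happen once.
    public void RecordJoining(DateTime when)
    {
        if (JoinedDate.HasValue)
        {
            throw new InvalidOperationException("User has already joined");
        }
        JoinedDate = when;
        LastLogin = when;
    }

    // Step 5: a login is rejected if it precedes the joined date.
    public void RecordLogin(DateTime when)
    {
        if (!JoinedDate.HasValue)
        {
            throw new InvalidOperationException("User has not joined yet");
        }
        if (when < JoinedDate.Value)
        {
            throw new ArgumentOutOfRangeException("when", "Login cannot precede the joined date");
        }
        LastLogin = when;
    }
}
```

With this shape the date rule lives in exactly one place, RecordLogin, at the cost of the entity no longer being a simple target for data binding or ORM property setters.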

Craig said on 5.18.2009 at 9:46 AM

@Rob and Greg

He's not writing procedural code. He's not saying entities shouldn't have any behavior at all and just be dumb getter/setters. He's saying that validation is a separate concern and shouldn't be part of the domain objects, and that entities are valid or invalid at different times. Consider a workflow scenario where an object can be saved as draft, submitted, declined, approved, etc.

In the draft status all or almost all properties might be allowed to be null. After all, the user should be able to save the object at any time, without having to fill out all the fields. But for submission or approval, much more of the properties must be filled out. When declining, the approver might be required to fill out comments which might even be in a separate collection.

I've seen the validation rules change like this in many applications.

Greg Young said on 5.19.2009 at 9:08 AM

@Craig

Your example does not discuss invariants of the object but instead discusses validations that are required for an action. These are two very different concepts that should be viewed differently. The "always valid" camp is certainly not saying objects must always be in a state that they are valid for any operation (this would make no sense). They are however saying that there is a certain number of invariants for an object that should always be true (as an example, that a customer object always has a name). In your workflow case you are just working with objects that have few if any invariants in terms of the business; it is illogical to extrapolate such an experience to all objects though.

Rob Scott said on 5.19.2009 at 5:40 PM

@Chris,

If we're going to debate whether he's writing procedural or object-oriented code, we're going to have to define what we mean by the terms object-oriented and procedural. Objects encapsulate their data behind functions that allow operations on that data. Data structures expose their data and allow external (i.e., non-member) functions to operate on that data. A validation process that has another object's methods inspect the first object's data is by (this) definition procedural. An object-oriented approach would have the object itself responsible for its own validation.

An additional subdivision of object-oriented validation is also relevant. An invariant preserving approach would say that the object should be in a valid state (i.e., have its invariant preserved) after creation and after every public method execution. Another approach (let's call it the validatable object approach for want of a better term), would allow the object to be in invalid states, but would have an IsValid() method that could be called to determine if the object was valid at times of the caller's choosing (say before storing in a database).

Systems can be (and have been built) using each of these three approaches (and hybrids thereof), but these approaches reflect different values and goals and make different trade-offs in dealing with the problems you mention.

For people in the invariant preserving camp, the value is in being able to use any object and know by virtue of the fact that it exists, that it is in a valid state -- no missing values, no invalid relationships between data members, and that any operation on that object will notify the caller if the attempted operation would leave the object in an invalid state or if the operation can't be performed in the object's current state. The invariant preserving camp would argue that this advances the reliability of the system and lessens the burden on the caller of the object because the caller needs to know less about the internals of the object (since the object manages its own state). The trade-off is that the burden may be increased on the person writing the invariant preserving object. Part of the increase in that burden is temporary as it involves learning how to write classes in this manner. The pain soon eases.

If we look at the other two approaches from the point of view of the invariant preserving camp (I'll leave it to a blog post to consider the invariant preserving camp from the viewpoint of the other approaches), the main complaint is that the object's internal state is bleeding all over the application. It can do that in a couple of ways. In the procedural validation approach, the validating code has to be able to dig deeply into the internals of the state of the object to be validated -- thereby giving up the notion of encapsulating state. When we talk of separation of concerns we don't usually mean breaking the notion of encapsulation -- we usually mean reducing the responsibilities of a class by separating out different responsibilities, not by breaking encapsulation of the object's (single) responsibility. The bleeding in the validatable object case has to do not with an outside object breaking the object's encapsulation from a coding point of view, but rather from the point of view of understanding when the object should be made valid and from having to understand what methods can be called at what time which in turn usually requires analyzing the code of the class to see if a call would be valid.

To look at your specific example, of an object that can be in many different states (draft, submitted, declined, ...), I'd point you back to my earlier response where I discussed just that scenario ( "the problem is that we need to recognize that part of the state of the "unreviewed" object is that it is indeed "unreviewed". So the invarian

Rob Scott said on 5.19.2009 at 5:41 PM

finishing the last comment:

To look at your specific example, of an object that can be in many different states (draft, submitted, declined, ...), I'd point you back to my earlier response where I discussed just that scenario ( "the problem is that we need to recognize that part of the state of the "unreviewed" object is that it is indeed "unreviewed". So the invariant is going to look something like if(IsUnreviewed()) then { Rule1(); Rule2(); } else { Rule1(); Rule2(); ... RuleN(); } This is very similar to something like a PurchaseOrder object that could be in an "InProgress" state or an "Approved" state and have different validation rules depending upon which state it is in.").

Rob Scott said on 5.19.2009 at 6:02 PM

the last two comments should have been for Craig (who the heck is Chris?).

DemonsOut said on 6.09.2009 at 11:16 PM

Something that no one in the "entities must always be valid" camp has yet addressed is that throwing exceptions any time a class invariant is violated makes for an awful user experience. Imagine your end user's frustration as they realize that instead of getting a full list of invalid operations in one fell swoop, they will instead be fed one violation at a time (i.e. as they correct the last error and hit "Save" once more).