Hello,
I've been working on large wizard-style business applications where validation accounts for roughly 30% of the business logic, and I have been following the discussions around Hibernate Validator and JSR 303 Bean Validation for some time. I think the topic of validation should be addressed in a broader way.
I understand the advantages of reducing the scope of JSR 303 in order to get something useful out faster and to collect feedback to integrate into a version 2. The problem with this approach to validation is that a validation framework only pays off if it addresses nearly 100% of the issues you have; it does not make sense to distribute validation rules across different frameworks.
The core problem is that the data needed to configure validation rules is vital for several other functional areas of user interface applications, across different layers. Most likely the term "validation" itself is too restrictive.
Below I'll try to describe the problems that a validation framework should be able to address. I've also created a prototype of a validation system that should act as a proof of concept of how things could be handled:
svn co
https://jclusterjobs.svn.sourceforge.ne ... ionmanager
The readme file should be able to get you started. I have to apologize that there is not more documentation: I will have to take on another work assignment soon and do not have the time to polish the prototype and its documentation further. Nevertheless I would like to prevent a standard validation framework from being released that is too restricted to be useful. Just think about the amount of time it took to turn EJB2 into something useful in EJB3.
The overall topic of validation is a core concern in most of today's applications. In addition, validation has to occur at several levels and is most of the time duplicated across layers, which violates the DRY (don't repeat yourself) principle. In a web application the user interface input has to be validated incrementally to give the end user quick (fail-fast) feedback about his mistakes.
At that level it is sad that most of the time a user first has to run into a validation issue before a validation rule produces a message on the screen telling him about his mistake. It would be preferable to have the metadata that describes the validation rule available upfront and make the web application display information next to an input field that says, for example: "the login name has to be between 5 and 12 characters long". Why let the user hit an obstacle before guiding him? Why not tell him upfront? The validation mechanisms around today do not allow querying the metadata used in validation rules and force an implementor to duplicate the information about such rules in text messages on the user interface, which breaks the DRY principle.
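To illustrate the idea, here is a minimal sketch in plain Java (the FieldMetaInfo class and its fields are hypothetical, not part of any existing framework): the rule is stored once as metadata, and the same metadata both renders the upfront hint next to the input field and performs the actual validation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical metadata holder: one instance per input field.
class FieldMetaInfo {
    final int minLength;
    final int maxLength;

    FieldMetaInfo(int minLength, int maxLength) {
        this.minLength = minLength;
        this.maxLength = maxLength;
    }

    // The same metadata drives the upfront hint shown to the user ...
    String hint(String fieldLabel) {
        return "the " + fieldLabel + " has to be between "
                + minLength + " and " + maxLength + " characters long";
    }

    // ... and the actual validation, so the rule is stated only once (DRY).
    boolean isValid(String value) {
        return value != null
                && value.length() >= minLength
                && value.length() <= maxLength;
    }
}

public class UpfrontHintDemo {
    static final Map<String, FieldMetaInfo> META = new HashMap<>();
    static {
        META.put("loginName", new FieldMetaInfo(5, 12));
    }

    public static void main(String[] args) {
        FieldMetaInfo login = META.get("loginName");
        System.out.println(login.hint("login name")); // shown next to the input field
        System.out.println(login.isValid("abc"));     // too short
        System.out.println(login.isValid("johndoe"));
    }
}
```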
At the next level, imagine that the web application is only a thin front end on top of a web service. Because the web service cannot trust the data coming from the user interface application (possibly developed by a different company), the validation has to be done once again at the entry to the web service. Imagine that the web service requires an input field that is an enumeration, e.g. the countries in which people must live in order to be able to buy goods at a certain website. The validation rule for that field carries the meta information about which countries are allowed, but how does the web application get to know the allowed values? The web application has to display a "select" input field to prevent wrong inputs upfront. Again a case where it would be useful not only to focus on validation but also to make the metadata needed for validation available for other purposes.
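The same sketch works for enumerated fields (class and field names are invented for illustration): the allowed values live once in the validation metadata, the web application queries them to populate its select field, and the web service uses them to validate the untrusted input.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical metadata for an enumerated field: the allowed values are
// defined in one place and served to clients instead of being duplicated.
class EnumFieldMetaInfo {
    private final Set<String> allowedValues;

    EnumFieldMetaInfo(List<String> allowedValues) {
        this.allowedValues = new TreeSet<>(allowedValues);
    }

    // Queried by the web application to build the "select" input field.
    Set<String> getAllowedValues() {
        return allowedValues;
    }

    // Used by the web service to validate data it cannot trust.
    boolean isValid(String value) {
        return allowedValues.contains(value);
    }
}

public class CountrySelectDemo {
    public static void main(String[] args) {
        EnumFieldMetaInfo country = new EnumFieldMetaInfo(List.of("AT", "CH", "DE"));
        System.out.println(country.getAllowedValues()); // options for the select field
        System.out.println(country.isValid("DE"));
        System.out.println(country.isValid("US"));
    }
}
```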
Finally, at the bottom of the stack is the database, which also needs to enforce consistency rules. Again, rules already present in the web application and in the business layer have to be repeated.
Validation rules also come in different complexity levels. From simplest to most complex:
1) Syntax validation rules needed to ensure that the input is parseable into its basic data type (int, double, string, date, ...).
2) Constraints/restrictions on the data type itself, e.g. a numerical input field can only take values between 0 and 999. No other data item is needed to verify that rule. This level can be compared to the XML Schema xsd:restriction element.
3) Intra-bean validation rules only need to be aware of data inside a single bean, e.g. a bid object with a valid-from and a valid-until date field, where the valid-from date has to lie before the valid-until date.
4) Inter-bean validation rules are not clearly distinguishable from intra-bean validation rules, because you could always construct a higher-level "container" bean that has the participating beans attached to it as properties. In general, though, the "navigation paths" for such rules are deep. An example: a person object contains a birth date, and an insurance object contains a "type of insurance"; you could imagine a rule that says a person over a certain age can no longer be insured against the risk of unemployment.
5) The final level is where you need to validate the user input at all the levels above against "context". The levels above only deal with data coming as input from the user; context is data existing in the background. Some examples:
- user profile data: belongs to the logged-in user but is not entered together with the data that should be validated; it was entered some time ago when the user created his profile.
- client data: imagine you run a web shop on Amazon or Yahoo (or wherever); then you are the client and set some rules, e.g. which payment options you want to allow for the users who shop on your site.
- constraints set by the operating company: e.g. if in the above example Yahoo did not allow your web shop to use certain payment options, or only allowed it to sell goods to customers from certain countries.
- global constraints per platform: normally these are legal constraints, e.g. that you cannot sell drugs on the platform.
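Levels 2) and 3) can be sketched in a few lines of plain Java (the Bid bean, its fields, and the 0..999 range are taken from the examples above; the method names are invented):

```java
import java.time.LocalDate;

public class RuleLevelsDemo {

    // Level 2: a constraint on the data type itself,
    // comparable to an XML Schema xsd:restriction.
    static boolean inRange(int value) {
        return value >= 0 && value <= 999;
    }

    // Level 3: an intra-bean rule that relates two properties of the same bean.
    static class Bid {
        LocalDate validFrom;
        LocalDate validUntil;

        Bid(LocalDate validFrom, LocalDate validUntil) {
            this.validFrom = validFrom;
            this.validUntil = validUntil;
        }

        boolean datesConsistent() {
            return validFrom.isBefore(validUntil);
        }
    }

    public static void main(String[] args) {
        System.out.println(inRange(500));
        System.out.println(inRange(1000));
        Bid bid = new Bid(LocalDate.of(2024, 1, 1), LocalDate.of(2024, 6, 1));
        System.out.println(bid.datesConsistent());
    }
}
```

Levels 4) and 5) are exactly where such hand-written checks stop scaling, which is why the prototype delegates them to a rule engine.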
Most of the time the context data comes from a configuration database and is dynamic. Therefore simplistic static rules like @NotNull or @CreditCardNumber are not sufficient for most real-world applications. In addition, as soon as several fields are involved, a certain order of asking the user for the data items has to be kept: if a prerequisite of a field is in error, the dependent fields should be in error, too, and data entered for one field may influence the validation rules for another field. Another topic is that some fields are optional in some cases, but in other circumstances, depending on selections in other ("previous") fields, their values become required or not allowed at all (they would be invisible on the user interface).
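The required/optional/invisible dependency can be sketched like this (the payment example and all names are assumptions for illustration, not part of any framework):

```java
// Hypothetical states a field can be in on the user interface.
enum FieldState { REQUIRED, OPTIONAL, INVISIBLE }

public class ConditionalFieldDemo {

    // Derives the state of the "cardNumber" field from the value the user
    // selected in the "previous" paymentMethod field.
    static FieldState cardNumberState(String paymentMethod) {
        if ("CREDIT_CARD".equals(paymentMethod)) {
            return FieldState.REQUIRED;   // must be entered
        }
        return FieldState.INVISIBLE;      // not shown, value not allowed
    }

    public static void main(String[] args) {
        System.out.println(cardNumberState("CREDIT_CARD"));
        System.out.println(cardNumberState("INVOICE"));
    }
}
```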
The dependencies mentioned above are also relevant if a user moves forward through a wizard-like application (let's call pages the user has already visited "past", pages still to come "future", and the page he is currently working on "current") and fills in one field after the other, but at a certain point decides to change a value on one of the previous ("past") pages. Such a change on a past page may mean that fields that were already entered and are now in the future become illegal. You also cannot rely on the user coming across those fields again as he navigates forward, because changed fields in the past may alter the page flow; the field that became illegal may never be traversed again, even though its value was already put into the domain model. The only reliable solution is to clear the fields that depend on a past field when that field changes (you could simply clear all fields in the "future", but that is inconvenient for a user who then has to navigate forward and re-enter fields he had already filled in). In order to do that you need the metadata that is already present in the validation infrastructure.
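Clearing only the dependent fields requires a dependency graph in the metadata. A minimal sketch, with invented field names and a simple transitive walk:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DependentFieldClearer {
    // Which fields depend on which: this lives in the validation metadata.
    static final Map<String, List<String>> DEPENDENTS = new HashMap<>();
    static {
        DEPENDENTS.put("country", List.of("state"));
        DEPENDENTS.put("state", List.of("city"));
    }

    // Current values of the domain model fields.
    static final Map<String, String> VALUES = new HashMap<>();

    // When a "past" field changes, transitively clear everything that
    // depends on it instead of wiping the whole "future".
    static void onFieldChanged(String field, String newValue) {
        VALUES.put(field, newValue);
        Deque<String> toClear = new ArrayDeque<>(DEPENDENTS.getOrDefault(field, List.of()));
        while (!toClear.isEmpty()) {
            String f = toClear.pop();
            VALUES.remove(f);
            toClear.addAll(DEPENDENTS.getOrDefault(f, List.of()));
        }
    }

    public static void main(String[] args) {
        VALUES.put("country", "US");
        VALUES.put("state", "CA");
        VALUES.put("city", "San Jose");
        VALUES.put("nickname", "jdoe");  // independent field, must survive
        onFieldChanged("country", "DE"); // user edits a past page
        System.out.println(VALUES);      // state and city are cleared, nickname is kept
    }
}
```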
The distinction between "past", "current" and "future", and between "required", "optional" and "invisible", also becomes relevant if you need to implement incremental validation. Imagine you have a web service that contains all your business logic and third-party partners build web applications to feed it. You do not want your partners to reimplement the validation rules, because that would cause strong coupling: if you changed your back-end logic you would need to get every partner to upgrade their application, and worse, you would need to synchronize on a commonly agreed date when all the applications are upgraded. The only good solution here is to implement validation in the back end alone and keep the web-application client dumb. Nevertheless, the web application needs support from your web services. First of all it needs to get metadata from the back end (as mentioned above: required/optional/invisible, allowable values for select fields, information about restrictions to tell the user upfront what is allowed, ...). In addition, a user on the front end expects validation to occur after every page submit, to get fail-fast behaviour. Therefore the web service back end cannot just take the whole bunch of data (which is not available on page 1) and validate it; it has to cope with incremental validation and only apply validation rules to data items that are in the "past". You then add one field to the input data structure of the web service that says "complete". If that field is set to "true", all fields are set to the past and all fields are validated; any fields that are still missing or in error at that point are real problems that need to be reported back to the user.
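Incremental validation with the "complete" flag could look roughly like this (the data structure and field names are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IncrementalValidationDemo {

    // Validate only fields the user has already passed ("past"), unless the
    // submission is marked complete -- then every field counts as past.
    static List<String> validate(Map<String, String> data,
                                 Set<String> pastFields,
                                 Set<String> requiredFields,
                                 boolean complete) {
        List<String> errors = new ArrayList<>();
        for (String field : requiredFields) {
            boolean inScope = complete || pastFields.contains(field);
            if (inScope && (data.get(field) == null || data.get(field).isEmpty())) {
                errors.add(field + " is required");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("name", "John");
        // "address" has not been entered yet -- it lives on a future page.
        Set<String> required = Set.of("name", "address");

        // After page 1: only "name" is in the past, so no errors yet.
        System.out.println(validate(data, Set.of("name"), required, false));

        // Final submit with complete=true: the missing address is a real problem.
        System.out.println(validate(data, Set.of("name"), required, true));
    }
}
```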
Another topic that may occur to you: you have several date fields in your data model that need to be dates in the past, and therefore one rule and one error message mnemonic that translates to "The date field must contain a date in the past". Then your business department comes along and says that for exactly that single field on that page the message must be changed to something like "Birthdates are only allowed to be in the past". Even worse, imagine you have one data model but allow your application to be customized by other business partners: business partner one wants the error message "Birthdates are only allowed to be in the past" and business partner two wants "Birthdates must not be in the future". You can solve the different-business-partner problem by introducing different message resource bundles, and the context-dependent messages via the reject methods of the Spring class
org.springframework.validation.Errors
which create not only one error mnemonic per error condition but several. Have a look at:
http://static.springframework.org/sprin ... ation.html
"5.3. Resolving codes to error messages"
In short, the key points that need to be addressed in future validation frameworks, besides the basic simple static validation rules, are:
- meta data useful in other contexts
- display info about valid values before the user runs into a validation problem
- fill web-application select key/values
- clear fields on back navigation
- take care of meta data like required/optional/invisible
- dependencies between fields
- incremental validation
- cross-bean validation (long-range coupling)
- different dynamic context data needed for validation rules
- field error messages configurable for different partners and per context
- a solution to the above problems should remain manageable and performant even with hundreds of rules and hundreds of objects
The core idea for solving these issues is to introduce a parallel data structure of FieldMetaInfo objects that mirrors the beans and bean properties of the domain model. This data structure can do the bookkeeping and serve as a storage area for the validation metadata, so that other parts of the application can query it. In addition, the long-range validation rules should be implemented via a rule engine; in this prototype JBoss Rules a.k.a. Drools 4.0.7 was used. In order to transparently add the parallel data structure to the domain model, an AspectJ aspect was created; via that approach the core business application is shielded from the details of validation. The idea of using the metadata in other contexts was explored via a Seam web application, "ui_web_seamgen": have a look at the page "userManagement.xhtml" and the "layout/vedit.xhtml" template. There is a new JSF tag "jces:validateAll", similar to the Seam "s:validateAll", that makes the validation mechanism available to Seam applications. Even Ajax4jsf works.
In the current prototype, XML Schema provides the metadata needed for levels 1) and 2) of the validation rule complexity levels enumerated above. In addition (this is not implemented yet), the xsd:annotation element could be used to configure further information such as field dependencies, or to initialize the required/optional/invisible property at application start. At the moment this information is read separately from a file called "additionalFieldMetaInfo.xml".
XML Schema is not the only way to describe the metadata for level 1) and 2) validations; there is one test case, called HibernateValidationTest, that explores the usage of Hibernate Validator annotations. In my opinion it is vital to keep any metadata for validation rules above level 2 out of the source code! These rules normally change depending on business partner or sales channel, so an approach that adds this data via annotations is not viable. It also complicates the maintainability of rules if they are buried somewhere deep in the code. It is best to have the rules in a single file that can be given to the business departments to verify that they are correct.
Believe me, a rule engine is superior to a polymorphism/object-oriented approach. If you really have a lot of complex rules, you will not be able to keep an OO approach under control. The rules in a rule engine, on the other hand, are readable even by non-technical people, and you have the advantage that all the rules are located in one place. Performance is also good, even with hundreds of rules and large sets of data!