Hello,
I've been working on large wizard-style business applications where validation accounts for roughly 30% of the business logic, and I have been following the discussions around Hibernate Validator and JSR 303 Bean Validation for some time. I think the topic of validation should be addressed in a broader way.
I understand the advantages of reducing the scope of JSR 303 in order to get something useful out faster and to collect feedback to integrate into a version 2. The problem with this approach to validation is that a validation framework only pays off if it addresses nearly 100% of the issues you have; it does not make sense to distribute validation rules across different frameworks.
The core problem is that the data needed to configure validation rules is vital for several other functional areas of user interface applications, across different layers. Most likely the term "validation" itself is too restrictive.
Below I'll try to describe the problems that a validation framework should be able to address. I've also created a prototype of a validation system that should act as a proof of concept of how things could be handled:
svn co
https://jclusterjobs.svn.sourceforge.ne ... ionmanager
The readme file should be able to get you started. I have to apologize that there is not more documentation: I will have to take on another work assignment soon and do not have the time to polish the prototype and its documentation further. Nevertheless I would like to prevent a standard validation framework from being released that is too restricted to be useful. Just think about the amount of time it took to turn EJB2 into something useful in EJB3.
The overall topic of validation is a core concern in most of today's applications. In addition, validation has to occur at several levels and is most of the time duplicated across layers, which violates the DRY (don't repeat yourself) principle. In a web application the user interface input has to be validated incrementally to give the end user quick (fail-fast) feedback about his mistakes.
At that level it is sad that most of the time a user first has to run into a validation issue before a validation rule produces a message on the screen telling him about his mistake. It would be preferable to have the metadata that describes the validation rule available upfront and make the web application display information next to an input field that says, for example: "the login name has to be between 5 and 12 characters long". Why let the user hit an obstacle before guiding him? Why not tell him upfront? The validation mechanisms around today do not allow querying the metadata used in validation rules and force an implementor to duplicate the information about such rules in text messages on the user interface, which breaks the DRY principle.
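To illustrate the idea, here is a minimal sketch in plain Java (the FieldMetaInfo class and its fields are hypothetical, not part of any existing framework): the rule is stored once as metadata, and the same metadata both renders the upfront hint next to the input field and performs the actual validation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical metadata holder: one instance per input field.
class FieldMetaInfo {
    final int minLength;
    final int maxLength;

    FieldMetaInfo(int minLength, int maxLength) {
        this.minLength = minLength;
        this.maxLength = maxLength;
    }

    // The same metadata drives the upfront hint shown to the user ...
    String hint(String fieldLabel) {
        return "the " + fieldLabel + " has to be between "
                + minLength + " and " + maxLength + " characters long";
    }

    // ... and the actual validation, so the rule is stated only once (DRY).
    boolean isValid(String value) {
        return value != null
                && value.length() >= minLength
                && value.length() <= maxLength;
    }
}

public class UpfrontHintDemo {
    static final Map<String, FieldMetaInfo> META = new HashMap<>();
    static {
        META.put("loginName", new FieldMetaInfo(5, 12));
    }

    public static void main(String[] args) {
        FieldMetaInfo login = META.get("loginName");
        System.out.println(login.hint("login name")); // shown next to the input field
        System.out.println(login.isValid("abc"));     // too short
        System.out.println(login.isValid("johndoe"));
    }
}
```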
At the next level, imagine that the web application is only a thin front end on top of a web service. Because the web service cannot trust the data coming from the user interface application (possibly developed by a different company), the validation has to be done once again at the entry to the web service. Imagine that the web service requires an input field that is an enumeration, e.g. the countries in which people must live in order to be able to buy goods at a certain website. The validation rule for that field carries the meta information about which countries are allowed, but how does the web application get to know the allowed values? The web application has to display a "select" input field to prevent wrong inputs upfront. Again a case where it would be useful not only to focus on validation but also to make the metadata needed for validation available for other purposes.
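The same sketch works for enumerated fields (class and field names are invented for illustration): the allowed values live once in the validation metadata, the web application queries them to populate its select field, and the web service uses them to validate the untrusted input.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical metadata for an enumerated field: the allowed values are
// defined in one place and served to clients instead of being duplicated.
class EnumFieldMetaInfo {
    private final Set<String> allowedValues;

    EnumFieldMetaInfo(List<String> allowedValues) {
        this.allowedValues = new TreeSet<>(allowedValues);
    }

    // Queried by the web application to build the "select" input field.
    Set<String> getAllowedValues() {
        return allowedValues;
    }

    // Used by the web service to validate data it cannot trust.
    boolean isValid(String value) {
        return allowedValues.contains(value);
    }
}

public class CountrySelectDemo {
    public static void main(String[] args) {
        EnumFieldMetaInfo country = new EnumFieldMetaInfo(List.of("AT", "CH", "DE"));
        System.out.println(country.getAllowedValues()); // options for the select field
        System.out.println(country.isValid("DE"));
        System.out.println(country.isValid("US"));
    }
}
```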
Finally, at the bottom of the stack is the database, which also needs to enforce consistency rules. Again, rules already present in the web application and in the business layer have to be repeated.
Validation rules also come in different complexity levels. From simplest to most complex:
1) Syntax validation rules needed to ensure that the input is parseable into its basic data type (int, double, string, date, ...).
2) Constraints/restrictions on the data type itself, e.g. a numerical input field can only take values between 0 and 999. No other data item is needed to verify that rule. This level can be compared to the XML Schema xsd:restriction element.
3) Intra-bean validation rules only need to be aware of data inside a single bean, e.g. a bid object with a valid-from and a valid-until date field, where the valid-from date has to lie before the valid-until date.
4) Inter-bean validation rules are not clearly distinguishable from intra-bean validation rules, because you could always construct a higher-level "container" bean that has the participating beans attached to it as properties. In general, though, the "navigation paths" for such rules are deep. An example: a person object contains a birth date, and an insurance object contains a "type of insurance"; you could imagine a rule that says a person over a certain age can no longer be insured against the risk of unemployment.
5) The final level is where you need to validate the user input at all the levels above against "context". The levels above only deal with data coming as input from the user; context is data existing in the background. Some examples:
- user profile data: belongs to the logged-in user but is not entered together with the data that should be validated; it was entered some time ago when the user created his profile.
- client data: imagine you run a web shop on Amazon or Yahoo (or wherever); then you are the client and set some rules, e.g. which payment options you want to allow for the users who shop on your site.
- constraints set by the operating company: e.g. if in the above example Yahoo did not allow your web shop to use certain payment options, or only allowed it to sell goods to customers from certain countries.
- global constraints per platform: normally these are legal constraints, e.g. that you cannot sell drugs on the platform.
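Levels 2) and 3) can be sketched in a few lines of plain Java (the Bid bean, its fields, and the 0..999 range are taken from the examples above; the method names are invented):

```java
import java.time.LocalDate;

public class RuleLevelsDemo {

    // Level 2: a constraint on the data type itself,
    // comparable to an XML Schema xsd:restriction.
    static boolean inRange(int value) {
        return value >= 0 && value <= 999;
    }

    // Level 3: an intra-bean rule that relates two properties of the same bean.
    static class Bid {
        LocalDate validFrom;
        LocalDate validUntil;

        Bid(LocalDate validFrom, LocalDate validUntil) {
            this.validFrom = validFrom;
            this.validUntil = validUntil;
        }

        boolean datesConsistent() {
            return validFrom.isBefore(validUntil);
        }
    }

    public static void main(String[] args) {
        System.out.println(inRange(500));
        System.out.println(inRange(1000));
        Bid bid = new Bid(LocalDate.of(2024, 1, 1), LocalDate.of(2024, 6, 1));
        System.out.println(bid.datesConsistent());
    }
}
```

Levels 4) and 5) are exactly where such hand-written checks stop scaling, which is why the prototype delegates them to a rule engine.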
Most of the time the context data comes from a configuration database and is dynamic. Therefore simplistic static rules like @NotNull or @CreditCardNumber are not sufficient for most real-world applications. In addition, as soon as several fields are involved, a certain order of asking the user for the data items has to be kept: if a prerequisite of a field is in error, the dependent fields should be in error, too, and data entered for one field may influence the validation rules for another field. Another topic is that some fields are optional in some cases, but in other circumstances, depending on selections in other ("previous") fields, their values become required or not allowed at all (they would be invisible on the user interface).
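The required/optional/invisible dependency can be sketched like this (the payment example and all names are assumptions for illustration, not part of any framework):

```java
// Hypothetical states a field can be in on the user interface.
enum FieldState { REQUIRED, OPTIONAL, INVISIBLE }

public class ConditionalFieldDemo {

    // Derives the state of the "cardNumber" field from the value the user
    // selected in the "previous" paymentMethod field.
    static FieldState cardNumberState(String paymentMethod) {
        if ("CREDIT_CARD".equals(paymentMethod)) {
            return FieldState.REQUIRED;   // must be entered
        }
        return FieldState.INVISIBLE;      // not shown, value not allowed
    }

    public static void main(String[] args) {
        System.out.println(cardNumberState("CREDIT_CARD"));
        System.out.println(cardNumberState("INVOICE"));
    }
}
```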
The dependencies mentioned above are also relevant if a user moves forward through a wizard-like application (let's call pages the user has already visited "past", pages still to come "future", and the page he is currently working on "current") and fills in one field after the other, but at a certain point decides to change a value on one of the previous ("past") pages. Such a change on a past page may mean that fields that were already entered and are now in the future become illegal. You also cannot rely on the user coming across those fields again as he navigates forward, because changed fields in the past may alter the page flow; the field that became illegal may never be traversed again, even though its value was already put into the domain model. The only reliable solution is to clear the fields that depend on a past field when that field changes (you could simply clear all fields in the "future", but that is inconvenient for a user who then has to navigate forward and re-enter fields he had already filled in). In order to do that you need the metadata that is already present in the validation infrastructure.
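Clearing only the dependent fields requires a dependency graph in the metadata. A minimal sketch, with invented field names and a simple transitive walk:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DependentFieldClearer {
    // Which fields depend on which: this lives in the validation metadata.
    static final Map<String, List<String>> DEPENDENTS = new HashMap<>();
    static {
        DEPENDENTS.put("country", List.of("state"));
        DEPENDENTS.put("state", List.of("city"));
    }

    // Current values of the domain model fields.
    static final Map<String, String> VALUES = new HashMap<>();

    // When a "past" field changes, transitively clear everything that
    // depends on it instead of wiping the whole "future".
    static void onFieldChanged(String field, String newValue) {
        VALUES.put(field, newValue);
        Deque<String> toClear = new ArrayDeque<>(DEPENDENTS.getOrDefault(field, List.of()));
        while (!toClear.isEmpty()) {
            String f = toClear.pop();
            VALUES.remove(f);
            toClear.addAll(DEPENDENTS.getOrDefault(f, List.of()));
        }
    }

    public static void main(String[] args) {
        VALUES.put("country", "US");
        VALUES.put("state", "CA");
        VALUES.put("city", "San Jose");
        VALUES.put("nickname", "jdoe");  // independent field, must survive
        onFieldChanged("country", "DE"); // user edits a past page
        System.out.println(VALUES);      // state and city are cleared, nickname is kept
    }
}
```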
The distinction between "past", "current" and "future", and between "required", "optional" and "invisible", also becomes relevant if you need to implement incremental validation. Imagine you have a web service that contains all your business logic and third-party partners build web applications to feed it. You do not want your partners to reimplement the validation rules, because that would cause strong coupling: if you changed your back-end logic you would need to get every partner to upgrade their application, and worse, you would need to synchronize on a commonly agreed date when all the applications are upgraded. The only good solution here is to implement validation in the back end alone and keep the web-application client dumb. Nevertheless, the web application needs support from your web services. First of all it needs to get metadata from the back end (as mentioned above: required/optional/invisible, allowable values for select fields, information about restrictions to tell the user upfront what is allowed, ...). In addition, a user on the front end expects validation to occur after every page submit, to get fail-fast behaviour. Therefore the web service back end cannot just take the whole bunch of data (which is not available on page 1) and validate it; it has to cope with incremental validation and only apply validation rules to data items that are in the "past". You then add one field to the input data structure of the web service that says "complete". If that field is set to "true", all fields are set to the past and all fields are validated; any fields that are still missing or in error at that point are real problems that need to be reported back to the user.
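Incremental validation with the "complete" flag could look roughly like this (the data structure and field names are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IncrementalValidationDemo {

    // Validate only fields the user has already passed ("past"), unless the
    // submission is marked complete -- then every field counts as past.
    static List<String> validate(Map<String, String> data,
                                 Set<String> pastFields,
                                 Set<String> requiredFields,
                                 boolean complete) {
        List<String> errors = new ArrayList<>();
        for (String field : requiredFields) {
            boolean inScope = complete || pastFields.contains(field);
            if (inScope && (data.get(field) == null || data.get(field).isEmpty())) {
                errors.add(field + " is required");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("name", "John");
        // "address" has not been entered yet -- it lives on a future page.
        Set<String> required = Set.of("name", "address");

        // After page 1: only "name" is in the past, so no errors yet.
        System.out.println(validate(data, Set.of("name"), required, false));

        // Final submit with complete=true: the missing address is a real problem.
        System.out.println(validate(data, Set.of("name"), required, true));
    }
}
```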
Another topic that may occur to you: you have several date fields in your data model that need to be dates in the past, and therefore one rule and one error message mnemonic that translates to "The date field must contain a date in the past". Then your business department comes along and says that for exactly that single field on that page the message must be changed to something like "Birthdates are only allowed to be in the past". Even worse, imagine you have one data model but allow your application to be customized by other business partners: business partner one wants the error message "Birthdates are only allowed to be in the past" and business partner two wants "Birthdates must not be in the future". You can solve the different-business-partner problem by introducing different message resource bundles, and the context-dependent messages via the reject methods of the Spring class
org.springframework.validation.Errors
which create not only one error mnemonic per error condition but several. Have a look at:
http://static.springframework.org/sprin ... ation.html
"5.3. Resolving codes to error messages"
In short, the key points that need to be addressed in future validation frameworks, besides the basic simple static validation rules, are:
- meta data useful in other contexts
- display info about valid values before the user runs into a validation problem
- fill web-application select key/values
- clear fields on back navigation
- take care of meta data like required/optional/invisible
- dependencies between fields
- incremental validation
- cross-bean validation (long-range coupling)
- different dynamic context data needed for validation rules
- field error messages configurable for different partners and per context
- a solution to the above problems should remain manageable and performant even with hundreds of rules and hundreds of objects
The core idea for solving these issues is to introduce a parallel data structure of FieldMetaInfo objects that mirrors the beans and bean properties of the domain model. This data structure can do the bookkeeping and serve as a storage area for the validation metadata, so that other parts of the application can query it. In addition, the long-range validation rules should be implemented via a rule engine; in this prototype JBoss Rules a.k.a. Drools 4.0.7 was used. In order to transparently add the parallel data structure to the domain model, an AspectJ aspect was created; via that approach the core business application is shielded from the details of validation. The idea of using the metadata in other contexts was explored via a Seam web application, "ui_web_seamgen": have a look at the page "userManagement.xhtml" and the "layout/vedit.xhtml" template. There is a new JSF tag "jces:validateAll", similar to the Seam "s:validateAll", that makes the validation mechanism available to Seam applications. Even Ajax4jsf works.
In the current prototype, XML Schema provides the metadata needed for levels 1) and 2) of the validation rule complexity levels enumerated above. In addition (this is not implemented yet), the xsd:annotation element could be used to configure further information such as field dependencies, or to initialize the required/optional/invisible property at application start. At the moment this information is read separately from a file called "additionalFieldMetaInfo.xml".
XML Schema is not the only way to describe the metadata for level 1) and 2) validations; there is one test case, called HibernateValidationTest, that explores the usage of Hibernate Validator annotations. In my opinion it is vital to keep any metadata for validation rules above level 2 out of the source code! These rules normally change depending on business partner or sales channel, so an approach that adds this data via annotations is not viable. It also complicates the maintainability of rules if they are buried somewhere deep in the code. It is best to have the rules in a single file that can be given to the business departments to verify that they are correct.
Believe me, a rule engine is superior to a polymorphism/object-oriented approach. If you really have a lot of complex rules, you will not be able to keep an OO approach under control. The rules in a rule engine, on the other hand, are readable even by non-technical people, and you have the advantage that all the rules are located in one place. Performance is also good, even with hundreds of rules and large sets of data!