We haven't tested JAX-WS RI yet, but its reputation seems to put it at the top of the list.
See http://wiki.apache.org/ws/StackComparison for useful information and a comparison of the existing WS stacks.
About your problem, I fear there is no solution. We already discussed this point at length a long time ago. To sum it up, you could call it a "Disk-Memory mismatch" (though I'm sure other people have already used that expression).
Say you have an A that has a list of Bs, with each B having a C. There is no proper way to send an A directly if you want just that one object with none of its relations loaded. The lazy-loading principle will certainly never be generalized to WSDL and the like, and maybe that's for the best (I haven't made up my mind yet :)).
So the only way is to define DTOs. The thing is: exposing web services with @WebService has become a real pleasure thanks to recent work. BUT it doesn't, and maybe never will, provide patterns for dealing with object graphs... SO you end up having to deal with, say, one DTO graph for each method if necessary.
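To make the "one DTO graph per method" idea concrete, here is a minimal sketch in plain Java. All the class names (A, B, C, ASummaryDto, AWithBsDto) and the two mapping methods are made up for illustration, and I left the @WebService annotation out so that it compiles without any WS stack on the classpath. Take it as a sketch, not as the way any particular stack wants it done.

```java
import java.util.ArrayList;
import java.util.List;

public final class DtoPerMethodSketch {

    // Hypothetical domain classes mirroring the A -> Bs -> C example.
    static final class C {
        final String label;
        C(String label) { this.label = label; }
    }

    static final class B {
        final String name;
        final C c;
        B(String name, C c) { this.name = name; this.c = c; }
    }

    static final class A {
        final long id;
        final String title;
        final List<B> bs = new ArrayList<>();
        A(long id, String title) { this.id = id; this.title = title; }
    }

    // DTO for a "get A alone" operation: scalar fields only, no relations.
    static final class ASummaryDto {
        final long id;
        final String title;
        ASummaryDto(long id, String title) { this.id = id; this.title = title; }
    }

    // DTO for a "get A with its Bs" operation: one level deep, C is left out.
    static final class AWithBsDto {
        final long id;
        final String title;
        final List<String> bNames = new ArrayList<>();
        AWithBsDto(long id, String title) { this.id = id; this.title = title; }
    }

    // Mapping used by the first operation: never touches a.bs.
    static ASummaryDto toSummary(A a) {
        return new ASummaryDto(a.id, a.title);
    }

    // Mapping used by the second operation: walks a.bs but never b.c.
    static AWithBsDto toWithBs(A a) {
        AWithBsDto dto = new AWithBsDto(a.id, a.title);
        for (B b : a.bs) {
            dto.bNames.add(b.name);
        }
        return dto;
    }

    public static void main(String[] args) {
        A a = new A(1L, "root");
        a.bs.add(new B("b1", new C("c1")));
        a.bs.add(new B("b2", new C("c2")));
        System.out.println(toSummary(a).title);  // prints "root"
        System.out.println(toWithBs(a).bNames);  // prints "[b1, b2]"
    }
}
```

Each exposed method returns its own DTO type, so the contract states exactly how deep the graph goes. The price is one more DTO and one more mapping method for every new shape.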
I haven't had time to investigate this, but I think there could be solutions where those DTOs wouldn't have to be written by hand: generating them just before handing the result to the WS stack for serialization could do the trick.
This way, you would get:
* on the client side: a proper, usable graph once deserialized
* on the server/business code side: no overhead from the numerous DTOs to develop. I think you would just have to find a way to coordinate the loaded graph (by tracking the setFetchMode() calls or something like that) with the dynamic DTO creation.
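Here is a rough sketch of what that dynamic DTO creation could look like. It is plain Java with no Hibernate on the classpath, so the Lazy wrapper below is a made-up stand-in for a lazy proxy or collection. In a real application I would expect the "is it loaded?" test to be something like Hibernate.isInitialized(...), but I haven't tried that. The class names and the toDto() helper are my own assumptions too.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class LoadedGraphDtoSketch {

    // Hypothetical stand-in for a lazy relation (a proxy or persistent
    // collection in a real ORM); "loaded" says whether it was fetched.
    static final class Lazy<T> {
        private final T value;
        private final boolean loaded;
        private Lazy(T value, boolean loaded) { this.value = value; this.loaded = loaded; }
        static <T> Lazy<T> loaded(T value) { return new Lazy<>(value, true); }
        static <T> Lazy<T> unloaded() { return new Lazy<>(null, false); }
    }

    // Hypothetical domain classes mirroring the A -> Bs -> C example.
    static final class C {
        String label;
        C(String label) { this.label = label; }
    }

    static final class B {
        String name;
        Lazy<C> c;
        B(String name, Lazy<C> c) { this.name = name; this.c = c; }
    }

    static final class A {
        long id;
        Lazy<List<B>> bs;
        A(long id, Lazy<List<B>> bs) { this.id = id; this.bs = bs; }
    }

    // Builds a generic DTO (a Map) from whatever part of the graph is loaded.
    // Scalars are returned as-is, lists are copied element by element, and
    // relations that were never loaded are simply left out of the map.
    static Object toDto(Object source) {
        if (source == null || source instanceof String
                || source instanceof Number || source instanceof Boolean) {
            return source;
        }
        if (source instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) source) {
                copy.add(toDto(item));
            }
            return copy;
        }
        Map<String, Object> dto = new LinkedHashMap<>();
        for (Field field : source.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(source);
                if (value instanceof Lazy) {
                    Lazy<?> lazy = (Lazy<?>) value;
                    if (!lazy.loaded) {
                        continue; // relation not fetched: omit it from the DTO
                    }
                    value = lazy.value;
                }
                dto.put(field.getName(), toDto(value));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return dto;
    }

    public static void main(String[] args) {
        // The list of Bs was fetched, but each B's C was not.
        B b = new B("b1", Lazy.<C>unloaded());
        A a = new A(1L, Lazy.loaded(Arrays.asList(b)));
        System.out.println(toDto(a));
    }
}
```

The resulting map contains A's id and its list of Bs, and each B carries only its name since its C was never loaded. This sketch does not handle cyclic graphs (it would recurse forever), and a Map is not something a WS stack will describe nicely in a WSDL, so it only illustrates the coordination between the loaded graph and the generated DTO.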
My 2 cents :-)