Inside Paulo Abrantes' head

Software Developing: Domain Driven Design, looking at Persistence

Created by pabrantes. Last edited by pabrantes, 3 years and 167 days ago. Viewed 2,712 times. #12

First I wrote "Software Developing: Domain Driven Design", where jff, who isn't in the area, asked a few general questions about DDD. As a result I not only replied to his questions but also wrote a second post, "Software Developing: More About Domain Driven Design", to make some concepts clearer. The second time it was jpmsi who raised interesting questions on the subject, pointing out information tarpits for people who are new to DDD.
In this post I'll try to expand on one of the subjects that jpmsi pointed out: persistence in DDD.

Note: I won't go into the database structure (yet) or explain how to build the object-relational bridge; that can be expected in a future post. Also, at the end of each item I'll present some code. If JavaScript is enabled in the browser the code will be collapsed, and clicking the "Example code" link will show it.

To anyone who has read Martin Fowler's Patterns of Enterprise Application Architecture this post will seem very familiar. Since I'm talking about the book, let me say it's a must-read for anyone who is interested in software architecture.

But let's start with the basics and the easiest thing, which is still very important: defining what persistence means.
Persistence is nothing more than the ability to preserve data beyond the execution of the program that created it. Persistence is associated with some kind of storage system, from an RDBMS to file system storage.

When creating the architecture there are at least three things regarding persistence that should be kept in mind:

  • How to access the data layer:
    • Mainly how CRUD operations are performed
  • What the domain-level behaviour regarding persistence is:
    • How to persist every change made to an object that is in memory
    • How to deal with object relations
    • How to load the same object into memory only once
  • Domain objects must have some sort of relation with the data stored in the RDBMS:
    • Store the object's database primary key in an object field
Right now let's put the third point aside and assume that each domain object has an extra attribute, the primary key for the object, which is auto-magically set. I'll get to it at the end.
I'm sure there are various ways of solving such problems but, as I already said, I'll be presenting design patterns taken from Martin Fowler's previously mentioned book.
I'll present two different kinds of patterns. Using Martin Fowler's names, I'll be talking about the so-called Data Source Architectural Patterns and the Object-Relational Behavioural Patterns.

Data Source Architectural Patterns

These patterns are responsible for the objects in the Data Layer that take care of direct access to the data storage, hiding the actual storage system from the rest of the application. Abstraction in this kind of situation is good because the storage system can then be replaced by another one and no layer besides the Data Layer has to be rewritten.

Pattern 1: Table Data Gateway

In this pattern there is an object that acts as a gateway to an RDBMS table, or to several tables depending on the database structure. That object has the CRUD operations as its interface, and any domain object that needs to be persisted uses this gateway.
As the name implies, the gateway is a gateway for a table, so there is one gateway object for each table that exists.

As an example, let's imagine that we have a Book object which has a title and a number of pages - along with the primary key that has been previously mentioned.

Example code
public class Book {
    private static BookTableDataGateway gateway = new BookTableDataGateway();

    private Integer id;
    private String title;
    private Integer numberOfPages;

    public Book(String title, Integer numberOfPages) {
        this.title = title;
        this.numberOfPages = numberOfPages;
        // the gateway inserts the row and returns its primary key
        this.id = gateway.createBook(title, numberOfPages);
    }

    public String getTitle() {
        return this.title;
    }

    public void setTitle(String title) {
        this.title = title;
        gateway.updateBook(this.id, this.title, this.numberOfPages);
    }

    public static Book readBook(Integer id) {
        return gateway.readBook(id);
    }

    // more code here
}

public class BookTableDataGateway {

    // inserts a new row and returns the generated primary key
    public Integer createBook(String title, int numberOfPages) {
        // jdbc code here
    }

    public Book readBook(Integer id) {
        // jdbc code here
    }

    public void updateBook(Integer id, String title, int numberOfPages) {
        // jdbc code here
    }

    public void deleteBook(Integer id) {
        // jdbc code here
    }
}

Besides the maintenance overhead, since there's a new gateway object for each table in the database, there is also the problem of how relations with other objects should be resolved:

  • using proxies?
  • also getting the other objects in memory?
Each way has its pros and cons, so it's up to the person who implements it to decide.

Side note: a simple way to test the domain layer when using this pattern is to replace the gateways with mock objects.
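The side note can be sketched roughly as follows. This is only an illustration under assumptions that are not in the example above: the gateway is extracted behind an interface (BookGateway, a name introduced here) and passed to Book through its constructor, and FakeBookGateway is a hypothetical hand-written stand-in for the JDBC-backed gateway.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface extracted from the gateway so that tests can swap it.
interface BookGateway {
    Integer createBook(String title, int numberOfPages);
    void updateBook(Integer id, String title, int numberOfPages);
}

// Hand-written fake: remembers titles in a map instead of issuing JDBC calls.
class FakeBookGateway implements BookGateway {
    final Map<Integer, String> titles = new HashMap<>();
    private int nextId = 1;

    public Integer createBook(String title, int numberOfPages) {
        Integer id = nextId++;
        titles.put(id, title);
        return id;
    }

    public void updateBook(Integer id, String title, int numberOfPages) {
        titles.put(id, title);
    }
}

// Domain object that delegates every persistence call to the injected gateway.
class Book {
    private final BookGateway gateway;
    private final Integer id;
    private String title;
    private final int numberOfPages;

    Book(BookGateway gateway, String title, int numberOfPages) {
        this.gateway = gateway;
        this.title = title;
        this.numberOfPages = numberOfPages;
        this.id = gateway.createBook(title, numberOfPages);
    }

    Integer getId() { return id; }
    String getTitle() { return title; }

    void setTitle(String title) {
        this.title = title;
        gateway.updateBook(id, title, numberOfPages);
    }
}
```

A test can then create a Book with a FakeBookGateway, call setTitle and check the fake's titles map, without any database being involved.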

Pattern 2: Active Record

Active Record is a very well-known pattern, especially after the hype around Ruby on Rails (RoR), which uses Active Records extensively.
In the Active Record pattern each domain object knows by itself how to deal with its own persistence. This means that the data access logic lives within the domain objects, which in my opinion makes the domain objects dirty. This can bring problems during development, especially if the domain in question isn't trivial. On the other hand, if the domain is simple and a few conventions are used, as in RoR, then it can be a powerful, yet not very flexible, aid to the developer.

Example code
public class Book {
    private Integer id;
    private String title;
    private Integer numberOfPages;

    public Book(String title, Integer numberOfPages) {
        this.title = title;
        this.numberOfPages = numberOfPages;
        this.id = save();
    }

    private void update() {
        // jdbc code to update the object
    }

    private Integer save() {
        // jdbc code to save the object and return its primary key
    }

    public void delete() {
        // jdbc code to delete the object
    }

    public static Book readBookById(Integer id) {
        // jdbc code to read the object
    }
}

Pattern 3: Data Mapper

[Figure: Data Mapper diagram]

In the two previous patterns the domain objects knew about the need to be persisted. Domain objects would have in their code methods to access the data layer itself (Active Record) or would at least delegate to another object whose interface they knew (Table Data Gateway). In those two patterns there wasn't a real abstraction between the domain layer and the persistence layer.
This pattern is different: it introduces a true separation between both layers, and the domain objects do not know anything about persistence. It brings a bit more complexity to the mapping system, but in applications where developers want their domain to be truly independent from persistence a Data Mapper might be the solution.

This pattern can be used along with the Lazy Load pattern, which is described later in this post. Typically, when a certain object needs to be loaded, the load method of the mapper is called and returns the object in question.
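Example code (a minimal sketch rather than a definitive implementation): the mapper below keeps its "table" in a HashMap so that the example runs without a database. The names BookMapper, insert and load are assumptions made for illustration; in a real mapper those methods would contain the JDBC code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain domain object: no persistence code and no reference to the mapper.
class Book {
    private final String title;
    private final int numberOfPages;

    Book(String title, int numberOfPages) {
        this.title = title;
        this.numberOfPages = numberOfPages;
    }

    String getTitle() { return title; }
    int getNumberOfPages() { return numberOfPages; }
}

// The mapper owns all knowledge of how a Book becomes a row and back.
class BookMapper {
    // Stand-in for the book table: primary key -> {title, numberOfPages}.
    private final Map<Integer, Object[]> table = new HashMap<>();
    private int nextId = 1;

    // Would be an INSERT in a JDBC-backed mapper; returns the new primary key.
    Integer insert(Book book) {
        Integer id = nextId++;
        table.put(id, new Object[] { book.getTitle(), book.getNumberOfPages() });
        return id;
    }

    // Would be a SELECT followed by rebuilding the object from the result set.
    Optional<Book> load(Integer id) {
        Object[] row = table.get(id);
        if (row == null) {
            return Optional.empty();
        }
        return Optional.of(new Book((String) row[0], (Integer) row[1]));
    }
}
```

Note that Book compiles and can be tested without BookMapper on the classpath, which is the separation this pattern is after.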

Object-Relational Behavioural Patterns

The behavioural patterns aim to support the operations that happen in the domain layer. By supporting I mean keeping track of what's changing in the objects, and bringing objects from and putting objects into the database.

Each pattern presented here solves a different problem; in complex systems all three can be found implemented.

Pattern 1: Unit of Work

This pattern introduces a way of keeping track of all in-memory modifications and making sure they get reproduced in the database. The idea is that, inside a transaction, all objects that have been created, modified or deleted are flagged, and at the end of the transaction, in the commit, all flagged objects are found and their modifications are written to the database.

Example code
import java.util.ArrayList;
import java.util.List;

public class UnitOfWork {

    List<DomainObject> newObjects;
    List<DomainObject> dirtyObjects;
    List<DomainObject> deletedObjects;

    public UnitOfWork() {
        this.newObjects = new ArrayList<DomainObject>();
        this.dirtyObjects = new ArrayList<DomainObject>();
        this.deletedObjects = new ArrayList<DomainObject>();
    }

    public void markNewObject(DomainObject object) {
        newObjects.add(object);
    }

    public void markDirty(DomainObject object) {
        dirtyObjects.add(object);
    }

    public void markDeleted(DomainObject object) {
        deletedObjects.add(object);
    }

    public void commit() {
        boolean failed = false;
        TransactionalSystem.openTransaction();
        try {
            for (DomainObject object : newObjects) {
                // code to create new object
            }
            // cycles for the other two lists
        } catch (PersistenceException exception) {
            failed = true;
            exception.printStackTrace();
        }

        if (failed) {
            rollBack();
        } else {
            TransactionalSystem.commit();
        }
    }

    public void rollBack() {
        TransactionalSystem.fail();
        // forget everything that was flagged during the failed transaction
        newObjects.clear();
        dirtyObjects.clear();
        deletedObjects.clear();
    }
}
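The example above does not show who calls markDirty. One possible answer, sketched here under the assumption that each domain object is handed its unit of work explicitly, is to let the setters do the flagging. This is a reduced variant written for illustration only: it tracks dirty objects alone, uses a set so that an object changed twice is written once, and returns the statements it would have issued instead of touching a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Reduced unit of work: only tracks modified objects.
class UnitOfWork {
    private final Set<Object> dirtyObjects = new LinkedHashSet<>();

    void markDirty(Object object) {
        dirtyObjects.add(object);
    }

    // Stand-in for the real commit: reports one "UPDATE" per dirty object.
    List<String> commit() {
        List<String> written = new ArrayList<>();
        for (Object object : dirtyObjects) {
            written.add("UPDATE " + object);
        }
        dirtyObjects.clear();
        return written;
    }
}

class Book {
    private final UnitOfWork unitOfWork;
    private String title;

    Book(UnitOfWork unitOfWork, String title) {
        this.unitOfWork = unitOfWork;
        this.title = title;
    }

    // The setter only flags the object; nothing is written until commit().
    void setTitle(String title) {
        this.title = title;
        unitOfWork.markDirty(this);
    }

    @Override
    public String toString() {
        return "Book(" + title + ")";
    }
}
```

Calling setTitle twice and then commit yields a single update carrying the final state, which is the main gain over writing to the database inside every setter.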

Pattern 2: Identity Map

This pattern has two main objectives: avoiding data incoherence and speeding up the materialization of objects.
The Identity Map is nothing more than a cache that only materializes objects from the database if they aren't already in memory. Its behaviour can be seen in the activity diagram below.

[Figure: Identity Map activity diagram]

This pattern avoids data incoherence because an object that is in memory exists as only one instance. Even if it is somehow requested from the database again, the same instance is returned, so at no time will there be two different instances of the same object. Two instances would allow the same object to have different internal states, which would be a severe data incoherence.

Example code
import java.util.HashMap;
import java.util.Map;

public class IdentityMap {

    private Map<Integer, DomainObject> objectsCache;
    private DatabaseConnector connector;

    public IdentityMap() {
        objectsCache = new HashMap<Integer, DomainObject>();
        connector = new DatabaseConnector();
    }

    public DomainObject getObjectWithId(Integer id) {
        DomainObject object = objectsCache.get(id);
        if (object == null) {
            // not in memory yet: materialize it once and remember the instance
            object = connector.materializeObjectWithId(id);
            objectsCache.put(id, object);
        }
        return object;
    }
}

I don't know if it's obvious, but what has been called DatabaseConnector in this example can, and should, be one of the patterns already described under the Data Source Architectural Patterns.
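To make that point concrete, here is a small generic variant in which the data source is passed in as a function. It is a sketch under assumptions: the type parameter and the loader argument are introduced here for illustration, and the map is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Identity map keyed by primary key; the loader stands in for a gateway or mapper.
class IdentityMap<T> {
    private final Map<Integer, T> cache = new HashMap<>();
    private final Function<Integer, T> loader;

    IdentityMap(Function<Integer, T> loader) {
        this.loader = loader;
    }

    // Returns the cached instance, loading and caching it on the first request.
    T get(Integer id) {
        return cache.computeIfAbsent(id, loader);
    }
}
```

Asking twice for the same id returns the very same instance and calls the loader only once.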

Pattern 3: Lazy Load

[Figure: Lazy Load diagram]

Lazy loading is the ability to load attributes only when they are requested. In Domain Driven Design this is extremely useful since domain objects are strongly related to each other: if every attribute of a domain object were resolved and materialized whenever a single object was loaded, the entire domain model could end up loaded into memory. In a small model this behaviour might not be a problem, but in a big one it is.

The idea is to use something that represents the object but is not the object itself. That something is called a proxy; the Lazy Load pattern is often used with the Proxy pattern, more specifically with the virtual proxy. I said "often" because there are other ways of achieving lazy loading besides proxying, such as value holders.

As an example, imagine that a Book has a relation with Authors: one book may have many authors. Both Book and Author are domain objects.

Example code
public class Author implements IAuthor {
    private String name;

    // ...

    public String getName() {
        return name;
    }
}

public class AuthorProxy implements IAuthor {

    private DatabaseConnector connector = new DatabaseConnector();
    private Integer authorsId;
    private Author author = null;

    public AuthorProxy(Integer idForRealObject) {
        this.authorsId = idForRealObject;
    }

    public String getName() {
        if (author == null) {
            // first access: only now is the real Author read from the database
            author = connector.materializeAuthorWithId(authorsId);
        }
        return author.getName();
    }
}

public class Book implements IBook {
    private Integer id;
    private String title;
    private Integer numberOfPages;
    Collection<IAuthor> authors;

    public Collection<IAuthor> getAuthors() {
        return authors;
    }
}
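The value holder alternative mentioned earlier can be sketched like this. It is a minimal, single-threaded illustration; the class name ValueHolder and the use of java.util.function.Supplier as the loading hook are choices made here, not something prescribed by the pattern.

```java
import java.util.function.Supplier;

// Holds a value that is loaded on first access and then reused.
class ValueHolder<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded = false;

    ValueHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            value = loader.get();   // the expensive read happens here, once
            loaded = true;
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```

A Book could keep a ValueHolder for its authors and only call get() inside getAuthors(). The trade-off compared with the proxy is that the owning class knows about the holder, while the Author class needs no interface or proxy at all.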

Keeping track of what is what

As has been said before, the in-memory objects need to have some relation with the data stored in the RDBMS. That relation is kept by a new field in the object, which holds the key for the object in the relational table.

The question to ask is, where does that identification used as primary key come from? There are two different approaches:

  1. It can be generated in the application
  2. It can be generated by the RDBMS
If the identification is generated by the application, the following should be kept in mind:
  • The identification numbers have to be unique, at least within each class and, depending on how the domain is mapped to the relational database, within its hierarchy.
  • The generation and assignment of such identification numbers needs to be transactional.
There are persistence frameworks that use this strategy, some implementing more than one way of calculating the identification numbers, though there is a simple algorithm, called the Hi/Lo algorithm, that is often used due to its efficiency. This algorithm generates the next identification based on some parameters.
One strategy is to keep those parameters in a database table and use them to obtain the new object id.
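A rough sketch of one common form of the Hi/Lo idea follows. It assumes the only parameter kept in the database is a "hi" counter that is read and incremented transactionally; here that step is abstracted as a LongSupplier so the example runs without a database. The class name and the formula hi * blockSize + lo are one possible choice, not the only one.

```java
import java.util.function.LongSupplier;

// Hi/Lo generator: one trip to the shared counter yields a whole block of ids.
class HiLoGenerator {
    private final LongSupplier nextHi;  // stand-in for a transactional read-and-increment of the "hi" value
    private final int blockSize;
    private long hi;
    private int lo;

    HiLoGenerator(LongSupplier nextHi, int blockSize) {
        this.nextHi = nextHi;
        this.blockSize = blockSize;
        this.lo = blockSize;            // forces a fetch on the first call
    }

    synchronized long nextId() {
        if (lo >= blockSize) {
            hi = nextHi.getAsLong();    // the only step that touches the database
            lo = 0;
        }
        return hi * blockSize + lo++;
    }
}
```

With a block size of 3 and a counter starting at 0, the generator hands out 0, 1, 2 from the first block and 3, 4, 5 from the second, visiting the counter once per block instead of once per id. That is where the efficiency comes from.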

On the other hand, if the choice is to let the RDBMS take care of the identification numbers, for example by using AUTO_INCREMENT on the keys, there are also ways of retrieving the identification number.
Database drivers usually provide an interface that gives access to the key generated by the statement that was just executed. But even if such functionality is not available, there's always the SQL way of retrieving the maximum, and therefore the latest, identification from a given table. In that case it should be kept in mind that the creation and the read of the identification number have to be done in the same transaction, in order to be sure that the correct identification is being read.

Conclusions

It's true that frameworks already exist that deliver the behaviours described in this post and that can be, and probably should be, used instead of implementing all this from scratch. One example of such a framework is Hibernate.
I do argue that if there are good frameworks they should be used instead of reinventing the wheel. Still, the concepts underlying such frameworks should be understood, in order to be able to analyse them and to understand "how the show is running backstage" in case something strange happens.

After this post I hope some of those concepts and ideas were made clear.

jpmsi, 3 years and 174 days ago.

I have privately discussed this with my good friend Paulo, but I thought that I should make a comment here anyway.

I believe the article is very well written. It approaches topics I think are very relevant in the context of DDD (the things that make it work). I even learned a couple of things myself :)

Why these topics?

The first thing I should explain is why I think these are important topics. To a purist DDD developer, these subjects are pretty much non-existent. We believe the underlying architecture takes care of that stuff for us. Rarely do we have to go down to the level of the inner workings of the frameworks. However, when I explain the merits and workings of DDD to some old school developer (read: people who only understand SQL+VB and for whom the concept of a domain model is somewhat blurred with ER models) I get stuck explaining them in terms they understand. And still they often end up asking "But wait… where do you code the services layer?".

So, even though Paulo has explained pretty well how relations between objects work (keys and such), I now raise the question of how one can invoke these methods from a transactional perspective.

With SOAs, the transactional unit is often a service, and its invocation generates a unique transactional context in which all changes occur (can you find out why, simplistically speaking, calling services within services is a bad idea?). How does DDD get around that (bear in mind that we strive for transparency from the perspective of business code)?

I've seen a couple of perspectives (dare I say "aspects"?), but I urge anyone to come up with their own views :P

Yet another point

Another point that I think is worth noting is object search. Another frequent question I get is how do I do a search with a set of criteria.

Typically, developers create a database view, with fields for every criterion included in the search, plus the intended result columns. From there, a DAO is created to use with this view, with more or less flexibility concerning the use of criteria: use of conjunctions (And, Or, ...) and operators (Like, In, Is, Between, ...). This DAO will then yield something like a datatable (.NET specifics, sorry :P) which we'll use to present the results.

How can we reach this flexibility with DDD? To this, I do not know many approaches, but I promise I'll look into the possibilities Hibernate offers.

pabrantes, 3 years and 170 days ago.

With SOAs, the transactional unit is often a service,

As you know - since you've already been developing code at FenixEdu - a thin service layer can be present in DDD approaches offering transactional support. By thin layer I mean dumb services, in other words, services that are not aware of business logic - that will be taken care by the objects - and will only do gets and sets of objects.

In my opinion the introduction of such a thin layer might be a mix of SOA and DDD. But it seems one of the best approaches to deal with transactional context. Still, why don't you share some of the ones you've already seen?

But… now that you've mentioned aspects, that seems an interesting approach! What's the main idea? Mark a setter as transactional?

Though I guess there would be a drawback: the compilation time would increase drastically (remember when we tried aspects on Fenix for Software Testing class?). Selective weaving might fix that problem, but I'm not sure about that.

Another point that I think is worth noting is object search. Another frequent question I get is how do I do a search with a set of criteria.

To me that's probably one of the biggest problems you might have in DDD, or maybe I just lack experience with such a problem.

Using DAOs in DDD can be done, although that's punching the DDD approach in the face. So I prefer to leave it aside.
Using OQL-like functionality such as that offered by Hibernate, or any other persistence framework, can be an option, although I don't know if it will be a really good one.

Another idea (I've never tried it, so I'll just hope this doesn't backfire on me) is to use Lucene as your domain object indexer, with the following "rules":

  1. You wouldn't be indexing the object itself but only certain object fields.
  2. Those fields would be marked as indexed in your (possible) Domain Modelling Language.
  3. There would have to be a re-indexing system for creation, update and deletion of domain objects.
I already have experience with Lucene; DSpace uses it to search the repository's meta-information. Just to give an idea of performance, I can retrieve hundreds of relevant hits (if I'm not wrong, the largest search result I got was something around 4000 hits) in something like one second.

Other approaches to search might be:

  • Indexed reading, where you still mark certain fields as indexed in your domain modelling language and the DDD framework allows you to apply some sort of criteria when reading. This is almost like OQL, although without breaking the DDD concepts, or at least I don't think it does.
  • In-memory search: read your whole search domain into memory and iterate through it to find the objects that match your criteria. This is definitely not a good option since it does not scale at all.
Any other ideas are always welcome.

jpmsi, 3 years and 169 days ago.

As you know - since you've already been developing code at FenixEdu - a thin service layer can be present in DDD approaches offering transactional support. By thin layer I mean dumb services, in other words, services that are not aware of business logic - that will be taken care by the objects - and will only do gets and sets of objects.

Well, I was just mentioning the standard approach developers often use with SOA. In my current project, our project leader decided to use a ThreadTransactionManagar that is used in every entry point of the service layer (although I do not want to elaborate further, I can still say I dislike that approach). Basically the transaction manager offers one transaction per thread, allowing us to use nested service calls :S.

But… now that you've mentioned aspects, that seems an interesting approach! What's the main idea? Mark a setter as transactional?

When I mentioned Aspects, I was referring to the process of adapting an already existing framework to transparently use the transactional system. I have had some time with NHibernate now, so I can safely say it is not a good thing when programming DDD. First, there is the need to create a Session context for each service (transactional unit), and that does not merge well with our DDD approach. After that, for each modification we have to call object.save() to persist changes. And to create/delete we have to call static methods on the Domain class. This is not a very good DDD way of doing things. My naïve Aspects comment was about calling save after each setter (which in the end does not offer a full solution)… Comments are more than welcome...

I did think about some other approaches, namely only persisting changes to the persistence layer upon committing, and walking the graph of objects to determine creations/updates/deletes. However this approach requires the use of a Software Transactional Memory manager, which is exactly what you guys use @ fénix.

Another idea (I've never tried it, so I'll just hope this doesn't backfire on me) is to use Lucene as your domain object indexer, with the following "rules":

Regarding searches, I will add only this: you are still throwing away performance when you want to search across multiple tables (read joins).

Should anyone come up with a good solution, please do tell me… Although I very much doubt such a thing exists...

Edit: I found some good reading about the subject

And I would recommend reading some other posts that are off-topic for this discussion.

pabrantes, 3 years and 168 days ago.

[...] Basically the transaction manager offers one transaction per thread, allowing us to use nested service calls :S.

Nested transactions sometimes can be good, although it's something that should be used carefully.

My naïve Aspects comment was about calling save after each setter (which in the end does not offer a full solution)… Comments are more than welcome…

What about using aspects with a Unit of Work? The aspects would simply mark an object as new, dirty or deleted on the unit of work. Still, a Transaction Manager would be needed to then collect the data from the unit of work and persist such changes… (No comments, back to square zero).

Regarding searches, I will add only this: you are still throwing away performance when you want to search across multiple tables (read joins).

Yes, even if everything is loaded into memory and you don't need to worry about the materialization time, a search in memory cannot be compared with an optimized search by an RDBMS. Especially if you need to navigate through the domain (read joins). There are no perfect solutions, only trade-offs. But it would be very (very!) nice to have a good solution for the problem.

By the way, interesting links the ones you gave!

jpmsi, 3 years and 168 days ago.

Nested transactions sometimes can be good, although it's something that should be used carefully.
Notice how I did not say nested transactions, but nested service calls… One transaction per thread says it all :P
(No comments, back to square zero).
Back indeed. I still think only an STM will solve this...

Regarding searches, I've talked to Mr. Gil, and he also thinks that having a domain representation over a view, in order to materialize searches faster is a pretty good trade-off. Kind of having a Search<DomainObject> concept in your domain… What say you?

pabrantes, 3 years and 168 days ago.

Notice how I did not say nested transactions, but nested service calls… One transaction per thread says it all :P

Hmmm, indeed I misunderstood it, my apologies.
I was thinking that each service call would be a thread and the transactions representing those threads would be subtransactions of the "top-level" transaction.

Regarding searches, I've talked to Mr. Gil, and he also thinks that having a domain representation over a view, in order to materialize searches faster is a pretty good trade-off. Kind of having a Search<DomainObject> concept in your domain… What say you?

So what you are saying is that the Search object would be nothing more than a view that allows you to search objects by criteria, right? Well, to me it seems a good trade-off, although I think it's dangerously approaching a DAO pattern (a generic DAO, but still a DAO). When I was thinking about using Lucene my idea was similar to that one but, as I was expecting, that idea did backfire on me. Mr. Luis Cruz (liked Mr.) pointed out to me that Lucene isn't transaction-aware and we would have consistency problems when a transaction aborts. And making the transaction index-aware is..hmmm...spooky? Anyway, I'll leave the Lucene idea for now.

jpmsi, 3 years and 168 days ago.

...I think it's dangerously approaching a DAO pattern (a generic DAO, but yet a DAO).
Well, I do think that it should observe a DAO behaviour, since the DAOs used with persistence APIs have all the query facilities we will need to use (query languages, like OQL and the likes). Maybe it's a good idea not to mix these objects with our domain, and let them be a specialized construction to deal with a specialized problem (searches).

pabrantes, 3 years and 167 days ago.

Maybe it's a good idea not to mix these objects with our domain, and let them be a specialized construction to deal with a specialized problem (searches).

Well yes, if you pay attention and do not mix DAOs into the domain itself, I think I can agree with such a solution. You could abstract the DAOs and provide a search interface or API that would allow DAOs to be used indirectly.

m4ktub, 3 years and 167 days ago.

I think we are approaching two distinct, yet closely related, subjects:
  1. The use of DDD
  2. Persistence of an OO model
This distinction is important to clarify things, as we can have the first without the second (bear with me) and the second without the first. We can also have the first with an OO DB, which changes many of the assumptions made.

Now, considering searches over the graph of objects composing the OO model resulting from DDD.

I don't think the Search<DomainObject> concept that João described above was completely understood. The suggestion was to consider a view in a database as a regular table. So, if your persistence framework maps objects to tables, it can map an object to a view. If you then ask the framework to provide all objects of that type (a readAll) it will provide objects representing the result of a complex query "transparently".

A simple example: a bank. Let's assume that the core of the bank's domain is the relation between accounts and their owners. But "Customer" is also a relevant concept for us (the bank) and we want to decide between those customers that we would rather lose and those customers we want to keep.

[Figure: ddd-example-1]

I've just invented a stereotype in the example to mark those domain concepts that we will map to a view. But I believe that they are domain concepts with the same relevance as the others. They just have different domain rules. For example "a TerribleCustomer can't open any more accounts" or "a PremiumCustomer has +1% in the interest rate". The fact that these domain classes can be implemented in the database as two views is probably only an optimization.

But this isn't the main problem stated here.

The traversal of the object graph corresponding to the application's domain is more inefficient than direct query in a database and materialization of the result set over a database.

Well, I believe that it is more inefficient if you are doing a complex (several levels deep) search, most of the graph is not in memory, and you need to materialize those intermediate objects to do the traversal. If the intended query is just one or two levels deep then processor speed greatly outweighs the network latency/traffic normally associated with the communication with a database. Of course I have to do some tests to back this up (all those Prevayler related tests are out of date and unrealistic).

jpmsi, 3 years and 165 days ago.

I don't think the Search<DomainObject> concept that João described above was completely understood. The suggestion was to consider a view in a database as a regular table. So, if your persistence framework maps objects to tables, it can map an object to a view. If you then ask the framework to provide all objects of that type (a readAll) it will provide objects representing the result of a complex query "transparently".

Well, to me the main problem with this approach is the materialization of the objects resulting from a search, plus all the related objects needed to present the results, when such materialization was (probably) already made by the database engine.

I'll present the same problem as I did to you personally yesterday: consider a customer with orders. You wish to list all customers plus each one's most recent order. You would create an index on the date of the order, and join the customer table with the order table on the most recent order. The query will quickly discard the orders we don't want because it simply traverses the BTree to find our order.

However, when done in memory, upon materializing the results, you'll have to find the most recent order by iterating manually through the orders of the customer, materializing them all to do the comparison and find the most recent. This is a very simple example that (I hope) can demonstrate my point.

pabrantes, 3 years and 165 days ago.

First of all my apologies for a late reply… Now back to our interesting discussion,

I don't think the Search<DomainObject> concept that João described above was completly understood. The suggestion was to consider a view in a database as a regular table. So, if your persistence framework maps objects to tables, it can map and object into a view. If you then ask the framework to provide all objects of that type (a readAll) it will provide objects representing the result of a complex query "transparently".
m4ktub

Yes, I hadn't understand like that, thanks for pointing it out.
The idea itself seems good and more efficient, afterall database engines do that at it's best. But you still defend that for one or two levels and having all the domain materialized the search in memory will be faster...hmm… I'm not really sure about that, but neither am I sure about the other way around! When can we see your tests result data? smiley

(...) when such materialization was (probably) already made by the database engine.
jmpsi

I'm sorry I didn't quite follow you here. The database already had made the materialization? We're talking about relational databases there's no concept of materializing an object. I must have misunderstood something about what you were saying. Could you please clarify?

However, when done in memory, upon materializing the results, you'll have to find the most recent order by iterating manually through the customer's orders, materializing them all to do the comparison and find the most recent. This is a very simple example that (I hope) demonstrates my point.
jpmsi

That is not completely true - let me be a bit of a troll - imagine, for example, that you keep an ordered list of orders; with this implementation you could directly access the first or last item of the list (depending on how you were ordering it).
But yes, I did understand what you were saying. Doing a readAll and iterating the resulting list in order to find which objects match our criteria is, dare I say always, a nasty implementation that won't scale properly.
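The ordered-list idea can be sketched as follows, assuming the orders are already in memory. DatedOrder and OrderHistory are hypothetical names; a TreeSet is just one possible way of keeping the collection sorted.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical order class, for illustration only.
class DatedOrder {
    final int id;
    final LocalDate date;
    DatedOrder(int id, LocalDate date) { this.id = id; this.date = date; }
}

class OrderHistory {
    // Sorted by date; ties are broken by id so two orders on the same day are both kept.
    private final TreeSet<DatedOrder> byDate = new TreeSet<>(
        Comparator.comparing((DatedOrder o) -> o.date).thenComparingInt(o -> o.id));

    void add(DatedOrder order) { byDate.add(order); }

    // No scan over all orders: the most recent one is the last element of the sorted set.
    DatedOrder mostRecent() { return byDate.isEmpty() ? null : byDate.last(); }
}
```

The sort cost is paid on insertion instead of on lookup, and the whole collection still has to live in memory for this to work.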

jpmsi, 3 years and 165 days ago.

I'm sorry, I didn't quite follow you here. The database had already done the materialization? We're talking about relational databases, where there's no concept of materializing an object. I must have misunderstood something about what you were saying. Could you please clarify?
When I talked about this kind of materialization I was talking about the DB engine following relations (joining) and applying restrictions. This is not to be mistaken with the materialization of objects.

That is not completely true - let me be a bit of a troll - imagine, for example, that you keep an ordered list of orders; with this implementation you could directly access the first or last item of the list (depending on how you were ordering it).
Well, introduce me to the persistence framework that does that and I'll take a byte at it. Still, you are materializing all these objects into memory (because the framework now has the burden of sorting them) to use just one. Even if the sorting is delegated to the DB engine, you would still materialize a whole set of proxies (at best) to use just one. Multiply that by your entire result set and there you have it...

Anyway, allow me to provide yet another example… Say you have a class with a state machine. This class has states s1, .., sn. Now you wish to provide a way for the user to search through your 1e7 records with a criterion that maps to a set of check boxes, one for each state. Selecting none searches objects with any state. Selecting some represents the union of the selected states.

You can simply add an index on the state, and using the IN operator you get some pretty quick results (the engine is quick to discard unselected states...). Would doing that in memory be anywhere near that efficient?

And yet another question: say you wish to do paging on the results… Using a parametrized query on the view can quickly discard unwanted results… Doing that in memory can be disgraceful, even when materializing only a list of, say, 1000 proxies to use just 20.

My point being, the sum of inefficiencies DDD presents in this field can very well be the deciding factor not to use it in a performance critical environment…

m4ktub, 3 years and 165 days ago.

Anyway, allow me to provide yet another example… Say you have a class with a state machine. This class has states s1, .., sn. Now you wish to provide a way for the user to search through your 1e7 records with a criterion that maps to a set of check boxes, one for each state. Selecting none searches objects with any state. Selecting some represents the union of the selected states. - jpmsi

And what if I tell you that we keep a hashtable/structure with the mapping state->{elements in that state}? And that we identify that structure as an important concept in our domain? And that, for efficiency reasons, that structure is mapped into a view like the examples above? In many cases we can transform a problem to match a pattern more familiar to DDD.
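A minimal in-memory sketch of such a state->{elements} structure, leaving out the mapping to a database view. State, Ticket and StateIndex are hypothetical names used only for this example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical state machine states and domain object.
enum State { S1, S2, S3 }

class Ticket {
    final int id;
    final State state;
    Ticket(int id, State state) { this.id = id; this.state = state; }
}

class StateIndex {
    private final Map<State, Set<Ticket>> byState = new EnumMap<>(State.class);

    void add(Ticket t) {
        byState.computeIfAbsent(t.state, s -> new LinkedHashSet<>()).add(t);
    }

    // Union of the selected states; an empty selection means "any state".
    Set<Ticket> search(Collection<State> selected) {
        Collection<State> states = selected.isEmpty() ? byState.keySet() : selected;
        Set<Ticket> result = new LinkedHashSet<>();
        for (State s : states) {
            result.addAll(byState.getOrDefault(s, Collections.emptySet()));
        }
        return result;
    }
}
```

In this sketch the state is fixed at construction; a real state machine would have to move the object between buckets on every transition.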

Nevertheless the paging issue is terrible. I currently don't know a solution that does not require all objects to be in memory. Paging really requires a deep integration with the persistence engine.

When can we see your test result data? :) - pabrantes

It's coming up slow. But it's coming up more like a curiosity and a dismissal (or confirmation) of some dogmas than a definitive answer to most of the questions we've made here. Sometime soon …

pabrantes, 3 years and 163 days ago.

Nevertheless the paging issue is terrible. I currently don't know a solution that does not require all objects to be in memory. Paging really requires a deep integration with the persistence engine.
m4ktub

Well, if you don't mind having some nasty hacks under the hood, your persistence engine could materialize an object, let's call it Pager, that would do the job. The code would be something like this (keeping in mind that some checks should be done and are not present in the example):

    import java.util.ArrayList;
    import java.util.List;

    public class Pager<E> {
        Integer pageSize;
        List<Integer> idsToMaterialize;

        public Pager(Integer pageSize, List<Integer> idsToMaterialize) {
            this.pageSize = pageSize;
            this.idsToMaterialize = idsToMaterialize;
        }

        // Materializes only the objects whose ids fall in the requested (zero-based) page.
        public List<E> getPage(Integer page) {
            List<Integer> pageIds = idsToMaterialize.subList(this.pageSize * page, this.pageSize * page + pageSize);
            List<E> elements = new ArrayList<E>();
            for (Integer id : pageIds) {
                elements.add(PersistenceEngine.materialize(id));
            }
            return elements;
        }

        public int getTotalElements() {
            return idsToMaterialize.size();
        }
    }

This solution is based on how DSpace performs searches and how I page them. As I previously mentioned, DSpace uses Lucene, and Lucene returns an object that, among other things, contains the total number of elements matching our search and a list of identifiers that tell us which objects it is referring to.

There's one thing bugging me though: I would like the materialize method to also receive the class of the object. But I can't say E.class, although I could say:

E x = new E(); Class realClass = x.getClass();

But I think it's...err...too dirty! Any better ideas about this?
With this system you could ask the persistence system to return a certain list as a Pager instead of a List. The list would have no ordering, though, which in many cases might be a problem.
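For reference, here is a self-contained variant of the Pager idea with the missing checks filled in. It is only a sketch: SafePager is a hypothetical name, and the loader function stands in for the persistence engine call, since no real engine API is assumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class SafePager<E> {
    private final int pageSize;
    private final List<Integer> ids;
    private final Function<Integer, E> loader; // stands in for the persistence engine

    SafePager(int pageSize, List<Integer> ids, Function<Integer, E> loader) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.pageSize = pageSize;
        this.ids = ids;
        this.loader = loader;
    }

    // Pages are zero-based; a page past the end is empty and the last page may be short.
    List<E> getPage(int page) {
        if (page < 0) throw new IllegalArgumentException("page must not be negative");
        long from = (long) page * pageSize;
        if (from >= ids.size()) return Collections.emptyList();
        int to = (int) Math.min(from + pageSize, ids.size());
        List<E> elements = new ArrayList<>();
        for (Integer id : ids.subList((int) from, to)) {
            elements.add(loader.apply(id)); // only this page is materialized
        }
        return elements;
    }

    int getTotalElements() { return ids.size(); }

    int getTotalPages() { return (ids.size() + pageSize - 1) / pageSize; }
}
```

Something like new SafePager<>(20, ids, id -> engine.load(id)) would then page over a list of ids, where engine.load is whatever lookup the persistence layer actually offers.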

jpmsi, 3 years and 161 days ago.

And what if I tell you that we keep a hashtable/structure with the mapping state->{elements in that state}? And that we identify that structure as an important concept in our domain? And that, for efficiency reasons, that structure is mapped into a view like the examples above? In many cases we can transform a problem to match a pattern more familiar to DDD.
But this is a different approach from my stance on DDD. I believe that, in terms of the domain of the application, this concept is not relevant at all. It stands as a solution to a problem that arises from our choice of programming architecture/methodology. This part should be of no concern to the domain logic, and I currently think that it should be presented as an infrastructural service to the application.

Edit: I think it becomes clearer if I ask you to explain to a client the need for a business representation of a search concept. In terms of his business, it is meaningless, therefore it does not belong to the domain.

jpmsi, 3 years and 162 days ago.

Well if you don't mind having some nasty hacks under the hood your persistence engine could materialize an object, let's call it Pager, that would do the job. The code would be something like (keeping in mind that some verifications should be done and are not present in the example):
I reply the same as I did to Mr. Gil. I really think this approach is valid, but not a part of DDD. It is a solution to a specific problem that does not concern business logic.

And how about (new E()).getClass(); :D:D:D

m4ktub, 3 years and 161 days ago.

I agree with you. Most of these problems are infrastructural. There should be direct support for these situations, like paging, in the persistence framework we are using. All other solutions are hacks around it and often are not relevant concepts in the application's domain.

Just thinking out loud, pay no attention: of course the real challenge is to provide a paging solution that is well integrated in the model. Ideally we would be able to do a query over the domain, with some sort of OQL, and have paging in an efficient way without ever leaving the comfort of the OO model. Something along the lines of

    QueryPager<Item> pager = Query.pagedExecute(Item.class, "orders[date > %1].customer.country == 'PT'", lastWeek);

    print(pager.getNumberOfPages(pageSize));
    for (Page page : pager.getPages(pageSize)) {
        for (Item item : page.getElements()) {
            for (Order order : (Collection<Order>) page.getElements(item)) {
                Customer customer = (Customer) page.getElement(item, order);

                // equals() rather than ==, since strings are compared by value
                assert "PT".equals(page.getElement(item, order, customer));
            }
        }
    }

I have to check some existing OQL solutions and how they try to optimize the query. (one more thing to do :) )

And how about (new E()).getClass()

E, in the example, is not an example class, it's a generic type. So you can't even do new E().

pabrantes, 3 years and 160 days ago.

I agree with you. Most of these problems are infrastructural.
m4ktub

Yes, I also do agree with that.

There should be direct support for these situations, like paging, in the persistence framework we are using.
m4ktub

Does this really make sense? Shouldn't the persistence framework be a small set of operations, with more complex operations - such as paging - built around it?

E, in the example, is not an example class, it's a generic type. So you can't even do new E().
m4ktub

Yes, you're totally right, it seems my brain froze while writing that part. After all, generics are a compile-time feature, not a run-time one. If there are no generics at run time, you cannot make a reference to the type's class. Oh well… It seemed a good idea to read objects using the class.
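One common way around erasure is to pass the class explicitly as a Class<E> argument (often called a type token). A minimal sketch follows; TypedStore is a hypothetical in-memory stand-in for a persistence engine, not an existing API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a persistence engine, keyed by class and id.
class TypedStore {
    private final Map<Class<?>, Map<Integer, Object>> rows = new HashMap<>();

    <E> void save(Class<E> type, int id, E value) {
        rows.computeIfAbsent(type, t -> new HashMap<>()).put(id, value);
    }

    // The Class<E> argument is an ordinary object, so it survives erasure: it can
    // select the "table" to read from and perform a checked cast at run time.
    <E> E materialize(Class<E> type, int id) {
        Object value = rows.getOrDefault(type, Collections.emptyMap()).get(id);
        return type.cast(value); // cast(null) simply returns null
    }
}
```

A Pager<E> could keep such a Class<E> in a field, received through its constructor, and hand it to materialize for every id.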

jpmsi, 3 years and 159 days ago.

Something along the lines of
m4ktub

Whoa… I think you may be getting near the level of .Net's LINQ with that OQL query (OQL is not and should not be new to me, but I just could not resist throwing that one in). Read >>.Net's LINQ for some examples (thanks Mr. <censored> for that).

From what I read, you created a collection of pages (as in page 1, records 1-10, page 2, records 11-20 and so on). Then you materialize the elements by iterating the page elements? Am I understanding correctly?

If so, wouldn't it be better to have a

Query.pagedExecute(Class c, String query, int begin, int offset, Object... params);

Does this really make sense? Shouldn't the persistence framework be a small set of operations, with more complex operations - such as paging - built around it?
pabrantes

My € 0,02 is that paging is something you will be wanting to do at database level…

m4ktub, 3 years and 160 days ago.

Does this really make sense? Shouldn't the persistence framework be a small set of operations, with more complex operations - such as paging - built around it? - pabrantes

As we saw, the problem with that is the lack of control over the specific features of the DB that allow us to improve performance.

From the LINQ documentation: The Standard Query Operators is an API that enables querying of any .NET array or collection.

I have only read a couple of lines but it looks like it's an in-memory query infrastructure that offers a language that resembles SQL. So that does not solve our problem of a "transparently persisted OO model capable of offering efficient query with paging operations". I give a further look at

Regarding the previous example, jpmsi is right, those parameters in the pagedExecute method make sense, but my goal was different. I was focusing on the problem stated before where, after the query, you would need to do extra computation in memory and redo part of the filtering already done in the DB. So the example shows a Page structure that would give you, for each page, a list of Items; for each Item, the list of orders that matched the query for that item; for each order, the corresponding customer; and for each customer, its country. This information would be obtained with one query, possibly directly from the DB, and stored efficiently (uh, the magic word, but I was only thinking of lazy loading :-D) in the Page structure, thus sparing us from iterating over the Item's orders again.

m4ktub, 3 years and 154 days ago.

Ok, I could finally obtain some of the results I've been promising in this discussion. Note that these results were obtained without any formal method and the only way you can be sure is by doing the tests yourselves :).

[Image: ddd-bench-a]

As you can see, I've measured the run time of 3 strategies, so higher is worse. I will leave the conclusions to each of you, but I just want to focus on the relation between A and C and on what I will probably try to measure next.

A can be considered the worst case scenario of C (oh, C is the actual time measured, C/100 being the time of a single iteration). In fact I've also measured the time of the first iteration in C and it was very similar to the times of A. So we have to consider how many misses from memory take C/100 to a value higher than B. I've tried to measure the time of loading a single object from the DB but it was late in the night and the result had a lot of noise.

Of course, if a deployment of the application used the C strategy it would require much more memory and would need to maintain the consistency of all objects in memory across all the machines in a cluster. That would increase the time of C and I'm not counting that here.

A deployment like this would probably do fine with a single DB for a long time. The load would surely be on the frontend servers. In B I believe that the load would be on the DB, so scalability could be obtained through clustering of the DB.

pabrantes, 3 years and 153 days ago.

It's unbelievable the time that C/100 took, smoking fast!

I think we can say that this test kind of debunks the idea that searches on the DB are faster. Although most applications will never have the whole domain in memory.
I guess one of the things you'll be trying to measure - you didn't mention what you were going to try next - will be where, between the A and C situations, memory searches are still faster than a view in the database (B).

Also I think it would be interesting to know the time it takes to load an object, maybe one day when you aren't doing such tests late in the night? ;)

I think we all should thank m4ktub for the valuable information he has posted! Thanks Claúdio :)

m4ktub, 3 years and 148 days ago.

Don't get overexcited because I've finished the tests I wanted to do. The extra tests I've made include paging and the computation of a value with some cuts like João suggested. I've also redone the B test because I've introduced a few more optimizations at the DB level (some extra indexes).

[Image: ddd-bench-b]

As you can see the test from B was reduced to less than 1/3 of the original value.

The D test was omitted. In it I tried to measure the time it takes to load a single record. It proved meaningless: it doesn't take long (around 0.30) and the "real" cases hardly ever loaded a single record. We normally loaded an entire 1-n relation.

The E test was a query like "SELECT SUM(...) … WHERE status = <value> and date BETWEEN <begin> and <end>". The in-memory computation required us to process each Order, filter the orders by status and date, and then compute the sum over all OrderItems in the orders that weren't filtered out. As you can see, the DB clearly beat the in-memory processing. João was right when considering that database cuts based on indexes are very efficient compared to in-memory graph traversal. As an additional note I've found that, in F, most of the time was spent comparing dates (creating a calendar, setting the date and using compareTo()).
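For readers who want to picture the in-memory side of this test, here is a rough sketch of such a computation. It is not the code that was benchmarked: PlacedOrder, OrderItem, OrderTotals and the amountInCents field are hypothetical, and java.time is used instead of Calendar so that dates are compared directly.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical domain classes; the field names are assumptions.
class OrderItem {
    final long amountInCents;
    OrderItem(long amountInCents) { this.amountInCents = amountInCents; }
}

class PlacedOrder {
    final String status;
    final LocalDate date;
    final List<OrderItem> items;
    PlacedOrder(String status, LocalDate date, List<OrderItem> items) {
        this.status = status;
        this.date = date;
        this.items = items;
    }
}

class OrderTotals {
    // In-memory counterpart of a SUM with "status = ? AND date BETWEEN ? AND ?".
    static long sum(List<PlacedOrder> orders, String status, LocalDate begin, LocalDate end) {
        long total = 0;
        for (PlacedOrder o : orders) {
            if (!o.status.equals(status)) continue;
            // BETWEEN is inclusive on both ends
            if (o.date.isBefore(begin) || o.date.isAfter(end)) continue;
            for (OrderItem item : o.items) {
                total += item.amountInCents;
            }
        }
        return total;
    }
}
```

Every order is visited regardless of how selective the filter is, whereas an index lets the database skip most rows without touching them.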

The last group of tests covers paging. In-memory paging, that is H, includes creating a new List from the list of persisted objects, sorting that list and getting the subpage (subList(start, start+offset)).
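The copy, sort and subList steps can be sketched like this. It is an illustration of the steps described, not the benchmarked code; InMemoryPaging is a hypothetical name and the clamping of the bounds is an addition.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class InMemoryPaging {
    // Returns `offset` elements starting at index `start` of the sorted view.
    static <T> List<T> page(List<T> persisted, Comparator<? super T> order, int start, int offset) {
        List<T> copy = new ArrayList<>(persisted); // copy so the persisted list is not reordered
        copy.sort(order);
        // Clamp the bounds so a short or out-of-range page does not throw.
        int from = Math.min(Math.max(start, 0), copy.size());
        int to = (int) Math.min((long) from + Math.max(offset, 0), copy.size());
        return copy.subList(from, to);
    }
}
```

The whole list is copied and sorted on every call, whatever the page size, so the cost grows with the size of the collection rather than with the size of the page.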

So, in-memory computation can beat the db in some situations. We lack the kind of infrastructure the database has when regarding query optimization, indexes, cuts, etc. That infrastructure could be partially implemented in a framework to optimize the selection of objects in memory and increase the cases where in-memory beats communicating with the database. Nevertheless there is a huge effort behind the DB backend and I'm not sure how much of that effort would be needed to replicate it in such a framework. We also have the scalability issue ...

:) Guess this didn't give us THE answer to our problems but at least we have some "pretty" graphics. Uh! Shinnny!!!.

pabrantes, 3 years and 147 days ago.

Don't get overexcited
m4ktub

Damn...ok!

As you can see the test from B was reduced to less than 1/3 of the original value.
m4ktub

First conclusion of the night: database optimizations do work. :-P

The E test was a query like (...) As an additional note I've found that, in F, most of the time was spent comparing dates (creating a calendar, setting the date and using compareTo()).
m4ktub

Not that I'm being stubborn or any sort of evangelist, but such results might be due to the lack of optimization in the in-memory search code. There can be better search strategies than going through the whole list and using compareTo(). Still, on that test the query kicked ass.

The last group of tests covers paging. In-memory paging, that is H, includes creating a new List from the list of persisted objects, sorting that list and getting the subpage (subList(start, start+offset)).
m4ktub

Too bad you have to do a copy...otherwise it would be smoking fast. You're making the copy because you're ordering the list, right?

So, in-memory computation can beat the db in some situations. We lack the kind of infrastructure the database has when regarding query optimization, indexes, cuts, etc. That infrastructure could be partially implemented in a framework to optimize the selection of objects in memory
m4ktub

So mainly what you are suggesting is to bring a DB engine into our application? I'm not very sure about that: database engines are fine-tuned and mature pieces of software, and a new framework would probably need a huge effort to achieve similar results. I say probably because I have no expertise in the area and it's just my common sense. Hence, I think we should leave database related subjects within the database.

:) Guess this didn't give us THE answer to our problems but at least we have some "pretty" graphics. Uh! Shinnny!!!.
m4ktub

Well there's no "THE" answer, but indeed we have some "Uh! Shinnny!!!" layout breaking - if I may add - graphics. Now for real, they may not give the ultimate answers, but still they give us a good insight.

Now let's wait for João's feedback saying "AH! I told you" :D

Who am I?
My name is Paulo Abrantes AKA pabrantes and I'm a software developer. I'm currently employed at >>CIIST working as a Java developer in >>FenixEDU.

This blog is mostly about Java programming, domain driven design and snipsnap bliki developing. Everything written in this blog is my personal opinion and it may not reflect the opinions of my employer and co-workers.

