BorlandTalk.com Forum Index BorlandTalk.com
Borland discussion newsgroups
 
Archives   FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

No-Locking Fiber Switching Mechanism

 
Post new topic   Reply to topic    BorlandTalk.com Forum Index -> Delphi OO design
View previous topic :: View next topic  
Author Message
Eric Hill
Guest





PostPosted: Thu Dec 16, 2004 9:30 pm    Post subject: No-Locking Fiber Switching Mechanism Reply with quote



I'm using fibers with two thread pools to accomplish some work.

1) Thread A executes Fiber 1
2) Fiber 1 sends an overlapped command to the OS
3) Fiber 1 yields

4) Thread B receives notification from the OS that the command has completed
5) Thread B resumes Fiber 1
6) Fiber 1 completes, terminates, and yields

My problem is that the OS could start Step 4 before Step 3 completes. Does this necessitate the
need for a lock (spinlock/critical section/etc) or is there a better way to accomplish this without
locking?

Eric


Back to top
Robert M.
Guest





PostPosted: Thu Dec 16, 2004 10:37 pm    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote



Eric Hill wrote:
Quote:
I'm using fibers with two thread pools to accomplish some work.
Off topic - why do you use fibers in your system? Generally speaking, did you

find any advantages over threads in multithreaded applications. Fibers put a lot
of problems for solving in your hands like scheduling and some "synchronization"
issues in the thread which owns them. May I suggest using plain threads?

Quote:

1) Thread A executes Fiber 1
2) Fiber 1 sends an overlapped command to the OS
3) Fiber 1 yields

4) Thread B receives notification from the OS that the command has completed
5) Thread B resumes Fiber 1
6) Fiber 1 completes, terminates, and yields

My problem is that the OS could start Step 4 before Step 3 completes. Does this necessitate the
need for a lock (spinlock/critical section/etc) or is there a better way to accomplish this without
locking?


Suggestion: search for producers/consumers algorithm in Julian Bucknall's book
"Tomes of Delphi: Algorithms and Data Structures" if you have one copy around.
Regards,
Robert

Back to top
Eric Hill
Guest





PostPosted: Thu Dec 16, 2004 10:55 pm    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote



Thanks for replying.

Quote:
Off topic - why do you use fibers in your system? Generally speaking, did you
find any advantages over threads in multithreaded applications. Fibers put a lot
of problems for solving in your hands like scheduling and some "synchronization"
issues in the thread which owns them. May I suggest using plain threads?

Based on some preliminary tests, fibers are faster and use less memory for maintaining a persistent
context of a staged process without using a state collection of objects. Basically I'm deferring
the state machine directly to the stack of the OS. I've actually gone a little further though and
may wind up just using my own state machine because of some other issues... I'd still like to
figure this out though :)

Quote:
Suggestion: search for producers/consumers algorithm in Julian Bucknall's book
"Tomes of Delphi: Algorithms and Data Structures" if you have one copy around.

The problem with this approach as I see it is that you cannot switch to the same fiber from two
different threads. You have to ensure that the fiber is not running before coming back to it. You
can't set a flag after switching off the fiber because you're not running any more <g>. The only
way I can come up with to make this work is to set a Tls variable to the quitting fiber's handle,
switch back to the main thread fiber, and let it read the value and notify the next of kin.

Eric



Back to top
Martin James
Guest





PostPosted: Fri Dec 17, 2004 2:52 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:
I'm using fibers with two thread pools to accomplish some work.

Two thread pools? I could never afford such luxury... :)

Quote:
1) Thread A executes Fiber 1
2) Fiber 1 sends an overlapped command to the OS
3) Fiber 1 yields

4) Thread B receives notification from the OS that the command has
completed
5) Thread B resumes Fiber 1
6) Fiber 1 completes, terminates, and yields

My problem is that the OS could start Step 4 before Step 3 completes.

So, this is a variant of the classic ordering problem that faces
overlapped/IOCP server developers, yes? A handler thread has got hold of a
data object that is bound to a socket context, (ie fiber), that already has
another data object, (and the fiber), being processed, by another thread,
yes?

IMHO, you fiber execution should be serialised by the same mechanism that
you use for serialising data object serilaisation. I use sequence numbers
for this. If a handler receives an out-of-order data object, it cannot
process it & has to queue it in a private queue in the socket context. When
handler threads finish processing in-order data, they have to check this
queue & process any data objects in it.

Would this mechanism not force your fibers, as well as your data, to be
processed in serial fashion across all the handler threads?

Rgds,
Martin



Back to top
Martin James
Guest





PostPosted: Fri Dec 17, 2004 3:00 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote


To address the other half of your query... yes, some sort of locking will
probably be required somewhere. To protect my out-of-order queue, I use a
CS. Ths is very expensive, since there can be 64k socket contexts & so 64k
CS. I will, at some stage, investigate cacheing these CS, since only a very
small number are in use at any one time.

I will not use a spinlock. Even if guaranteed a multiprocessor system, I am
not happy with spinlocks.

Rgds,
Martin


Back to top
Martin James
Guest





PostPosted: Fri Dec 17, 2004 3:42 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:

The problem with this approach as I see it is that you cannot switch to
the same fiber from two
different threads. You have to ensure that the fiber is not running
before coming back to it. You
can't set a flag after switching off the fiber because you're not running
any more <g>.


Counl not the the thread that dispatched the fiber do it? When the fiber
yields to the thread that called it, the thread could unlock the fiber.

Rgds,
Martin



Back to top
Eric Hill
Guest





PostPosted: Fri Dec 17, 2004 4:14 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:
Two thread pools? I could never afford such luxury... Smile

<g> I'm actually contemplating 5 different thread pools and a PID loop type
mechanism for each to automatically balance the load. The end-user will set
parameters (tune the PID loop for those PLC developers out there) for the
stages and let the system rebalance live.

Quote:
So, this is a variant of the classic ordering problem that faces
overlapped/IOCP server developers, yes? A handler thread has got hold of
a
data object that is bound to a socket context, (ie fiber), that already
has
another data object, (and the fiber), being processed, by another thread,
yes?

Exactly correct. This is a new server-side channel for RemObjects call
dispatching. I'm hoping to get a beta ready by the first of the year.

Quote:
IMHO, you fiber execution should be serialised by the same mechanism that
you use for serialising data object serilaisation. I use sequence numbers
for this. If a handler receives an out-of-order data object, it cannot
process it & has to queue it in a private queue in the socket context.
When
handler threads finish processing in-order data, they have to check this
queue & process any data objects in it.

Would this mechanism not force your fibers, as well as your data, to be
processed in serial fashion across all the handler threads?

I'm going to do this (yes, not try, do <g>) without any spinlocks or
critical sections. I think I've got a way to get the calls through the
thread pools without any synchronization at all. A fiber can do it's work
then yield back to the pool thread, which forwards it on to the next queue.
No fiber will be executing in multiple queues at any given time, so I won't
need any synchronizations at all.

Thanks Martin!
Eric



Back to top
Eric Hill
Guest





PostPosted: Fri Dec 17, 2004 4:16 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:
To address the other half of your query... yes, some sort of locking will
probably be required somewhere. To protect my out-of-order queue, I use a
CS. Ths is very expensive, since there can be 64k socket contexts & so
64k
CS. I will, at some stage, investigate cacheing these CS, since only a
very
small number are in use at any one time.

I will not use a spinlock. Even if guaranteed a multiprocessor system, I
am
not happy with spinlocks.

I think I can get this working without any type of locking. Each stage
should isolate the fibers from one-another. My biggest concern is actually
the stack size of the fibers. A default 1MB stack will only allow for 2038
fibers. I'd like to get that number over 5,000...

Eric



Back to top
Eric Hill
Guest





PostPosted: Fri Dec 17, 2004 4:24 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:
Counl not the the thread that dispatched the fiber do it? When the fiber
yields to the thread that called it, the thread could unlock the fiber.

That's what I was thinking, except instead of unlocking the fiber, the
dispatching fiber will just forward the context on to the next queue. This
should work well under heavy loads as it puts the burdon on the OS to manage
the queue.

I came up with a mechanism to adjust thread numbers dynamically without
locking, so rebalancing the queues can be done live while the system is
running. Secondly, I'm trying to separate as many aspects of the request
processing that I can into discrete stages such that each stage can be
balanced based on load. Blocking will only occur while the queue has
reached a high-water mark, and will not unblock until the queue reaches a
low-water mark. Hopefully this should let the system scale unbelievably on
multi-CPU machines.

Eric



Back to top
Martin James
Guest





PostPosted: Fri Dec 17, 2004 6:32 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:
Exactly correct. This is a new server-side channel for RemObjects call
dispatching. I'm hoping to get a beta ready by the first of the year.

Heh! You will need a lot of testing :)

Quote:
IMHO, you fiber execution should be serialised by the same mechanism
that
you use for serialising data object serilaisation. I use sequence
numbers
for this. If a handler receives an out-of-order data object, it cannot
process it & has to queue it in a private queue in the socket context.
When
handler threads finish processing in-order data, they have to check this
queue & process any data objects in it.

Would this mechanism not force your fibers, as well as your data, to be
processed in serial fashion across all the handler threads?

I'm going to do this (yes, not try, do <g>) without any spinlocks or
critical sections. I think I've got a way to get the calls through the
thread pools without any synchronization at all. A fiber can do it's work
then yield back to the pool thread, which forwards it on to the next
queue.
No fiber will be executing in multiple queues at any given time, so I
won't
need any synchronizations at all.

I don't quite get your implementation. I do not understand why you have
more than one queue, or more than one thread pool, or how this avoids
out-of-order data processing or, (same problem, really), multiple threads
trying to schedule the same fiber.

I have only one IOCP input queue to my handler threads. All the handler
threads wait on this queue. Data objects are pushed on from both the
listener threads and from threads that have fired IO completion routines.
Multiple data objects can be bound to the same per-socket data & so
out-of-order processing is certainly posssible, and does happen. In a very
heavily loaded server with tens of thousands of clients, the handler threads
detect out-of-order processing on ~ 1 object in five & so have to queue such
objects in the per-socket data to await the completion of processing of an
earlier packet by another thread.

One annoying aspect is that, no matter how many handler threads I configure
in, how heavy the loading is, or how stroppy the protocol-handler is, I have
never logged more than one out-of-order packet per socket, making an actual
queue structure seemingly pointless - a simple store would do. I just know,
though, that as soon as I take the queue out, multiple out-of-order packets
will turn up :)

Rgds,
Martin





Back to top
Martin James
Guest





PostPosted: Fri Dec 17, 2004 7:10 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote


Quote:
Counl not the the thread that dispatched the fiber do it? When the
fiber
yields to the thread that called it, the thread could unlock the fiber.

That's what I was thinking, except instead of unlocking the fiber, the
dispatching fiber will just forward the context on to the next queue.
This
should work well under heavy loads as it puts the burdon on the OS to
manage
the queue.

Requeueing the context, or data object that is bound to a context, because
of an out-of-order problem does not work. I've tried it. You end up with
objects circulating for ever because the cure invites more of the disease.

Quote:
I came up with a mechanism to adjust thread numbers dynamically without
locking, so rebalancing the queues can be done live while the system is
running.

Hmm.. now I'm getting a bit lost with your design. What queues might need
to be rebalanced? My servers have only one input queue for all objects that
need action from the handler thread pool.

I did attempt to fix the 'out-of-order' problem by dumping IOCP and using a
queue class that had private and public queues. When a client connection
was accepted, a 'handlerQueue' reference in the per-socket data was set to
the single, puiblic queue, so that the first data rx, would go to any
handler thread that was free. When a handler thread processed the first
data packet, it would change the 'handlerQueue' to its private queue
reference & so all subsequent data packets would always be handled by the
same thread.

This solution had to be dumped, however, because a protocol-handler that
blocked, eg by calling sleep(), would then block all the clients bound to
that particular thread, even though there were other threads available.
With most protocols, it became obvious that the overall throughput was less
than using one input queue, sequence-numbers and an out-of-order queue in
the socket data.

Now that there are fibers, your post has reminded me of this earlier design.
Using fibers, the protocol handlers could not call the OS sleep() anyway,
and 'sleep' would be implemented by a state-machine in the thread that
called the fiber. This means that the public/private queue could be
revived, and would solve both the out-of-order data problem and also remove
any possibility of fibers being reentered from another thread.


Quote:
Secondly, I'm trying to separate as many aspects of the request
processing that I can into discrete stages such that each stage can be
balanced based on load. Blocking will only occur while the queue has
reached a high-water mark, and will not unblock until the queue reaches a
low-water mark.

I do not use such fancy twiddles Smile If I need to queue objects, I usually
use unbounded queues and an object pool. If the pool runs out, that's it -
the caller has to wait for an object to be returned. I have no high/low
water mark. If a thread is blocked on an empty queue or pool, it becomes
ready as soon as a single object is returned.

I appreciate that the high/low water mark is an attempt to reduce
context-switching under hiigh-load, but I would be worried that, in a
complex system with many threads waiting on many queues, states may exist
where no thread can continue because none of the threads has reached the
low-water-mark on whatever they are waiting on. You have probably analysed
this & decided it cannot happen...

Quote:
Hopefully this should let the system scale unbelievably on
multi-CPU machines.

Hmm...

Rgds,
Martin












Back to top
Martin James
Guest





PostPosted: Fri Dec 17, 2004 8:04 am    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:

I think I can get this working without any type of locking. Each stage
should isolate the fibers from one-another. My biggest concern is
actually
the stack size of the fibers. A default 1MB stack will only allow for
2038
fibers. I'd like to get that number over 5,000...

A 1Mb fiber stack!! In my server, I was thinking more like 4k.

Rgds,
Martin




Back to top
Eric Hill
Guest





PostPosted: Fri Dec 17, 2004 2:13 pm    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:
A 1Mb fiber stack!! In my server, I was thinking more like 4k.

After getting some sleep, I'm really looking at two problems. First, the fibers themselves can have
a *very* tony stack. It's the dispatcher that passes requests through to the RO subsystem that
needs the larger stack to accomidate for user-code. That's easy though. I just won't execute the
dispatch in a context fiber. I'll do it in a thread from the pool. :)

Eric



Back to top
Eric Hill
Guest





PostPosted: Fri Dec 17, 2004 2:16 pm    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote

Quote:
Heh! You will need a lot of testing Smile

Will roll it out to ~100 machines in-house. Landesk can change the settings on the clients in about
10 minutes :)

Quote:
I don't quite get your implementation. I do not understand why you have
more than one queue, or more than one thread pool, or how this avoids
out-of-order data processing or, (same problem, really), multiple threads
trying to schedule the same fiber.

Google for SEDA and you should get a glimpse of what I'm thinking.

Quote:
I have only one IOCP input queue to my handler threads. All the handler
threads wait on this queue. Data objects are pushed on from both the
listener threads and from threads that have fired IO completion routines.
Multiple data objects can be bound to the same per-socket data & so
out-of-order processing is certainly posssible, and does happen. In a very
heavily loaded server with tens of thousands of clients, the handler threads
detect out-of-order processing on ~ 1 object in five & so have to queue such
objects in the per-socket data to await the completion of processing of an
earlier packet by another thread.

One annoying aspect is that, no matter how many handler threads I configure
in, how heavy the loading is, or how stroppy the protocol-handler is, I have
never logged more than one out-of-order packet per socket, making an actual
queue structure seemingly pointless - a simple store would do. I just know,
though, that as soon as I take the queue out, multiple out-of-order packets
will turn up Smile

Where does out-of-order processing come into play? Does the sockets subsystem not do things in the
order you gave it?

Eric



Back to top
Martin James
Guest





PostPosted: Fri Dec 17, 2004 3:21 pm    Post subject: Re: No-Locking Fiber Switching Mechanism Reply with quote


Quote:

Google for SEDA and you should get a glimpse of what I'm thinking.

Just what we need - yet another server architecture :)

I don't queue anything off from the handler threads until the protocol had
been parsed/analysed. when a complete protocol-unit has been assembled, the
protocol handler has the choice of handling all the processing itself, (eg.
by sending a reply or sending a requst for a 'transmitFile' to the
dispatcher thread), or queueing itself, the data object, any protocl unit
object, whatever to some other thread/s, eg. a pool of database connection
threads.

Now I see where you're going with the multiple thread pools. I don't
myself envisage any app with more than two pools. I would always want to
keep the handler threads in the server in a seperate pool, but I do not yet
see any need for more than one extra pool, eg. for database access.

As for load balancing between different subsystems, I cap the ITC by using
buffer pools for all objects in the server. The sockets, buffers of various
sizes, data objects, protocol handlers etc. are all limited in number. If a
pool runs out, the requestor is blocked on the pool until an object is
returned. At the moment, I have no clever auto-management of the pool
capacities, though the user can pop up a form to see the state of the pools
and adjust them manually.

Quote:

Where does out-of-order processing come into play? Does the sockets
subsystem not do things in the
order you gave it?

As currently configured, my dispatcher thread issues two WSARecv calls for
every connected client, each with its own data buffer object. Whenever an
IO completion routine is fired, the data objects get queued to the IOCP
queue.& another WSARecv is issued, (so that there is always at least one
outstanding. As an aside, if a recv data object is ompletely filled, the
next WSARecv is issued with a buffer twice the size). At this point, the
data is in the correct order in the data objects, and the data objects are
in the corrrect order in the IOCP queue. Then the handler threads get at
them & this happens:

Handler thread 1 gets data object 1 for socket A & calls the protocol
handler with it, (with or without running a fiber - it does not matter). In
the protocol handler, half-way thorugh parsing object 1, thread 1 is
preempted and handler thread 2 runs. Thread 2 gets data object 2 from
socket A and starts processing it though the protocol handler even though
the previous data paket has not been completely processed. This screws up
the protocol because the protocol handler for a socket can be interrupted
and reentered from another thread, with another data object, before it has
finished twiddling with it's last data object. This has to be prevented.

Rgds,
Martin

PS I don't know what you are using for your queues, but I found out just
yesterday that IOCP queues are very slow when used for plain inter-thread
comms. A P-C queue based on a pool of TEvents was 10 times faster. The
IOCP queues obviously have advantages in managing the handler threads, with
their ability to limit the number of context switches, but, elsewere, I
certainly will not be using them for ITC.











Back to top
Display posts from previous:   
Post new topic   Reply to topic    BorlandTalk.com Forum Index -> Delphi OO design All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2006 phpBB Group
SEO toolkit © 2004-2006 webmedic.