HAProxy: three parallel requests, return the first one that isn't a 404?

I have three servers storing files. One stores the most common ones, another the "not so common" ones, and finally another stores stuff that's mostly never accessed. Right now I have them under subdomains (like a-cdn.domain.me, b-cdn.domain.me, etc.), and when somebody requests something on a-cdn and it's missing, the request is redirected to b, and then eventually to c.

I'd like to speed this up since it gets really slow with requests to b or c, and I was thinking about doing the three "a, b, c" requests in parallel and then returning the first non-4xx response. Has anybody done a setup like this, or something similar?

I also looked at distributed filesystems and saw SeaweedFS, but apparently it doesn't let you easily move files between servers, and files still get requested from the separate servers (there isn't a master "gateway" you can request files from). This would honestly be my go-to path if there were a way to FUSE-mount everything.

Comments

  • havoc OG Content Writer

    My first instinct would be to write code from scratch for it.

    Out of the box, I'd look at fault-tolerant load balancing and use/(abuse) that to get the desired effect.

  • Any particular reason you don't store the files on server A, B and C? GlusterFS might be worth looking into to get that arranged.

  • foxone OG
    edited December 2019

    @Solaire said:
    Any particular reason you don't store the files on server A, B and C? GlusterFS might be worth looking into to get that arranged.

    Server A is a very fast 1.6 TB SSD, with a gigabit connection.
    Server B is a slightly slower 5 TB HDD, also with a gigabit connection but a bit slower.
    Server C is a fast 10 TB cached HDD, but with a crippled 200 Mbit connection from my home provider.

    Total volume of data is around 7 TB right now. Server A actively fetches stuff and acts as a collector. Data then gets siphoned into server B, waiting for the nightly sync to server C.

  • In nginx, there is the try_files option for handling a 404 status code.
    Check whether you can send the request to another server. I am guessing that this is possible.
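
    Something like this is what I have in mind, as a rough sketch (the b-cdn hostname is just taken from your post, paths are placeholders): try_files serves the local copy if it exists and otherwise hands the request to a named location that proxies to the next server.

      location / {
          root /srv/cdn;
          # serve the local copy if present, otherwise fall through to b
          try_files $uri @b_cdn;
      }

      location @b_cdn {
          # fetch from the next server instead of redirecting the client
          proxy_pass http://b-cdn.domain.me;
      }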

  • @bountysite said:
    In nginx, there is the try_files option for handling a 404 status code.

    I don't believe it would be useful to me; the files are on another server, not the local one.

  • mikho Administrator OG

    What if, instead of redirecting to another sub-domain, you let the a-cdn proxy get the files for you?

    Not sure how to explain it, but it would be kind of like symlinking the files from the two backend servers to the primary.

    I once made a proxy script that let users download from my RapidShare(?) premium account by adding only the last part of the URL to the file onto my domain.

    The server then fetched the file and streamed it to the client.

    Or you could have some sort of index/database of the filenames and where they are hosted.

    One lookup and then send the file.

    Reading this gibberish back, I'm not sure I understand it myself.
    Anyways, hope it helps.
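
    As a rough illustration of the index idea (hostnames and path patterns below are made up; a real setup would generate the map from whatever database tracks file locations):

      # regenerated periodically from the file-location index
      map $uri $file_host {
          default          a-cdn.domain.me;
          ~^/less-common/  b-cdn.domain.me;
          ~^/archive/      c-cdn.domain.me;
      }

      server {
          listen 80;
          location / {
              # one lookup, then send the client to the server that has the file
              return 302 http://$file_host$request_uri;
          }
      }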


  • @foxone
    I don't think any stock solution does this, besides mammoths like Ceph, OpenStack and such. If there is, someone please surprise me.
    I guess you could (mis)use some haproxy stick-table jutsu, but it will force all the traffic through haproxy...
    I think I agree with @havoc; I would write some learning dispatcher: broadcast the request, remember who answers, respond with a 307 Temporary Redirect, and re-broadcast periodically and on failure.
    Here are some goodies to read:
    https://www.backblaze.com/blog/design-thinking-b2-apis-the-hidden-costs-of-s3-compatibility/

  • ionswitch_stan OG Retired
    edited January 2020

    @bountysite said:
    In nginx, there is the try_files option for handling a 404 status code.
    Check whether you can send the request to another server. I am guessing that this is possible.

    Without something custom, this is likely your best bet. try_files will proxy out to remote hosts and can follow a waterfall of hosts to return the first non-404 response. If you run with keep-alive enabled to the downstream hosts, you are basically looking at the stack-up of RTTs to each remote system as your worst-case time (rough config sketch at the end of this comment).

    Is there a reason you aren't keeping more frequently requested content on the faster server/higher in the sequential list of hosts, and only moving aged content out to the slower hosts?

    @mikho said: Or you could have some sort of index/database of the filenames and where they are hosted.

    Ultimately this is the other method. Track what files are on each server, and redirect/proxy directly.
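
    Roughly what the waterfall looks like as a sketch (the subdomains are from the original post, everything else is placeholder):

      upstream b_cdn { server b-cdn.domain.me:80; keepalive 16; }
      upstream c_cdn { server c-cdn.domain.me:80; keepalive 16; }

      server {
          listen 80;
          root /srv/cdn;

          location / {
              # local disk first, then the waterfall
              try_files $uri @b;
          }

          location @b {
              proxy_http_version 1.1;
              proxy_set_header Connection "";
              proxy_pass http://b_cdn;
              # if b answers 404, fall through to c instead of returning the 404
              proxy_intercept_errors on;
              error_page 404 = @c;
          }

          location @c {
              proxy_http_version 1.1;
              proxy_set_header Connection "";
              proxy_pass http://c_cdn;
          }
      }

    Worst case (a file that only exists on c) you pay the RTT to b plus the fetch from c, which is the stack-up mentioned above.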


  • @ionswitch_stan said:
    Without something custom, this is likely your best bet. try_files will proxy out to remote hosts and can follow a waterfall of hosts to return the first non-404 response. If you run with keep-alive enabled to the downstream hosts, you are basically looking at the stack-up of RTTs to each remote system as your worst-case time.

    Jebus..


  • @WSS O(n).


  • havoc OG Content Writer
    edited January 2020

    Nginx isn't gonna work, cause it learns.

    I.e. it temporarily blacklists servers that 404. So if you request files in quick succession, it might blacklist server a because a file isn't there, and then a second file that is on server a can't be accessed because it's still blacklisted.

  • @ionswitch_stan said:

    Is there a reason you aren't keeping more frequently requested content on the faster server/higher in the sequential list of hosts, and only moving aged content out to the slower hosts?

    I was planning to use haproxy to proxy the frequently requested content, so that it would be done automatically. So the nginx backend requests from server a, then b, then c until there is a match, and haproxy caches all of that with a huge pool (40GB SSD or so).

  • @ionswitch_stan said:
    @WSS O(n).

    Not really, even with parallel requests. Now if they were all on local switching fabric and the lookup wasn't a huge sweep of inodes..

    This is just a fucking messy design.


  • Neoon OG Senpai
    edited January 2020

    In theory, HAProxy could ask the 3 webservers in the config, and whichever replies with a 200 and has the file should then hand it out.
    You would need to disable all caching in HAProxy, but you could easily put a normal nginx in front for caching files, or a CDN, e.g. BunnyCDN.
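
    Something in this direction might work with recent HAProxy (2.x) L7 retries. Treat it as a sketch under that assumption; the retried request goes to whichever server the balancer picks next, so it approximates rather than strictly enforces an a → b → c order:

      backend cdn
          mode http
          retries 2
          # a 404 from one server triggers an L7 retry of the request...
          retry-on 404
          # ...and redispatch lets that retry go to a different server
          option redispatch 1
          server a a-cdn.domain.me:80 check
          server b b-cdn.domain.me:80 check
          server c c-cdn.domain.me:80 check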

  • @Neoon said:
    In theory, HAProxy could ask the 3 webservers in the config, and whichever replies with a 200 and has the file should then hand it out.
    You would need to disable all caching in HAProxy, but you could easily put a normal nginx in front for caching files, or a CDN, e.g. BunnyCDN.

    ..at which point, grabbing a shitty low-RAM storage box with a fast disk and Squid would probably actually be nicer than HAProxy direct, unless you're expecting these files to change often? Non-caching HAProxy seems like a painful proxy option when it'd be trivial to do minor local caching if it's busy.


  • IMO where haproxy excels is proxying anything quickly and easily. Everything else is probably better handled with nginx, or a combination of both.
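
    For the nginx-in-front caching piece mentioned above, a minimal sketch (cache path, sizes and the backend address are placeholders):

      proxy_cache_path /var/cache/cdn levels=1:2 keys_zone=cdn_cache:100m
                       max_size=40g inactive=30d use_temp_path=off;

      server {
          listen 80;
          location / {
              proxy_cache cdn_cache;
              # cache only successful responses; 404s pass through uncached
              proxy_cache_valid 200 30d;
              proxy_pass http://127.0.0.1:8080;   # haproxy or the waterfall behind it
          }
      }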


  • anon630 Retired
    edited March 2020

    @foxone said:

    I don't believe it would be useful to me; the files are on another server, not the local one.

    You can use a named location as the backend to pass the request to another server:
    try_files ...... @backend_srv;

    I am not too sure, but I think this should work.
